Gemma 3N E4B
Latency Benchmark Methodology
Benchmark methodology notebook with decode latency, component timing, thread scaling, and CSV export paths.
Experiment data
1,218.0 ms: average decode mean across experiment runs
26.2 pp: FFN share reduction from short to long context
28.0 pp: KV-cache attention share increase
Decode latency breakdown
Per-run decode component timing from experiment_results.csv.
Chart series: FFN, QKV, Output projection, Attention, Runtime overhead
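A per-run breakdown like this can be turned into component shares with a few lines of Python. The sketch below is illustrative only: the column names (`run_id`, `ffn_ms`, `qkv_ms`, `out_proj_ms`, `attention_ms`, `overhead_ms`) and the sample rows are hypothetical placeholders, since the actual schema of experiment_results.csv is not shown here.

```python
import csv
import io

# Hypothetical column names and placeholder values. The real
# experiment_results.csv schema is an assumption, not taken from the notebook.
SAMPLE_CSV = """run_id,ffn_ms,qkv_ms,out_proj_ms,attention_ms,overhead_ms
run_a,50.0,20.0,10.0,15.0,5.0
run_b,40.0,20.0,10.0,25.0,5.0
"""

COMPONENTS = ["ffn_ms", "qkv_ms", "out_proj_ms", "attention_ms", "overhead_ms"]


def component_shares(row):
    """Return each component's share of the summed component time, in percent."""
    times = {name: float(row[name]) for name in COMPONENTS}
    total = sum(times.values())
    return {name: 100.0 * t / total for name, t in times.items()}


def load_shares(csv_text):
    """Parse CSV text and map each run id to its component shares."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["run_id"]: component_shares(row) for row in reader}


shares = load_shares(SAMPLE_CSV)
for run_id, run_shares in shares.items():
    print(run_id, {name: round(pct, 1) for name, pct in run_shares.items()})
```

To run it against a real export, replace `io.StringIO(csv_text)` with an open file handle and adjust `COMPONENTS` to the actual column headers.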
FFN versus KV-cache bottleneck shift
Component share from context_sweep_results.csv as context length grows from short to long prompts.
Chart series: FFN share, KV-cache attention share
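The shift between short and long context is reported in percentage points (pp), which is the plain difference between two shares, not a relative change. A minimal sketch of that arithmetic follows; the share values are placeholders chosen for illustration and are not taken from context_sweep_results.csv.

```python
def pp_shift(short_share_pct, long_share_pct):
    """Percentage-point change in a share going from short to long context."""
    return long_share_pct - short_share_pct


def relative_change_pct(short_share_pct, long_share_pct):
    """Relative change of the same share, in percent, for comparison."""
    return 100.0 * (long_share_pct - short_share_pct) / short_share_pct


# Placeholder shares (percent of decode time). Not measured values.
short_context = {"ffn": 60.0, "kv_attention": 10.0}
long_context = {"ffn": 40.0, "kv_attention": 35.0}

shifts = {
    name: pp_shift(short_context[name], long_context[name])
    for name in short_context
}
relative = {
    name: relative_change_pct(short_context[name], long_context[name])
    for name in short_context
}

for name in shifts:
    print(f"{name}: {shifts[name]:+.1f} pp ({relative[name]:+.1f}% relative)")
```

With these placeholder inputs the FFN share falls by 20 pp while the KV-cache attention share rises by 25 pp, which is the same kind of opposite-direction movement the bottleneck-shift chart describes.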
CPU-thread scaling
Decode throughput and FFN timing from thread_scaling_results.csv.
Chart series: Decode tok/s, FFN ms trend
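Thread-scaling results are usually easier to read as speedup and parallel efficiency relative to the smallest thread count. The sketch below shows one way to derive both from (threads, tok/s) pairs. The measurements are placeholders, and the layout of thread_scaling_results.csv is not assumed.

```python
# Placeholder measurements: (thread count, decode tokens per second).
# These are illustrative values, not results from the benchmark.
SAMPLE_RUNS = [(1, 1.0), (2, 1.8), (4, 3.0), (8, 4.0)]


def scaling_table(runs):
    """Compute speedup and parallel efficiency against the smallest thread count."""
    baseline_threads, baseline_tps = min(runs)
    table = []
    for threads, tps in sorted(runs):
        speedup = tps / baseline_tps
        # Efficiency of 1.0 means throughput scaled linearly with threads.
        efficiency = speedup / (threads / baseline_threads)
        table.append(
            {
                "threads": threads,
                "tok_per_s": tps,
                "speedup": speedup,
                "efficiency": efficiency,
            }
        )
    return table


table = scaling_table(SAMPLE_RUNS)
for entry in table:
    print(
        f"{entry['threads']:>2} threads: {entry['tok_per_s']:.2f} tok/s, "
        f"speedup {entry['speedup']:.2f}x, efficiency {entry['efficiency']:.2f}"
    )
```

Efficiency that falls as threads are added is the typical signature of a memory-bandwidth-bound decode loop, where extra cores wait on the same weight reads.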
GEMV, GEMM, and memory-bandwidth relationship
Normalized curves from bottleneck_analysis.csv compare FFN weight-read pressure, KV-cache attention reads, and decode latency.
Chart series:
FFN GEMV read/tok, max 1,680.0 MB
KV-cache read/tok, max 44.98 MB
Decode latency, max 1,372.1 ms
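Two small calculations sit behind a chart of this kind, sketched below under stated assumptions. The first scales each series by its own maximum so that curves with different units share one axis; that the notebook normalizes this way is an inference from the "max" labels, not something it states. The second is the standard bandwidth lower bound, time per token is at least bytes read divided by memory bandwidth. The 10 GB/s bandwidth figure is a hypothetical input, and MB is assumed to mean 10^6 bytes.

```python
def normalize_to_max(values):
    """Scale a series so its largest value becomes 1.0."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("need at least one positive value to normalize")
    return [v / peak for v in values]


def bandwidth_floor_ms(read_mb_per_token, bandwidth_gb_per_s):
    """Lower bound on per-token time if the reads were the only cost.

    Assumes decimal units: 1 MB = 1e6 bytes, 1 GB = 1e9 bytes.
    """
    bytes_per_token = read_mb_per_token * 1e6
    bytes_per_second = bandwidth_gb_per_s * 1e9
    return 1000.0 * bytes_per_token / bytes_per_second


# Placeholder KV-cache read sizes (MB per token) at growing context lengths.
SAMPLE_KV_READ_MB = [5.0, 10.0, 20.0, 40.0]
normalized = normalize_to_max(SAMPLE_KV_READ_MB)
print(normalized)

# 1,680.0 MB is the FFN GEMV read/tok maximum listed above.
# 10.0 GB/s is an assumed bandwidth, not a measured one.
floor_ms = bandwidth_floor_ms(1680.0, 10.0)
print(f"bandwidth floor: {floor_ms:.1f} ms per token")
```

Comparing such a floor with the measured decode latency gives a rough sense of how much of the per-token time is explained by weight reads alone, provided the bandwidth figure is replaced with one measured on the target device.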