Category: Uncategorized

  • R1CS vs Plonkish vs AIR

    Fibonacci in R1CS, Plonkish, and AIR side by side, in around 1200 lines of plain Rust with no proving libraries and no jargon.

    Listen to this post (~12 min)

    You read the Groth16 paper and learn that a ZK circuit is a system of matrices called R1CS, then you open the halo2 docs and the same circuit is a table of cells with custom gates, and in a STARK paper it’s an execution trace with transition constraints. All three are doing the same job.

    That job is called arithmetization: you take a computation and rewrite it as polynomial equations, because ZK provers do not run code, they evaluate polynomials. R1CS, Plonkish, and AIR are three different ways to write those equations, and this post writes Fibonacci in all three, side by side, in around 1200 lines of plain Rust with no proving libraries and no external dependencies. The repo is at github.com/adamsmo/r1cs-plonkish-air.

    Fibonacci because it is boring, and a boring computation makes the differences obvious instead of hiding them behind complicated logic. The field everywhere is $p = 2^{31} – 1$, the Mersenne-31 prime used by Plonky3 and primary in Stwo, while SP1 ran on BabyBear up to V5 and switched to KoalaBear in V6.


    R1CS

    R1CS is the oldest of the three. Three matrices $A, B, C$, a witness vector $z$ with $z_0 = 1$, one equation:

    $$(Az) \circ (Bz) = (Cz)$$

    The $\circ$ is element-wise multiplication, also called Hadamard. Each row is two linear forms multiplied: $\langle a_i, z\rangle \cdot \langle b_i, z\rangle = \langle c_i, z\rangle$. “Rank-1” is a fancy way of saying every constraint is one multiplication.

    Addition is free, you fold it into the coefficients of one of the linear forms, but every multiplication costs you a row, which is what makes Fibonacci the worst case for R1CS: the recurrence $f_{i+2} = f_i + f_{i+1}$ has no multiplication anywhere, so you have to fake one by multiplying the sum by 1.

    $$(f_i + f_{i+1}) \cdot 1 = f_{i+2}$$

    Witness vector:

    z = [1, f_0, f_1, f_2, ..., f_n]

    Row $i$ of $A$ picks $f_i$ and $f_{i+1}$, row of $B$ picks the constant, row of $C$ picks $f_{i+2}$. Two nonzero entries in $A$, one in $B$, one in $C$:

    for i in 0..num_constraints {
        r1cs.a.set(i, 1 + i, F::ONE);  // f_i
        r1cs.a.set(i, 2 + i, F::ONE);  // f_{i+1}
        r1cs.b.set(i, 0, F::ONE);      // constant 1
        r1cs.c.set(i, 3 + i, F::ONE);  // f_{i+2}
    }

    For n = 10 the matrices, dots are zeros:

    A =                                 B =
    [ . 1 1 . . . . . . . . . ]         [ 1 . . . . . . . . . . . ]
    [ . . 1 1 . . . . . . . . ]         [ 1 . . . . . . . . . . . ]
    [ . . . 1 1 . . . . . . . ]         [ 1 . . . . . . . . . . . ]
    [ . . . . 1 1 . . . . . . ]         [ 1 . . . . . . . . . . . ]
    [ . . . . . 1 1 . . . . . ]         [ 1 . . . . . . . . . . . ]
    ...                                 ...
    
    C =
    [ . . . 1 . . . . . . . . ]
    [ . . . . 1 . . . . . . . ]
    [ . . . . . 1 . . . . . . ]
    ...

    The entries are coefficients, not values. Every one is 1 because Fibonacci is pure addition with no scaling, and if the recurrence were something like $2f_i + 3f_{i+1}$ you would see 2 and 3 in $A$ instead. The Fibonacci numbers themselves live in the witness $z = [1, 0, 1, 1, 2, 3, 5, \dots]$, and the matrices only pick which entries to add: row 0 of $A$ has 1s in columns 1 and 2, so $Az$ pulls out $z_1 + z_2 = f_0 + f_1$.

    Demo prints every constraint evaluated:

    [f_0 + f_1 = f_2]  (1·z[1] + 1·z[2]) · (1·z[0]) = (1·z[3])   =>   1 · 1 = 1
    [f_1 + f_2 = f_3]  (1·z[2] + 1·z[3]) · (1·z[0]) = (1·z[4])   =>   2 · 1 = 2
    ...
    [f_8 + f_9 = f_10] (1·z[9] + 1·z[10]) · (1·z[0]) = (1·z[11]) =>   55 · 1 = 55

    To verify, you do three sparse matrix-vector products and one Hadamard product, and that is the whole check: the format is rigid but the verifier is dead simple.

    R1CS lives under Groth16, Marlin, Spartan, and Nova, where Nova uses a relaxed variant tuned for folding. In Rust the big ecosystem is arkworks, with bellman and bellpepper as the other major line, the one that Filecoin, Lurk, and Spartan2 sit on.


    Plonkish

    R1CS forces every row to be a degree-2 bilinear product, and every row carries a constraint. PLONK relaxed both of those, and the systems that followed got called “Plonkish”.

    You write your circuit as a table, where each column has a type:

    • advice holds your secret witness (you commit to it)
    • fixed holds setup data
    • selector is a 0/1 flag per row, says “this constraint fires here”

    A constraint is a polynomial over those cells, and it can read not just the current row but also offsets above and below, where the offset is called a rotation. You multiply the whole constraint by a selector polynomial, and it fires only on rows where the selector is one.

    On top of that you get two more tools: copy constraints, which are a permutation argument internally and let you say “this cell equals that cell” without writing the relationship into the witness layout, and lookups, which force a value to come from a precomputed table.

    For Fibonacci you allocate one advice column called a, one selector s, and one fixed column to hold the boundary values 0 and 1, and you define a single gate:

    $$s(\text{row}) \cdot \big(a(\text{row}) + a(\text{row}+1) – a(\text{row}+2)\big) = 0$$

    That is three rotations of the same column wrapped in one selector, all in a single polynomial, with two copy constraints pinning $a(0) = \text{boundary}(0) = 0$ and $a(1) = \text{boundary}(1) = 1$.

    let gate_expr = query(s, Rotation::CUR)
        * (query(a, Rotation::CUR)
           + query(a, Rotation::NEXT)
           - query(a, Rotation(2)));
    cs.add_gate("fibonacci_step", gate_expr);
    
    cs.copy(CellRef { column: a, row: 0 }, CellRef { column: boundary, row: 0 });
    cs.copy(CellRef { column: a, row: 1 }, CellRef { column: boundary, row: 1 });

    Resulting table for n = 10:

     row | advice a | selector s | boundary
    -----+----------+------------+---------
       0 |     0    |     1      |    0
       1 |     1    |     1      |    1
       2 |     1    |     1      |    0
       3 |     2    |     1      |    0
       4 |     3    |     1      |    0
       5 |     5    |     1      |    0
       6 |     8    |     1      |    0
       7 |    13    |     1      |    0
       8 |    21    |     1      |    0
       9 |    34    |     0      |    0    (gate off: rotation +2 has no next-next)
      10 |    55    |     0      |    0    (gate off: end of trace)

    The gate fires wherever the selector is one, and the last two rows turn it off because the rotation $+2$ would read past the end of the table.

    The cost of all this flexibility is design effort: you pick the column layout, the custom gates, and which rotations make sense, and you watch the gate degree because bumping it from 2 to 3 roughly doubles prover cost. Column width, gate degree, and lookup usage all pull against each other, and balancing them is most of the work in writing a Plonkish circuit. What you get in return is compact circuits for common operations, plus wiring through copy constraints that is almost free in your circuit’s constraint count (not free in permutation argument cost, but free in the gate you write).

    Plonkish is what halo2 runs on, along with its production fork axiom-crypto/halo2-lib, Plonky2, UltraPlonk, and many in-house variants.


    AIR

    Plonkish is still a two-dimensional table, and AIR is the same table viewed as a state machine that ticks row by row, where rows are time steps and columns are registers. A transition constraint is a polynomial over a fixed window of row offsets that has to evaluate to zero on every window, and a boundary constraint pins a single cell to a fixed value.

    The window can be arbitrary in theory, but every real implementation (Plonky3, Winterfell, RISC Zero, Cairo, SP1) uses two consecutive rows for the standard API, for performance and proof size reasons. RAP (Randomized AIR with Preprocessing) adds randomized auxiliary columns tied to verifier challenges, LogUp adds lookups on top, and everything else stacks on the same core of trace plus transitions plus boundaries.

    Fibonacci in AIR uses two columns: column a at row $r$ holds $f_r$, column b holds $f_{r+1}$, and you write two transition constraints between row $i$ and row $i+1$:

    $$\text{shift: } a(\text{next}) – b(\text{curr}) = 0$$

    $$\text{sum: } b(\text{next}) – \big(a(\text{curr}) + b(\text{curr})\big) = 0$$

    Two boundaries: $a(0) = 0$, $b(0) = 1$.

    let shift = next(0) - curr(1);
    let sum   = next(1) - curr(0) - curr(1);

    Trace:

     row |   a (f_i) |  b (f_{i+1})
    -----+-----------+--------------
       0 |     0     |       1
       1 |     1     |       1
       2 |     1     |       2
       3 |     2     |       3
       4 |     3     |       5
       5 |     5     |       8
       6 |     8     |      13
       7 |    13     |      21
       8 |    21     |      34
       9 |    34     |      55

    Both transitions are linear, degree 1, because Fibonacci is a linear recurrence with no multiplication anywhere in the body. This is where the three differ: AIR stays at degree 1, but R1CS and Plonkish both reach degree 2, R1CS because it fakes a multiplication by 1 to write an addition, and Plonkish because the selector multiplies that addition.

    The cost of AIR is having to think in state machines, which fits anything that already looks like one (RISC Zero proves full RISC-V execution as a Randomized AIR) and is awkward for anything that does not tick row by row. Transition constraints see only two rows at a time, so anything spanning a larger window needs extra columns, periodic columns, or auxiliary state encodings.

    STARK-style AIR provers skip the trusted setup, which is the usual reason to pick them, but STARK proofs are typically much bigger than KZG-based PLONK proofs and the verifier is slower, so it depends on where you deploy and what you can afford there.

    AIR powers Cairo (StarkWare), Winterfell, Plonky3, RISC Zero, and SP1 pre-V6 (the version called Turbo). SP1 V6 (codenamed Hypercube) has been on mainnet since 19.02.2026 and swapped the FRI/STARK back end for a Jagged PCS multilinear IOP, but the per-chip constraint layer is still AIR. The paper is Hemo et al., ePrint 2025/917.


    Side by side

    For Fibonacci with n = 10:

    R1CSPlonkishAIR
    data shapevector $z$cell tabletrace (2 columns)
    size9 rows × 12 cols11 rows × 3 cols10 rows × 2 cols
    constraints9 constraints1 gate2 transitions
    degree (this circuit)2 (bilinear)2 (selector · gate)1 (linear)
    extrasnone2 copy constraints2 boundaries
    form$(Az) \circ (Bz) = Cz$$\sum g_i(x) \cdot s_i(x) = 0$$C(T(x), T(\omega x)) = 0$
    used byGroth16, Marlinhalo2, Plonky2Winterfell, Plonky3, Stwo

    Degrees in the table are for these Fibonacci circuits, not properties of the systems. R1CS is always rank-1 by definition, Plonkish allows higher degrees (zcash/halo2 computes the degree dynamically, axiom-crypto/halo2 defaults to MAX_DEGREE=5 via an env var), and AIR allows arbitrary polynomial degree, with real STARK constraints usually at degree 2-3 or more. Nova uses relaxed R1CS, a variant tuned for folding.

    Same answer in all three: $f_{10} = 55$.

    Run cargo run --bin compare in the repo to get the table with live values.


    Can you unify the three

    Kind of: a 2023 paper called CCS, short for Customizable Constraint Systems (ePrint 2023/552), generalizes all three by letting you specify a set of matrices and a multilinear combination over them, so R1CS, Plonkish, and AIR each fall out as special cases. Nova’s follow-up, HyperNova, uses CCS as the underlying representation.

    In practice though, you pick a stack and the stack comes with one of the three shapes built in, so CCS is more of an academic idea than something you use day to day.


    Which one do you pick

    Usually the stack picks for you, and the question is just what you are building. For a one-off SNARK over a small relation where pairing-based crypto and a trusted setup are fine, R1CS via arkworks plus Groth16 is what you get for free. For a production circuit like a bridge, a rollup, or a custom EVM precompile, Plonkish via halo2-lib is the default, because the ergonomics, the ecosystem, and the audit history are all there.

    Anything that looks like a CPU executing instructions belongs in AIR, where Winterfell is the one to learn on and Plonky3 the one for when you need performance, or you can skip writing circuits entirely and grab a zkVM like RISC Zero, SP1, or Valida. And if you are learning, do all three: the repo writes Fibonacci in each, and going through the three implementations yourself shows the differences faster than reading papers about them does.


    The repo

    github.com/adamsmo/r1cs-plonkish-air

    • src/field.rs: prime field over $p = 2^{31} – 1$, 140 lines
    • src/r1cs.rs: sparse matrices, constraint check, Fibonacci builder
    • src/plonkish.rs: cell table, custom gates with rotations, copy constraints, Fibonacci builder
    • src/air.rs: trace, transition constraints, boundary constraints, Fibonacci builder
    • src/bin/*.rs: four demos, fib_r1cs, fib_plonkish, fib_air, compare

    cargo test passes 13 tests with zero external dependencies and standard library only.

    What’s not in there: commitments, polynomial interpolation, FRI, KZG, Fiat-Shamir, or any actual proving. This is the arithmetization layer only, and proving sits on top of it.


    What next

    The next post goes back to step 5 of the road map, applying constraints to polynomials, and shows what happens when you feed one of these arithmetizations into FFT plus a commitment scheme: the trace becomes a polynomial that collapses to a single field element the verifier checks.

    Until then, the repo is the fastest way to build intuition: pick the arithmetization that confuses you most, break the witness somewhere, and watch the fib_*_rejects_bad_witness soundness tests catch you. If something looks off or you find a bug, ping me on Discord.


    Further reading

    R1CS

    • Gennaro, Gentry, Parno, Raykova “Quadratic Span Programs and Succinct NIZKs without PCPs” EUROCRYPT 2013 (ePrint 2012/215): the QAP construction that R1CS reformulates.
    • Ben-Sasson, Chiesa, Genkin, Tromer, Virza “SNARKs for C: Verifying Program Executions Succinctly and in Zero Knowledge” CRYPTO 2013 (ePrint 2013/507): introduces the R1CS format itself.
    • Groth “On the Size of Pairing-based Non-interactive Arguments” EUROCRYPT 2016 (ePrint 2016/260): Groth16 over R1CS.

    Plonkish

    • Gabizon, Williamson, Ciobotaru “PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge” 2019 (ePrint 2019/953).
    • Gabizon, Williamson “plookup: A simplified polynomial protocol for lookup tables” 2020 (ePrint 2020/315).

    AIR

    • Ben-Sasson, Bentov, Horesh, Riabzev “Scalable, transparent, and post-quantum secure computational integrity” 2018 (ePrint 2018/046): original STARK paper.

    Want a structured path to ZK?

    1:1 mentoring for Rust developers learning zkSnarks the right way, with working code, expert feedback, and clear progression.

    See the 1:1 Mentoring
  • Execution Traces or how to package some data

    Welcome, fellow zk-enjoyer.

    Listen to this post (~7 min)

    If you’ve been following along, you know how polynomials work as a data structure and why roots of unity make evaluating them fast. Now the question: before the polynomial checks anything, where does the data come from?

    This is post #3 in the “first principles” ZK series:

    1. Polynomials and finite fields
    2. Roots of unity
    3. Execution traces ← you are here
    4. FFT/NTT interpolation
    5. Constraints
    6. Polynomial packing
    7. KZG commitment
    8. Proof and verification

    You already know how to do this

    Here’s a function you’ve written a hundred times:

    fn is_valid_age(x: u32) -> bool {
        x >= 18 && x <= 90
    }

    The CPU runs it, checks the condition, returns true or false. The result is immediate. You trust the machine.

    This isn’t hypothetical. Poland’s digital ID (e-dowód) on mobile phones could generate exactly this kind of proof: a hotel checks you’re 18+ from a QR code without seeing your birth date. The circuit behind it is a range check like the one we’re building here. (More on real-world ZK use cases in the Kraków meetup Q&A.)

    Now imagine you need to prove to someone that this returned true for a specific value of x, without revealing x. The verifier can’t run your code on your private inputs. They get a proof, a small piece of data, and they verify it mathematically.

    To produce that proof, you need an execution trace.


    What is an execution trace?

    An execution trace is a table. Each row captures a value that was relevant to your computation. You fill it in while running the program, before any proving happens.

    For is_valid_age(45), the two conditions x >= 18 and x <= 90 get rewritten as subtractions: x - 18 (should be non-negative if x >= 18) and 90 - x (should be non-negative if x <= 90). This rewriting is part of translating code into polynomials. It’s usually the hardest part of writing circuits, because the equations have to catch every corner case and enforce the same conditions as the original code. More on how constraints work in later posts. For now, these are the values we track:

    stepvalue
    x45
    x – 1827
    90 – x45

    That’s it. You’re writing down every intermediate value so a polynomial can encode them later. The polynomial doesn’t execute your logic. It encodes this table, and constraints check that the numbers are consistent.

    OK, so we rewrote the comparison from the if as subtractions. But we still need to check that the result is non-negative, which looks like the same kind of check we started with. It is, almost, but not quite.


    Why not just compare directly?

    This is where your intuition will fight you. In finite fields, there are no negative numbers, everything wraps around. If this section feels weird, that’s normal, read it slowly.

    Think of a car odometer that only goes to 999,999. Drive one more mile past that and it shows 000,000, not 1,000,000. Try to go backwards from 000,000 and you get 999,999, not -1.

    ZK proofs operate over finite fields, which work the same way but with a much bigger odometer. Every number is in the range [0, p-1] and arithmetic wraps around. There is no concept of “greater than” or “less than” in the strict sense.

    Arithmetic in a finite field: numbers wrap around at the modulus. There is no negative, no overflow, just a circle.

    This means x - 18 >= 0 is meaningless. If x = 5, then x - 18 doesn’t equal -13. It equals p - 13, which is a large positive-looking number:

    Click Run. Both results look nonnegative, the field can’t tell them apart by sign.

    So at this point it looks like rewriting comparisons as subtractions and checking for non-negative results was completely pointless. In a finite field every result looks non-negative. Bear with me, we’re getting there. (If you want more on why circuits only speak addition and multiplication, see Q4 in the meetup Q&A.)

    We need to go deeper

    Do you even range-check?

    You use bit decomposition.

    Wait what???

    Yes, we’re counting bits. Like it’s 1975. Welcome to cryptography.

    You take the result of the subtraction and break it down into bits. If it fits in k bits, you know it’s a small number. If it doesn’t fit, it’s huge (because the finite field wrapped it around, and those wrapped numbers are huuuge), and the check fails. That’s it. 8 bits can hold 0 to 255, so if x – 18 fits in 8 bits, x is somewhere between 18 and 273.

    One check isn’t enough. You need two: one for the lower bound (x - 18 fits in k bits, so x >= 18) and one for the upper bound (90 - x fits in k bits, so x <= 90). Both together prove x is between 18 and 90. That’s why the trace ends up with two bit decompositions, one per bound.

    The gap between 18 and 90 is 72, so 8 bits is more than enough (8 bits can hold up to 255). But each check alone has a range that is too wide:

    -165 18 90 273 col_lower 18 273 col_upper -165 90 BOTH 18 90 ← only this Shown on real numbers for clarity. In a finite field the wrapped values are huge, but the intersection is the same.
    • col_lower alone accepts x from 18 up to 273 (because x – 18 just needs to fit in 8 bits)
    • col_upper alone accepts x from -165 up to 90 (because 90 – x just needs to fit in 8 bits)
    • both together accept only x from 18 to 90

    Try x = 91 (one above the upper bound). col_lower: 91 – 18 = 73, fits in 8 bits, passes. col_upper: 90 – 91 = p – 1, a ~254-bit number, does not fit in 8 bits, fails. The lower check alone would have let x = 91 through. You need both.

    Try x = 17 (one below the lower bound). col_upper: 90 – 17 = 73, fits, passes. col_lower: 17 – 18 = p – 1, does not fit in 8 bits, fails.

    Click Run. For x = 91, the lower check gives 73 (small, fits in 8 bits), but the upper check wraps around to a ~254-bit number. For x = 17, it’s the other way around. Those giant numbers will never fit in 8 bits.

    Bit decomposition of n=27: each bit multiplied by its weight sums to 27. For n=256, even all bits set to 1 only reach 255.

    Here’s why that’s a hard constraint, not just a flag. To prove a k-bit decomposition, the prover has to provide k individual bits, each either 0 or 1, that add up to the original value when weighted by powers of two. For 27 that looks like:

    1·1 + 1·2 + 0·4 + 1·8 + 1·16 + 0·32 + 0·64 + 0·128 = 27

    Wait, wait, stop. Do you see what this is? Do you see it?

    THIS IS A POLYNOMIAL

    The most you can reach with 8 bits is 1+2+4+8+16+32+64+128 = 255. If the value is 256 or more, no combination of 0s and 1s adds up to it. So we effectively have a non-negative check that works over a finite field, in the form of a polynomial: if the number is genuinely small (a valid subtraction result), it can be represented as the polynomial above. If the field wrapped it around into a giant number, it can’t.

    There’s one more thing needed: a constraint that forces each bit to actually be 0 or 1, not some other field element. More on that in the constraints post.


    The full trace

    Now you can see why the trace looks the way it does. For is_valid_age(45), the trace has three columns. col_x holds the secret value. col_lower and col_upper hold the subtraction results and their bit decompositions. Without col_x in the trace, there would be no way to prove that col_lower[0] actually equals x - 18 and not some arbitrary number.

    rowcol_xcol_lower (x-18)col_upper (90-x)
    0452745← values
    1011← b0
    2010← b1
    3001← b2
    4011← b3
    5010← b4
    6001← b5
    7000← b6
    8000← b7

    Row 0 holds the subtraction results. Rows 1 through 8 hold the individual bits. That’s the whole structure.

    You didn’t write if. You wrote down what had to be true for the if to pass, for specific case of secret value x.


    In arkworks: the trace as field elements

    Throughout this series we build PLONK by hand using ark-ff, just field arithmetic. Each cell in the trace is a field element.

    The snippet below computes the subtraction results and breaks them into bits, the same bit decomposition from the previous section, but now in arkworks code. The key line is (n >> i) & 1 inside to_bits_le: it extracts bit i from the number. Everything else is plumbing.

    Now we assemble the trace. Each column is a vector: the value in position 0, followed by 8 bits. That’s the whole trace.

    Click Run. The output matches the trace table: row 0 holds subtraction related values, rows 1-8 hold bit decompositions.


    What the trace is for

    “OK I built this trace, but what’s it for?”

    A common confusion: people think proving IS executing. It’s not. Three separate steps, three different roles:

    Step Who What Knows everything?
    1. Execute Prover Runs program, fills the trace Yes, all inputs/outputs/steps
    2. Prove Prover Turns completed trace into a proof Yes, but hides the secrets
    3. Verify Verifier Checks the proof No, sees only proof + public inputs

    The constraints don’t walk through the trace row by row, discovering what happens next. The full trace is already there. They just check that all the numbers are consistent.

    Pepe Silvia meme: I didn't run your code, I just checked that these numbers are consistent
    I didn’t run your code. I just checked that these numbers are consistent.

    One thing to take away from this post: the prover doesn’t prove by re-running your code. The prover fills in a table, and the math checks that the table is consistent. Execution first, proof second. Always.

    Next up

    Post #4, FFT/NTT interpolation: how to turn this table into polynomials. And do some cool stuff with it.

    Coming soon.

    The FFT/NTT algorithm is covered in Roots of Unity.


    Playground

    This editor is interactive. Copy code from above, modify it, break it, experiment. Try changing x to a value outside [18, 90] and see how the trace changes.


    If you have questions or want to discuss this post, join the ZK for Rust Devs Discord.


    This is post #3 in the “first principles” ZK series on rustarians.com.

    Halo2 Cheat Sheet

    API patterns, security pitfalls, fork comparison. One-page PDF reference for halo2 development.

    Get the Cheat Sheet

    Want a structured path to ZK?

    1:1 mentoring for Rust developers learning zkSnarks the right way, with working code, expert feedback, and clear progression.

    See the 1:1 Mentoring
  • 10 Questions from a Rust ZK Meetup in Kraków

    (more…)
  • Roots of Unity

    Welcome back, fellow zk-enjoyer.

    Listen to this post (~10 min)

    In the last post, we used polynomials as a data structure and verified them with a single number (Schwartz-Zippel, remember?). But there’s a question that should be bugging you: “how do we do this efficiently?”

    The answer is called “roots of unity.” I promise the idea is simpler than the name suggests, and it’s what makes ZK proofs better (about 50,000 times better, for a typical ZK circuit, terms and conditions may apply [TM]).


    The Road Map

    1. Polynomials over finite fields
    2. Roots of unity ← you are here
    3. Execution trace
    4. Polynomial interpolation via FFT/NTT
    5. Applying constraints to polynomials
    6. Packing into a single polynomial
    7. Commitment scheme (starting with KZG)
    8. Proof AND Verification

    Why Are There Two Ways to Store a Polynomial?

    Before we get to roots of unity, you need to see the problem they solve.

    You can represent a polynomial in two different ways.

    Coefficient form is what you already know: store the coefficients as a vector. $P(x) = 1 + 2x + 3x^2$ becomes [1, 2, 3]. Adding two polynomials is cheap: add the vectors element by element, $O(N)$. But multiplying them? Every coefficient of P has to multiply with every coefficient of Q. That’s $O(N^2)$.

    Evaluation form stores what the polynomial outputs at N chosen points. Instead of storing the recipe (coefficients), you store the dishes already cooked: $[P(x_0), P(x_1), \ldots, P(x_{n-1})]$. Think of it as a lookup table. You don’t know the formula, but you know what it produces at each point. Now multiplying two polynomials is trivial: just multiply the values at each point, $O(N)$.

    Think of it like a coin. Coefficient form and evaluation form are two views of the same polynomial. Each view makes a different operation cheap: coefficients for addition, evaluations for multiplication. The catch is flipping the coin.

    Flipping from coefficients to points is evaluation: plug N points into the polynomial, $O(N^2)$. Flipping back is interpolation: find the polynomial that passes through N given points, also $O(N^2)$.

    For our 8-coefficient polynomial, that’s $8^2 = 64$ multiplications per flip. Fine. But real ZK circuits use $N = 2^{20}$ coefficients (about a million). Each flip costs $(2^{20})^2 = 2^{40}$: roughly a trillion field multiplications. The flips are the bottleneck.

    Unless you pick the right evaluation points.


    So What Are Roots of Unity?

    The 8-pointed Star of Chaos from Warhammer
    The 8 points of this Chaos star are just $\omega^0$ to $\omega^7$. Change my mind.
    On the complex plane, multiplying by $\omega$ rotates a point 45° around the unit circle. After 8 steps, the cycle wraps: $\omega^8$ lands back on $\omega^0$. In a finite field, nothing visually rotates, but the algebraic cycle is the same.

    For the 8 roots of unity, you start with a number $\omega$ (omega). Then $\omega^0 = 1, \omega^1, \omega^2, \ldots, \omega^7$ gives you 8 distinct values, and $\omega^8 = 1$ again.

    The animations show this on the complex plane, where roots of unity sit on a circle and multiplying by $\omega$ is a visible rotation. That’s the intuition.

    In a finite field, there’s no circle: $\omega$ is a single integer (a big one), and “multiply by $\omega$” is just modular multiplication. Nothing rotates. But the algebra is the same: N distinct values that cycle back to 1, with identical symmetries. The circle helps you see the structure. The code below is how roots of unity actually look in ZK.

    But in many zkSNARK frameworks and libraries you will see a thing named omega and an operation called rotation. This makes no sense for integers. The name comes from the complex plane, where it does make sense. This is why those operations and primitives are named that way.

    Click Run.

    Those numbers are huge (welcome to finite fields), but the punchline is at the bottom: $\omega^0$ and $\omega^8$ are the same value. The cycle is the same as in the complex plane, and it also has exactly 8 distinct values.

    One more thing, and this one matters later. The polynomial $z^8 - 1$ equals zero at every root of unity. That gives us a way to check constraints across all rows of a trace at once (post #5). Here’s why it works.

    The “root” in “root of unity” means the same as in “root of an equation”: a number that makes the equation equal zero. Every root of unity satisfies $\omega^8 = 1$, or equivalently $\omega^8 - 1 = 0$. Not just one $\omega$, but every root:

    $$\begin{gathered} (\omega^0)^8 = 1 \\ (\omega^1)^8 = 1 \\ (\omega^2)^8 = 1 \\ \vdots \\ (\omega^7)^8 = 1 \end{gathered}$$

    That's the definition. Why? Because you can reorder the exponents:

    $$(\omega^2)^8 = (\omega^8)^2 = 1^2 = 1$$

    So $z^8 - 1$ has all eight roots of unity as its roots. No matter which $\omega^k$ you plug in, you get 0. We can also use these roots as the x coordinates for encoding data into polynomials, instead of picking points by hand like we did in post #1.


    How Does This Speed Things Up?

    Take a polynomial $P(x)$ and split it by even and odd coefficients:

    $$P(x) = P_{\text{even}}(x^2) + x \cdot P_{\text{odd}}(x^2)$$

    For example, take a degree-7 polynomial with 8 coefficients:

    $$P(x) = 3 + 1x + 4x^2 + 1x^3 + 5x^4 + 9x^5 + 2x^6 + 6x^7$$

    Pull out the even-index coefficients (positions 0, 2, 4, 6) and the odd-index coefficients (positions 1, 3, 5, 7):

    $$\begin{aligned} P_{\text{even}}(y) &= 3 + 4y + 5y^2 + 2y^3 \\ P_{\text{odd}}(y) &= 1 + 1y + 9y^2 + 6y^3 \end{aligned}$$

    Plug in $y = x^2$ and recombine:

    $$P(x) = \underbrace{(3 + 4x^2 + 5x^4 + 2x^6)}_{P_{\text{even}}(x^2)} + x \cdot \underbrace{(1 + 1x^2 + 9x^4 + 6x^6)}_{P_{\text{odd}}(x^2)}$$

    One degree-7 polynomial just became two degree-3 polynomials. Each of those can be split the same way. Split $P_{\text{even}}$ by its even/odd coefficients:

    $$\begin{aligned} P_{\text{even\_even}}(y) &= 3 + 5y \\ P_{\text{even\_odd}}(y) &= 4 + 2y \end{aligned}$$

    And split $P_{\text{odd}}$ the same way:

    $$\begin{aligned} P_{\text{odd\_even}}(y) &= 1 + 9y \\ P_{\text{odd\_odd}}(y) &= 1 + 6y \end{aligned}$$

    Four degree-1 polynomials. Split once more. Each has two coefficients, so the split produces eight degree-0 polynomials:

    $$\begin{aligned} P_{\text{even\_even\_even}}(y) &= 3 &\qquad P_{\text{even\_even\_odd}}(y) &= 5 \\ P_{\text{even\_odd\_even}}(y) &= 4 &\qquad P_{\text{even\_odd\_odd}}(y) &= 2 \\ P_{\text{odd\_even\_even}}(y) &= 1 &\qquad P_{\text{odd\_even\_odd}}(y) &= 9 \\ P_{\text{odd\_odd\_even}}(y) &= 1 &\qquad P_{\text{odd\_odd\_odd}}(y) &= 6 \end{aligned}$$

    Eight constants. “Evaluation” of a degree-0 polynomial is just returning the number. That's the base case.

    If you’re lost in the subscripts: that’s normal. We’re doing one thing: splitting a big problem into smaller identical problems. Like merge sort, but for polynomials. $P_{\text{even\_odd}}$ just means “take the even half, then the odd half of that.” The name tracks the path down the tree.

    The full cascade: $8 \to 4 \to 2 \to 1$ coefficients per sub-polynomial, $\log_2(8) = 3$ levels of splitting.

    Now the computation runs in reverse, from the bottom up. At each level, the butterfly combines two values $a$ and $b$ into $a + \omega^k \cdot b$ and $a - \omega^k \cdot b$.

    That $\omega^k$ multiplier changes at each level. At Level 1 it’s just 1, so the butterfly simplifies to plain $a + b$ and $a - b$. At Level 2 it alternates between $1$ and $\omega^2$. At Level 3 it cycles through $1, \omega, \omega^2, \omega^3$. This multiplier has a name: twiddle factor. Just a power of $\omega$ that gets more varied as you go up.

    The next three sections walk through the butterfly computation level by level. If you trust the pattern after Level 1, you can skip to the result.

    Level 1: evaluate degree-1 polynomials at 2nd roots ($\omega^k = 1$)

    $$\begin{aligned} P_{\text{even\_even}}(\omega^0) &= 3 + 1 \cdot 5 = 8 &\qquad P_{\text{even\_even}}(\omega^4) &= 3 - 1 \cdot 5 = -2 \\ P_{\text{even\_odd}}(\omega^0) &= 4 + 1 \cdot 2 = 6 &\qquad P_{\text{even\_odd}}(\omega^4) &= 4 - 1 \cdot 2 = 2 \\ P_{\text{odd\_even}}(\omega^0) &= 1 + 1 \cdot 9 = 10 &\qquad P_{\text{odd\_even}}(\omega^4) &= 1 - 1 \cdot 9 = -8 \\ P_{\text{odd\_odd}}(\omega^0) &= 1 + 1 \cdot 6 = 7 &\qquad P_{\text{odd\_odd}}(\omega^4) &= 1 - 1 \cdot 6 = -5 \end{aligned}$$

    All additions and subtractions. The twiddle factor at this level is just 1.

    Still with me? Good. Same operation at every level, just b multiplier changes.

    Level 2: combine Level 1 results into degree-3 polynomials (twiddle factors: $1$ and $\omega^2$)

    Same butterfly as Level 1, one difference: the twiddle factor is no longer always 1.

    $$\begin{aligned} P_{\text{even}}(\omega^0) &= 8 + 1 \cdot 6 = 14 &\qquad P_{\text{even}}(\omega^4) &= 8 - 1 \cdot 6 = 2 \\ P_{\text{even}}(\omega^2) &= -2 + \omega^2 \cdot 2 &\qquad P_{\text{even}}(\omega^6) &= -2 - \omega^2 \cdot 2 \end{aligned}$$ $$\begin{aligned} P_{\text{odd}}(\omega^0) &= 10 + 1 \cdot 7 = 17 &\qquad P_{\text{odd}}(\omega^4) &= 10 - 1 \cdot 7 = 3 \\ P_{\text{odd}}(\omega^2) &= -8 + \omega^2 \cdot (-5) &\qquad P_{\text{odd}}(\omega^6) &= -8 - \omega^2 \cdot (-5) \end{aligned}$$

    Same butterfly, but now the twiddle factor alternates between $1$ and $\omega^2$. Where it's 1, we get clean numbers. Where it's $\omega^2$, the expression stays symbolic, but in a finite field $\omega$ is a concrete integer, so these all resolve to specific values.

    Level 3: combine Level 2 results into the full degree-7 polynomial (twiddle factors: $1$, $\omega$, $\omega^2$, $\omega^3$)

    Last level. Four different twiddle factors now, but the operation is still the same: $a + \text{twiddle} \cdot b$ and $a - \text{twiddle} \cdot b$.

    $$\begin{aligned} P(\omega^0) &= 14 + 1 \cdot 17 = 31 &\qquad P(\omega^4) &= 14 - 1 \cdot 17 = -3 \\ P(\omega^1) &= P_{\text{even}}(\omega^2) + \omega \cdot P_{\text{odd}}(\omega^2) &\qquad P(\omega^5) &= P_{\text{even}}(\omega^2) - \omega \cdot P_{\text{odd}}(\omega^2) \\ P(\omega^2) &= 2 + \omega^2 \cdot 3 &\qquad P(\omega^6) &= 2 - \omega^2 \cdot 3 \\ P(\omega^3) &= P_{\text{even}}(\omega^6) + \omega^3 \cdot P_{\text{odd}}(\omega^6) &\qquad P(\omega^7) &= P_{\text{even}}(\omega^6) - \omega^3 \cdot P_{\text{odd}}(\omega^6) \end{aligned}$$

    All eight evaluations, four twiddle factors. Each line: left and right share the same computation, the right just flips the sign.

    If you made it through those three levels, you just survived the hardest part of this post. Everything after this is downhill.

    Why “butterfly”? Look at each crossing in the diagram below: two lines cross and diverge, like a pair of wings. If you squint hard enough, you can see it. It would be a shame to waste such a cool name.

    Here’s the full butterfly for our polynomial. On the left: coefficients a₀ through a₇ (shuffled by the even/odd splitting). On the right: the polynomial evaluated at each root of unity, P(ω⁰) through P(ω⁷).

    Stage 1 Stage 2 Stage 3 3542 1916 a₀a₄a₂a₆ a₁a₅a₃a₇ 8 −2 6 2 10 −8 7 −5 14 −2+2ω² 2 −2−2ω² 17 −8−5ω² 3 −8+5ω² P(ω⁰)=31P(ω¹) P(ω²)P(ω³) P(ω⁴)=−3P(ω⁵) P(ω⁶)P(ω⁷) 1 1 1 1 1 1 1 1 Peven_even Peven_odd Podd_even Podd_odd Peven Podd P 1 1 ω² ω² 1 1 ω² ω² 1 1 ω ω ω² ω² ω³ ω³ Each crossing: top = a + ωᵏ·b, bottom = a − ωᵏ·b

    The pattern widens at each stage: stage 1 combines adjacent pairs (blue), stage 2 reaches across two (orange), stage 3 across four (green). At every crossing, two inputs produce two outputs. Same computation, just + and −.

    For our 8-coefficient polynomial, that's 3 stages of 8 operations each. $N \times \log_2 N$ total instead of $N^2$. That's $O(N \log N)$ vs $O(N^2)$.

    Why does the halving work? On the complex plane you can see it: $\omega^1$ and $\omega^5$ sit opposite each other on the circle, so squaring both gives the same result: $(\omega^1)^2 = (\omega^5)^2 = \omega^2$. Eight roots collapse to four distinct squared values, four to two, two to one. The code below shows this cascade in a finite field, where there's no circle but the algebra is the same.

    Here's the full cascade: all 8 roots of unity, squaring down level by level. Each level, paired roots produce the same squared value, halving the distinct values.

    Three levels of halving, so half the computation at each level is shared with the level above.

    Here's the math for a real ZK domain:

    $N = 2^{20}$ (1,048,576 points, a common circuit size):

    $$\begin{aligned} \text{Naive:} &\quad N^2 = 2^{40} \approx 1.1 \text{ trillion field multiplications} \\ \text{NTT:} &\quad N \times \log_2 N = 2^{20} \times 20 \approx 21 \text{ million} \\ \text{Speedup:} &\quad \sim 50{,}000 \times \end{aligned}$$

    $N = 2^{25}$ (≈ 33 million points, large production circuits):

    $$\begin{aligned} \text{Naive:} &\quad N^2 = 2^{50} \approx 10^{15} \\ \text{NTT:} &\quad N \times \log_2 N = 2^{25} \times 25 \approx 839 \text{ million} \\ \text{Speedup:} &\quad > 1{,}000{,}000 \times \end{aligned}$$

    Here are both approaches side by side, step by step, counting every field multiplication. First, the naive way: for each root of unity, walk all coefficients and compute $P(x) = a_0 + a_1 x + a_2 x^2 + \ldots$ Click Run.

    Now the same polynomial, same roots, but using the butterfly. Split into even/odd, recurse, combine with + and -. Count the multiplications.

    Same results, different cost. For $N = 8$ the naive approach uses 127 field multiplications, the butterfly uses 17. That’s already 7x fewer, and the ratio keeps growing with $N$.

    Roots of unity turn $O(N^2)$ into $O(N \log N)$. For a real ZK circuit with a million points, that’s the difference between “takes a year” and “takes a second.”

    This algorithm is the Fast Fourier Transform (FFT). When it runs over finite fields instead of complex numbers, it’s called the Number Theoretic Transform (NTT). Same structure, same recursion, just different arithmetic.

    One more thing: the recursive halving needs to split cleanly at every level ($N \to N/2 \to N/4 \to \cdots \to 1$). That only works if $N$ is a power of two.

    This is why every ZK framework uses domain sizes like $2^{16}$ (65,536) or $2^{20}$ (1,048,576). This is consequence of the algorithm. When a ZK tutorial says “pad your trace to a power of two,” now you know why. You can see that in execution trace post.

    The field itself also needs to support this: $p - 1$ must be divisible by large powers of 2 so the recursive halving has room to work. This is called “2-adicity.”

    BN254 has a 2-adicity of 28, meaning it supports NTT domains up to $2^{28}$ (about 268 million points). Plenty for most ZK circuits.


    Let's See It Work

    Enough theory. Let's use roots of unity for what they're actually for in ZK.

    Remember post #1? We stored data inside a polynomial by choosing x values and evaluating.

    Roots of unity give us those x values for free: $\omega^0, \omega^1, \ldots, \omega^{n-1}$. You assign each data value to a root, run the inverse NTT, and out come the polynomial's coefficients. That's interpolation in $O(N \log N)$.

    This is what a ZK prover does with the execution trace to get the data in polynomial form (next post).

    The inverse NTT took four ASCII values and found a polynomial that encodes them. We verified it the old-fashioned way: by calling evaluate() at $\omega^0, \omega^1, \omega^2, \omega^3$ and getting back 99, 111, 111, 108. Same evaluate() from post #1, but now the x values are roots of unity instead of arbitrary numbers.

    The EvaluationDomain trait covers more operations beyond FFT/iFFT, but these two are the core.


    What You Just Did

    You generated roots of unity over a finite field, saw why their symmetry enables $O(N \log N)$ evaluation, and used the inverse NTT to interpolate a polynomial from raw data. Same evaluate() from post #1, but with roots of unity as the x values.

    “Roots of unity”? It is just another name. “Unity” means 1 (the multiplicative identity), “roots” means solutions to an equation. So the whole thing literally means “solutions to $z^n = 1$.”

    Leonhard Euler studied them in 1751 in Recherches sur les racines imaginaires des équations. Carl Friedrich Gauss wrote 74 pages formalizing the theory in 1801 in Disquisitiones Arithmeticae, Section VII. The name comes from their work. Now you've used the tool behind it.


    Next Up: Execution Trace

    We have our fast polynomial engine. Now: how do we represent an actual program, with all its steps and state, as polynomials?

    That's the execution trace. See you there.

    If you want to discuss this with other Rust devs learning ZK, join the Discord.


    Playground

    This editor is interactive. Copy code from above, modify it, break it, experiment. Try generating the 16th roots of unity. Or change the data values and check if interpolation still works.


    What stuck with me the most: when I first saw $\omega$ and “rotation” in ZK code, I had no idea what was going on. It felt like geometry, not finite fields.

    In geometry, Greek letters usually name angles, and rotating by angle $\omega$ makes perfect sense. In integers modulo a prime? Not so much.

    But then I started looking for roots of unity in other places in mathematics and found them in FFT. And I figured out that (yet again) these are just names. They convey some meaning, but they're also misleading if you take them literally. When you know the origin story, it kinda makes sense why they're called roots of unity even for finite fields.

    Questions? Found a bug? Hit me up on Discord. Code should work: if it doesn't, that's on me to fix.

    Halo2 Cheat Sheet

    API patterns, security pitfalls, fork comparison. One-page PDF reference for halo2 development.

    Get the Cheat Sheet

    Want a structured path to ZK?

    1:1 mentoring for Rust developers learning zkSnarks the right way, with working code, expert feedback, and clear progression.

    See the 1:1 Mentoring
  • Polynomials in zk snarks

    Listen to this post (~6 min)

    Welcome, fellow zk-enjoyer.

    You’ve spent evenings staring at tutorials that don’t compile, explanations that assume you have a PhD in algebra, and AI-generated content that looks legit until you run the code.

    I’ve been in blockchain since 2016. I’ve written production zk code. This guide teaches polynomials the way I wish someone taught me.

    No PhD required. Code that runs. Let’s go.

    (more…)