Developer Documentation · ZKML

Where ZK proofs meet
on-chain AI agents.

If an AI agent is making trading or research decisions on-chain, how do you prove the inference was correct without revealing the model weights? That is a ZK problem. This doc answers it — with circuit-level detail, real error messages, and real mitigations.

Midnight / Compact | SP1 · Succinct | Risc0 | Modulus Labs | ZK-SNARKs

What is ZKML?

Zero-Knowledge Machine Learning (ZKML) is the application of ZK proof systems to machine learning inference. It lets you prove that a model ran correctly on a given input and produced a given output — without revealing the model weights, the input data, or any intermediate computation.

Think of it this way: a ZK proof is a receipt. It doesn't tell you what was cooked — only that the chef followed the recipe correctly. ZKML applies this property to neural network inference.

🔒

Privacy-preserving

Model weights and input data never leave the prover's machine. Only the proof and output go on-chain.

✅

Verifiable inference

Any verifier can check the proof in milliseconds. No need to re-run the model or trust the agent.

⛓️

On-chain composable

The proof output becomes a first-class on-chain primitive. Smart contracts can act on verified AI decisions.

Key distinction

ZKML is not about training models on-chain. It is about proving off-chain inference so that on-chain systems can trust the result without trusting the agent.

Why it matters for on-chain AI agents

In 2026, AI agents are executing DeFi trades, generating governance votes, and routing cross-chain liquidity. The trust problem is acute: how does a smart contract know the agent's decision was legitimate?

Without ZKML, you have three bad options. The table below compares them with the ZKML approach:

Approach	How it works	Problem	Status
Trusted oracle	Agent posts result; oracle signs it	Requires trusting a third party	Centralised
On-chain inference	Run the model fully in the EVM	Gas cost is catastrophically high for any real model	Infeasible
Optimistic execution	Post result; dispute window for challenges	Long finality; no weight privacy	Partial
ZKML proof	Prove inference off-chain; verify on-chain	Proving time; quantisation constraints	Best available

ZKML is not a perfect solution — proving time for large models is still a bottleneck, and most production implementations require quantised (integer-only) models. But it is the only path to trustless, privacy-preserving, on-chain-verifiable AI decisions.

How a proof of inference works

At the highest level, a ZKML proof answers one question: "Given input X, did this model produce output Y?" The answer is a cryptographic proof that any verifier can check without re-running the model.

Model compilation → arithmetic circuit

The neural network (weights, activations, layers) is compiled into an arithmetic circuit — a directed acyclic graph of addition and multiplication gates over a finite field. Every ReLU, every matrix multiply, every softmax becomes a set of field arithmetic constraints.

This is where quantisation matters: ZK circuits operate over integers modulo a prime. Floating-point operations must be approximated using fixed-point integer arithmetic (typically 16-bit or 8-bit). A model that was trained in float32 must be quantised before it can be expressed as a circuit.

Witness generation

The witness is the private input to the proof: model weights + activation values at every layer for a specific input. The prover (your AI agent) runs the model locally and records every intermediate value. This is the "secret" that the proof will reference without revealing.

In Midnight's Kachina model, this maps to private state: data that lives only on the prover's machine and never touches the public ledger.
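
The witness-generation step described above can be sketched as a forward pass that keeps every layer's output instead of discarding it. This is a minimal illustration, not any backend's API: the `Witness` struct, the `generate_witness` function, and the Q16.16 rescale are assumptions chosen to match the fixed-point convention used later in this document.

```rust
/// Q16.16 fixed-point shift (an assumption for this sketch).
const SCALE_BITS: u32 = 16;

/// Hypothetical witness container: the input plus every layer's activations.
#[derive(Debug, Clone, PartialEq)]
pub struct Witness {
    pub input: Vec<i64>,
    pub layer_outputs: Vec<Vec<i64>>,
}

/// Runs a ReLU MLP on quantised values and records each layer's output.
/// `layers[k]` is a weight matrix with one inner Vec per output neuron.
pub fn generate_witness(layers: &[Vec<Vec<i64>>], input: &[i64]) -> Witness {
    let mut layer_outputs = Vec::with_capacity(layers.len());
    let mut current = input.to_vec();
    for weights in layers {
        let next: Vec<i64> = weights
            .iter()
            .map(|row| {
                assert_eq!(row.len(), current.len(), "row width must match input width");
                let dot: i64 = row.iter().zip(&current).map(|(w, x)| w * x).sum();
                (dot >> SCALE_BITS).max(0) // rescale to Q16.16, then ReLU
            })
            .collect();
        layer_outputs.push(next.clone());
        current = next;
    }
    Witness { input: input.to_vec(), layer_outputs }
}
```

Everything in the returned `Witness` stays on the prover's machine; only the final output (and a commitment to the weights) would be made public.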

Constraint satisfaction check

The prover checks that the witness satisfies every constraint in the circuit. For a layer with weights W and input x, the constraint is simply: W·x = y. If any constraint fails, the proof cannot be generated — the computation was incorrect.
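
As a rough illustration of that check, the sketch below tests W·x = y row by row in modular arithmetic and reports the first row that fails. The modulus 2^61 - 1 is an assumption picked so the arithmetic fits in `u128`; real backends use a much larger prime, and `first_violated_row` is a hypothetical helper name.

```rust
/// Illustrative prime modulus (2^61 - 1); production fields are far larger.
const P: u128 = (1u128 << 61) - 1;

/// Maps a signed integer to its field representative in [0, P).
fn to_field(v: i64) -> u128 {
    (v as i128).rem_euclid(P as i128) as u128
}

/// Returns the index of the first row where W·x != y (mod P),
/// or None if every row constraint is satisfied.
pub fn first_violated_row(w: &[Vec<i64>], x: &[i64], y: &[i64]) -> Option<usize> {
    assert_eq!(w.len(), y.len(), "one output per weight row");
    w.iter().zip(y).position(|(row, &yi)| {
        assert_eq!(row.len(), x.len(), "row width must match input width");
        let acc = row
            .iter()
            .zip(x)
            .fold(0u128, |acc, (&wi, &xi)| (acc + to_field(wi) * to_field(xi) % P) % P);
        acc != to_field(yi)
    })
}
```

Running a check like this before invoking the prover gives a readable failure (a row index) instead of an opaque proving error.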

ZK-SNARK proof generation

Using the circuit's proving key (generated during a setup phase) and the witness, the prover generates a succinct cryptographic proof. With a pairing-based SNARK such as Groth16, the proof is a few hundred bytes regardless of model size. Generation time scales with circuit size: tiny quantised models (under 100K parameters) can prove in seconds, models around 1M parameters take tens of seconds, and larger models may take minutes to hours on current hardware.

On-chain verification

The proof, the model's public commitment (a hash of the weights), and the output are submitted on-chain. A smart contract (or Compact circuit on Midnight) verifies the proof in constant time, typically with a few elliptic curve pairing operations. Verification cost is independent of model complexity, although the absolute gas cost depends on the chain and the proof system.

Circuit-level detail

Understanding what actually happens inside the circuit separates developers who can debug ZKML integrations from those who cannot.

Arithmetic circuits and R1CS

Groth16 and several other ZK-SNARK backends consume constraints in Rank-1 Constraint System (R1CS) form; PLONK-family backends (PLONK, FFlonk, Halo2) use a related gate-based arithmetisation. In R1CS, every operation in the circuit becomes a constraint of the form:

-- R1CS constraint form
-- (a · b) = c  where a, b, c are linear combinations of witness values

-- A single ReLU with input x, output x_pos and sign bit is_negative:
-- Constraint 1: is_negative * (1 - is_negative) = 0   (sign bit is boolean)
-- Constraint 2: x * (1 - is_negative) = x_pos          (selects x or 0)
-- A range check on x_pos and x_pos - x ties is_negative to the sign of x
ReLU constraints per activation: 2, plus the range check
Matrix multiply (m×n weight matrix): m×n constraints
Total constraints ≈ layers × (weights + activations × 2)
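
One common way to constrain a ReLU gate can be sketched in plain Rust as follows: a boolean sign bit, a selection constraint, and a range check that ties the bit to the sign of the input. The struct and function names are hypothetical, and the 16-bit width is an assumption (inputs with |x| < 2^15).

```rust
/// Bit width assumed for activations in this sketch.
const BITS: u32 = 16;

/// Private values the prover supplies for one ReLU gate.
pub struct ReluWitness {
    pub x: i64,           // pre-activation
    pub x_pos: i64,       // claimed relu(x)
    pub is_negative: i64, // claimed sign bit: 1 if x < 0, else 0
}

/// True if 0 <= v < 2^(BITS-1). In a circuit this is a bit decomposition,
/// costing roughly one boolean constraint per bit.
fn in_range(v: i64) -> bool {
    (0..(1i64 << (BITS - 1))).contains(&v)
}

/// Evaluates the ReLU constraints against a candidate witness.
pub fn relu_constraints_hold(w: &ReluWitness) -> bool {
    let b = w.is_negative;
    // Constraint 1: b * (1 - b) = 0 forces b to be 0 or 1.
    let boolean = b * (1 - b) == 0;
    // Constraint 2: x * (1 - b) = x_pos selects x when b = 0 and 0 when b = 1.
    let select = w.x * (1 - b) == w.x_pos;
    // Range checks: x_pos >= 0 and x_pos - x >= 0 rule out a lying sign bit.
    let range = in_range(w.x_pos) && in_range(w.x_pos - w.x);
    boolean && select && range
}
```

Without the range check, a dishonest prover could set `is_negative = 1` for a positive input and zero out any activation; the two algebraic constraints alone do not prevent that.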

Fixed-point quantisation

All values in the circuit must be field elements — integers mod a large prime p. Float32 weights are converted to fixed-point integers by multiplying by a scale factor and rounding:

Rust — weight quantisation (Risc0 / SP1 context)

const SCALE: i64 = 1 << 16;  // Q16.16 fixed-point

fn quantise(w: f32) -> i64 {
    (w as f64 * SCALE as f64).round() as i64
}

fn quantised_matmul(
    weights: &[Vec<i64>],
    input:   &[i64],
) -> Vec<i64> {
    weights.iter().map(|row| {
        let dot: i64 = row.iter().zip(input)
            .map(|(w, x)| w * x)
            .sum();
        dot / SCALE  // rescale after multiply
    }).collect()
}

Quantisation accuracy loss

Moving from float32 to fixed-point Q16.16 introduces rounding error. For classification models this is usually <0.5% accuracy loss. For regression models used in price prediction, validate carefully — errors compound across layers. Always benchmark quantised vs. float outputs before building the circuit.
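
A benchmark of this kind can be as simple as running the same layer in float and in Q16.16 and comparing outputs. The sketch below does that for a single linear layer; `max_abs_error` and `float_matmul` are hypothetical helper names, and a real benchmark would cover the whole network over a representative dataset.

```rust
const SCALE: i64 = 1 << 16; // Q16.16, as in the quantisation example

fn quantise(w: f32) -> i64 {
    (w as f64 * SCALE as f64).round() as i64
}

fn dequantise(q: i64) -> f64 {
    q as f64 / SCALE as f64
}

/// Float reference: plain dot product per row, accumulated in f64.
fn float_matmul(weights: &[Vec<f32>], input: &[f32]) -> Vec<f64> {
    weights
        .iter()
        .map(|row| {
            row.iter()
                .zip(input)
                .map(|(w, x)| *w as f64 * *x as f64)
                .sum::<f64>()
        })
        .collect()
}

/// Largest absolute difference between float and Q16.16 outputs for one layer.
pub fn max_abs_error(weights: &[Vec<f32>], input: &[f32]) -> f64 {
    let q_in: Vec<i64> = input.iter().map(|&x| quantise(x)).collect();
    let reference = float_matmul(weights, input);
    weights
        .iter()
        .zip(&reference)
        .map(|(row, &expected)| {
            let dot: i64 = row.iter().zip(&q_in).map(|(&w, &x)| quantise(w) * x).sum();
            (dequantise(dot / SCALE) - expected).abs()
        })
        .fold(0.0, f64::max)
}
```

Tracking this number per layer shows where error accumulates, which is useful when deciding whether a regression model tolerates the chosen scale.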

Lookup tables for non-linear functions

Functions like ReLU, sigmoid, and softmax are expensive to express in raw R1CS because they require range checks and comparisons. Modern backends (PLONK, Halo2) support lookup tables — precomputed tables of valid (input, output) pairs that drastically reduce constraint count for non-linear activations:

Rust — lookup table for ReLU (Halo2 style)

// ReLU lookup: for every v in [-2^15, 2^15 - 1], keyed by its 16-bit two's-complement encoding, store max(v, 0)
fn relu_table() -> Vec<(u64, u64)> {
    (0u64..65536).map(|v| {
        let signed = v as i16;
        (v, i16::max(signed, 0) as u64)
    }).collect()
}

// Each activation now costs 1 lookup instead of 2 R1CS constraints
// For a 1000-neuron hidden layer: saves ~1000 constraints per ReLU layer
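
To make the lookup idea concrete, the sketch below rebuilds the same table as a set and tests whether claimed (input, output) activation pairs are rows of it. A real lookup argument proves this membership inside the proof system; here it is checked directly, and `encode` and `all_pairs_in_table` are hypothetical helpers.

```rust
use std::collections::BTreeSet;

/// Same contents as the table above: 16-bit encoded input -> max(input, 0).
fn relu_table() -> BTreeSet<(u64, u64)> {
    (0u64..65536)
        .map(|v| (v, i16::max(v as i16, 0) as u64))
        .collect()
}

/// Encodes a signed 16-bit activation as the table's unsigned key.
fn encode(x: i16) -> u64 {
    x as u16 as u64
}

/// Returns true only if every claimed (input, output) pair is a table row.
pub fn all_pairs_in_table(table: &BTreeSet<(u64, u64)>, pairs: &[(i16, i16)]) -> bool {
    pairs
        .iter()
        .all(|&(x, y)| table.contains(&(encode(x), encode(y))))
}
```

A pair such as (-3, -3) is rejected because the table only contains (-3, 0) for that input, which is exactly the property the lookup enforces.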

The Midnight approach — Compact & Kachina

Midnight is a privacy-first blockchain built by IOG that ships its own ZK-native smart contract language, Compact, and a proving system called Kachina. For ZKML use cases, Midnight's architecture is uniquely well-suited because it was designed from the ground up around the public/private state split that ZKML requires.

Why Midnight for ZKML

Midnight's Compact compiler outputs ZK-SNARK circuits directly. There is no EVM translation layer. Private state (model weights, activations) stays local. Public state (proof, output commitment) goes on-chain. This is exactly the architecture ZKML needs — and Midnight supports it natively.

The public/private state split in practice

In Kachina's model, every Compact contract operates across two ledgers simultaneously — a public ledger visible to all, and a private ledger that exists only on the user's local machine. For a ZKML agent, this maps cleanly:

ZKML concept	Midnight / Kachina equivalent	Lives where
Model weights	Private state (`witnesses`)	Agent's machine only
Input data	Private witness inputs	Agent's machine only
Inference output	Public state update	Public ledger
Proof of correctness	ZK-SNARK via Kachina	On-chain, verifiable
Model commitment	Hash of the quantised weights (`model_commitment`)	Public ledger

Writing a ZKML verifier in Compact

The following Compact contract accepts a ZK proof of inference and verifies that the model output exceeds a confidence threshold — without ever seeing the weights or input data.

Compact — ZKML inference verifier contract

// zkml_verifier.compact
// Verifies that an AI agent's inference was performed correctly
// before allowing an on-chain action (e.g. a trade execution).

pragma language_version ">=0.14.0";

import CompactStandardLibrary;

// ── Public state: what the chain knows ──────────────────
ledger {
  model_commitment:  Bytes<32>;   // SHA-256 of quantised weight tensor
  last_output:       Field;        // most recent verified inference output
  inference_count:   Uint<64>;    // total verified calls (replay protection)
  confidence_floor:  Field;        // minimum acceptable softmax confidence
}

// ── Circuit: the ZK constraint system ───────────────────
// Witnesses (private — never leave the prover)
// - model_weights: the quantised weight tensors
// - input_vector:  the agent's input features
// - layer_outputs: intermediate activation values per layer

circuit verify_inference(
  model_weights:     Vector<Field, 1024>,  // private
  input_vector:      Vector<Field, 64>,    // private
  layer_outputs:     Vector<Field, 256>,   // private
  claimed_output:    Field,                  // public output
  weight_commitment: Bytes<32>,             // public commitment
): Field {

  // 1. Verify the weight commitment matches on-chain record
  assert sha256(model_weights) == ledger.model_commitment
    "Weight commitment mismatch: model has changed or is incorrect";

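  // 2. Check the ReLU range condition on every recorded activation.
  //    This sketch checks non-negativity only. A complete verifier must also
  //    constrain each layer_outputs[i] to equal relu(weights · input);
  //    otherwise any non-negative values would pass.
  for i in 0..layer_outputs.length {
    assert layer_outputs[i] >= 0               // ReLU: non-negativity
      "Layer output violates ReLU constraint at index i";
  }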

  // 3. Verify claimed output matches final layer computation
  assert claimed_output == layer_outputs[layer_outputs.length - 1]
    "Claimed output does not match final layer activation";

  // 4. Enforce confidence floor (agent must be sufficiently certain)
  assert claimed_output >= ledger.confidence_floor
    "Inference confidence below threshold — action blocked";

  return claimed_output;
}

// ── Transaction: called by the AI agent ─────────────────
export transaction submit_inference(
  claimed_output:    Field,
  weight_commitment: Bytes<32>,
) {
  const verified = verify_inference(
    /* witnesses supplied by prover, not visible here */
    claimed_output,
    weight_commitment,
  );

  // Update public ledger state only after proof passes
  ledger.last_output     = verified;
  ledger.inference_count = ledger.inference_count + 1;
}

Midnight-specific tip

The Compact compiler handles witness binding automatically — you do not manually wire private inputs to circuit gates. Declare your witness types in the circuit parameter list and the Kachina proving system handles the rest. This is the key ergonomic advantage over writing raw Risc0 guest programs.

Kachina & private state management

Kachina is Midnight's proving system. It was designed to bridge private computation (agent-local) and public verification (on-chain) using ZK-SNARKs. For ZKML, Kachina solves a specific problem: the model weights must influence the proof without being disclosed to the verifier.

Kachina achieves this through transcripts — ordered records of all queries the computation makes against the private state. The agent proves, in zero-knowledge, that it possesses a private state whose transcript is consistent with the public state transition. The chain verifies the proof against the circuit; it never sees the underlying private values.

Kachina in one sentence

Kachina lets you update public state on-chain by proving, in zero-knowledge, that your private state justifies the update — without showing anyone what your private state contains.

Concurrency and transcript reordering

One Kachina property that matters for high-frequency AI agents: Kachina supports concurrent proof submission. Multiple agents can submit inference proofs simultaneously. The protocol optimises conflicting transcripts and allows reorderings without breaking consistency. For a DeFi research agent that needs to post dozens of on-chain decisions per block, this is not academic — it is the feature that makes the architecture scale.

SP1 (Succinct) integration

SP1 is a zkVM by Succinct Labs. It proves execution of arbitrary Rust programs, which makes it practical for ZKML: you write the inference in Rust, compile it to a guest program, and SP1 generates the proof.

Cargo.toml — SP1 ZKML guest setup

[package]
name    = "zkml-inference-guest"
version = "0.1.0"
edition = "2021"

[dependencies]
sp1-zkvm = "3.0.0"             # SP1 guest SDK
ndarray  = "0.15"              # matrix ops (no_std compatible)

[profile.release]
opt-level = 3
lto       = "fat"               # whole-program optimisation to cut guest cycle count

Rust — SP1 guest inference program

// src/main.rs (guest program — runs inside SP1 zkVM)
#![no_main]
sp1_zkvm::entrypoint!(main);

use sp1_zkvm::io;
use ndarray::{Array1, Array2};

fn relu(x: i64) -> i64 { i64::max(x, 0) }

pub fn main() {
    // Read private inputs from the prover
    let weights: Vec<i64> = io::read::<Vec<i64>>();
    let input:   Vec<i64> = io::read::<Vec<i64>>();
    let rows = 64usize;
    let cols = input.len();

    // Single hidden layer: output = relu(W · x)
    let w = Array2::from_shape_vec((rows, cols), weights).unwrap();
    let x = Array1::from_vec(input);
    let hidden: Vec<i64> = w.dot(&x).iter().map(|&v| relu(v)).collect();

    // Commit the output publicly — verifier will check this
    let output = hidden.iter().copied().max().unwrap_or(0);
    io::commit(&output);
}

TypeScript — SP1 proof generation & submission

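import { readFileSync } from "node:fs";
// Package name and API are as given in this guide; confirm them against
// the SP1 SDK version you have installed.
import { CpuProver, SP1Stdin } from "@succinct/sp1-sdk";

const ELF = readFileSync("./target/guest.elf");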

async function proveInference(weights: bigint[], input: bigint[]) {
  const client = new CpuProver();
  const stdin = new SP1Stdin();
  stdin.writeVec(weights);   // private — only prover sees this
  stdin.writeVec(input);

  const { proof, publicValues } = await client.prove(ELF, stdin);

  // publicValues contains only committed outputs — not weights/inputs
  const output = publicValues.readU64();
  return { proof, output };
}

Risc0 integration

Risc0 takes a similar zkVM approach to SP1, and both target the RISC-V ISA. The guest program runs on a virtual RISC-V CPU, and Risc0 proves correct execution of the RISC-V bytecode.

Rust — Risc0 guest inference (methods/guest/src/main.rs)

#![no_main]
risc0_zkvm::guest::entry!(main);

use risc0_zkvm::guest::env;

fn main() {
    // Read private witness data
    let weights: Vec<i32> = env::read();
    let input:   Vec<i32> = env::read();

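    // This sketch assumes a square n×n weight matrix stored row-major.
    let n = input.len();
    assert_eq!(weights.len(), n * n, "weights must hold n*n entries");
    let mut hidden = Vec::with_capacity(n);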

    for i in 0..n {
        let dot: i64 = (0..n)
            .map(|j| weights[i * n + j] as i64 * input[j] as i64)
            .sum();
        hidden.push((dot >> 16).max(0) as i32);  // Q16 rescale + ReLU
    }

    // Journal = public output only (weights/input stay private)
    env::commit(&hidden);
}

SP1 vs Risc0 — choosing between them

Both are zkVMs that prove Rust programs. SP1 currently offers faster proof generation for large guest programs and better developer tooling (TypeScript SDK). Risc0 has been in production longer and has a more mature on-chain verifier contract. For new ZKML projects in 2026, SP1 is the default recommendation; use Risc0 if you need battle-tested on-chain verifier contracts on Ethereum mainnet.

Modulus Labs

Modulus Labs focuses specifically on ZKML — they build circuits for specific model architectures (primarily CNNs and small transformers) rather than a general-purpose zkVM. The tradeoff: significantly smaller proof sizes and faster proving for supported architectures, at the cost of flexibility.

Modulus's RockyML framework exports ONNX models directly to Halo2 circuits. If your model fits a supported architecture and is under ~10M parameters, Modulus's toolchain can generate the circuit automatically without you writing circuit code manually.

Modulus architecture support (as of Q1 2026)

Supported: ReLU MLPs, LeNet-style CNNs, small BERT-variant transformers (≤6 layers). Not yet supported: attention mechanisms with softmax at full float32 precision, recurrent networks (LSTM/GRU), or models with custom CUDA kernels. Always check the Modulus compatibility matrix before choosing this path.

Troubleshooting — SP1

Error	Cause	Fix
STARK_VERIFY_FAILED	Guest program panicked or produced inconsistent output between runs (non-determinism)	Audit for `HashMap` iteration order, `SystemTime`, or any non-deterministic source in the guest. SP1 guests must be fully deterministic. Replace `HashMap` with `BTreeMap`.
MEMORY_ACCESS_FAULT at 0x...	Guest accessed memory outside the allocated segment — typically an out-of-bounds slice access in matrix multiply	Add explicit bounds checks before indexing: `assert!(i * n + j < weights.len())`. Use safe indexing (`.get()`) during development.
CYCLE_LIMIT_EXCEEDED	Model is too large — guest program exceeds SP1's default cycle budget (100M cycles)	Reduce model size, quantise to 8-bit (halves cycle count), or use SP1's `--cycle-limit` flag to increase the budget. For models >5M parameters, expect to need 500M+ cycles.
io::read type mismatch	Host wrote `Vec<i64>` but guest reads `Vec<i32>`, or vice versa	Host and guest must use identical types for every `stdin.write` / `io::read` pair. Define a shared `types.rs` crate used by both host and guest to enforce this.
ELF loading failed: invalid magic	Guest binary was not compiled with `cargo prove build` — wrong target triple	Always use `cargo prove build` (not `cargo build`) for the guest. The target must be `riscv32im-succinct-zkvm-elf`.
Groth16 proof too large for calldata	Submitting the raw STARK proof; the Groth16 wrapper was not enabled	Use `client.prove_groth16()` instead of `client.prove()` for on-chain submission. Groth16-wrapped proofs are ~300 bytes vs ~400KB for raw STARK.

Troubleshooting — Risc0

Error	Cause	Fix
ProverError: journal mismatch	Guest called `env::commit()` with different data across two proof attempts — non-determinism	Same root cause as SP1's STARK_VERIFY_FAILED. Audit all HashMap usages, random number generation, and any syscall that depends on wall time.
ImageID mismatch	On-chain verifier has a different Image ID than the proof — guest code changed after verification key was deployed	Re-deploy the verifier contract whenever you update the guest binary. Risc0's Image ID is a hash of the compiled guest ELF — any code change produces a new ID.
Stack overflow in guest	Deep recursion or large stack-allocated arrays in the inference loop	Heap-allocate large arrays with `Vec::with_capacity()` instead of stack arrays. The Risc0 RISC-V guest stack is 256KB by default.
Segment size exceeded	Single execution segment is too large — occurs with models that have long sequential computation paths	Enable continuation mode: `ExecutorEnv::builder().segment_limit_po2(22)`. This splits the proof across multiple segments and merges them.
Division by zero in circuit	Softmax or normalization layer divides by a sum that becomes zero after quantisation	Clamp the denominator to at least 1 before division: `let denom = sum.max(1)`. This is numerically safe for classification and avoids the circuit constraint violation.
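
The denominator fix in the last row can be sketched as a small fixed-point normalisation routine. This is an illustration under the Q16.16 convention used in this document; `normalise` is a hypothetical helper, and it assumes the scores are already non-negative (for example, after an exponent approximation or ReLU).

```rust
const SCALE: i64 = 1 << 16; // Q16.16

/// Normalises non-negative fixed-point scores so they sum to roughly 1.0 (SCALE).
/// The denominator is clamped to 1 so an all-zero input cannot divide by zero.
pub fn normalise(scores: &[i64]) -> Vec<i64> {
    let sum: i64 = scores.iter().sum();
    let denom = sum.max(1);
    scores.iter().map(|&s| s * SCALE / denom).collect()
}
```

With an all-zero input the result is all zeros rather than a panic inside the guest, so downstream code should treat that case as "no confident class".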

Troubleshooting — Midnight Compact

Error	Cause	Fix
assert_failed: Weight commitment mismatch	The SHA-256 of the submitted weight tensor does not match `ledger.model_commitment` — weights changed or the wrong tensor was used	Re-register the model by calling the commitment registration transaction with the current weight tensor. Commitment must be resubmitted after any retraining or re-quantisation.
TypeError: Vector length mismatch	Circuit declared `Vector<Field, 1024>` but prover supplied a tensor with a different dimension	Compact's `Vector` types are fixed-length. Pad or trim input vectors to match the declared circuit dimensions. Use a wrapper that pads to the nearest power-of-two length.
PrivacyViolation: ledger field read in witness context	A `circuit` block tried to read a `ledger` field directly instead of passing it as a public parameter	Pass public ledger values into the circuit as explicit parameters. Circuits in Compact are stateless — they cannot directly access ledger state. Thread values through the transaction call instead.
CompilationError: Field overflow	An intermediate computation exceeded the field prime — common when multiplying two large fixed-point values	Rescale intermediate values more aggressively. After each matmul, right-shift by the scale factor before the next multiply: `mid = (a * b) >> SCALE_BITS`. Use 16-bit rather than 32-bit fixed-point if overflow persists.
npm ERR: 403 Forbidden (npm.pkg.midnight.network)	Missing authentication for Midnight's private npm registry	Run `npm login --scope=@midnight-ntwrk --registry=https://npm.pkg.midnight.network` and authenticate with your GitHub credentials. Add `//npm.pkg.midnight.network/:_authToken=${NPM_TOKEN}` to your `.npmrc`.
version mismatch: compact-compiler vs midnight-sdk	Compact compiler version is incompatible with the SDK version — check release notes for exact compatibility matrix	Run `npx @midnight-ntwrk/midnight-version-check` or check the version compatibility matrix. Pin both packages to matching versions in `package.json`.
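
For the vector length mismatch row above, the prover-side wrapper can be sketched in Rust with a const generic length. `fit_to_len` is a hypothetical helper; note that truncating inputs changes what the model sees, so the returned flag should normally be treated as an error rather than ignored.

```rust
/// Copies `values` into a fixed-length array, zero-padding or truncating as needed.
/// Returns the array and a flag that is true if any input values were dropped.
pub fn fit_to_len<const N: usize>(values: &[i64]) -> ([i64; N], bool) {
    let mut out = [0i64; N];
    let n = values.len().min(N);
    out[..n].copy_from_slice(&values[..n]);
    (out, values.len() > N)
}
```

For a circuit that declares a 1024-element vector, the call would be `fit_to_len::<1024>(&weights)`.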

Common patterns & mitigations

Non-determinism: the silent killer

The most common cause of proof failure across all three backends is non-determinism. ZK proofs require that the same inputs always produce the same intermediate values. Any source of randomness or ordering non-determinism in the guest will cause the prover to generate a witness that does not satisfy the circuit constraints.

Never use these in guest programs

HashMap / HashSet (use BTreeMap / BTreeSet) · SystemTime::now() · rand::thread_rng() · std::thread · any FFI call to non-deterministic host functions · floating-point arithmetic (results differ by platform and compiler flags).
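
As a small illustration of the first item, the sketch below keeps named features in a `BTreeMap`, which iterates in sorted key order on every run. The function names and the feature names in the usage are hypothetical.

```rust
use std::collections::BTreeMap;

/// Sums weighted features. BTreeMap iterates in sorted key order, so every
/// run visits the entries in the same sequence.
pub fn weighted_sum(features: &BTreeMap<String, i64>, weights: &BTreeMap<String, i64>) -> i64 {
    features
        .iter()
        .map(|(name, value)| value * weights.get(name).copied().unwrap_or(0))
        .sum()
}

/// Returns feature names in the order the guest would process them.
pub fn processing_order(features: &BTreeMap<String, i64>) -> Vec<&str> {
    features.keys().map(String::as_str).collect()
}
```

Inserting "volume" before "price" still yields the processing order `["price", "volume"]`, whereas a `HashMap` gives no such guarantee across runs.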

Model size vs. proving time

Current benchmarks on commodity hardware (2026 pricing, ~$0.01/proof using cloud provers):

Model size	Parameters	SP1 proving time	Risc0 proving time	Practical?
Tiny MLP	<100K	~2s	~3s	Yes
Small MLP	1M	~30s	~45s	Yes
Medium CNN	5M	~4 min	~6 min	Conditional
Small transformer	20M	~25 min	~35 min	High latency
LLM (any)	>1B	Hours–days	Hours–days	Not yet

For on-chain DeFi agents where latency matters, target models under 1M parameters with aggressive quantisation. Research agents with longer decision cycles (daily/weekly) can tolerate medium model sizes.
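
When sizing a model against these numbers, a back-of-the-envelope constraint count is often more useful than the raw parameter count. The sketch below applies the per-operation costs quoted in the circuit section (one constraint per weight, two per ReLU activation) to a plain MLP. `estimate_constraints` is a hypothetical helper, and the estimate ignores range checks, lookups, and backend-specific overhead.

```rust
/// Rough R1CS constraint count for a ReLU MLP.
/// `layer_widths` lists neuron counts from input to output, e.g. [64, 32, 1].
pub fn estimate_constraints(layer_widths: &[u64]) -> u64 {
    layer_widths
        .windows(2)
        .map(|pair| {
            let (inputs, outputs) = (pair[0], pair[1]);
            let matmul = inputs * outputs; // one constraint per weight
            let relu = 2 * outputs; // two constraints per activation
            matmul + relu
        })
        .sum()
}
```

For a 64-32-1 network this gives 2,146 constraints, dominated by the first layer's 2,048 multiplications.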

Glossary

Arithmetic circuit — a computation expressed as a directed acyclic graph of addition and multiplication gates over a finite field. The fundamental representation used by ZK-SNARK backends.

Compact — Midnight's smart contract language. Compiles to ZK-SNARK circuits via the Kachina proving system. Privacy-native: distinguishes public and private state at the language level.

Field element — an integer modulo a large prime p. All values in a ZK circuit must be field elements. Float32 values must be converted to fixed-point integers before circuit use.

Kachina — Midnight's proving system. Bridges private state (agent-local) and public state (on-chain) using ZK-SNARKs and a transcript-based concurrency model.

Proof of inference — a ZK-SNARK that certifies a specific neural network, applied to a specific input, produced a specific output — without revealing the network weights or input.

Quantisation — the conversion of floating-point model weights to fixed-point integer representations. Required for ZKML because ZK circuits operate over finite fields, not floats.

R1CS (Rank-1 Constraint System) — the constraint format consumed by Groth16 and many other ZK backends. Every circuit gate becomes a constraint of the form (a · b) = c.

Witness — the private input to a ZK proof. In ZKML: the model weights, input data, and all intermediate activation values. The proof certifies the witness satisfies the circuit constraints without revealing the witness itself.

zkVM (Zero-Knowledge Virtual Machine) — a system that proves correct execution of a program (typically written in Rust) without revealing the program's private inputs. SP1 and Risc0 are both zkVMs.

ZK-SNARK — Zero-Knowledge Succinct Non-Interactive Argument of Knowledge. A proof system where: the proof is small (succinct), no back-and-forth is needed (non-interactive), and the prover demonstrates knowledge of a secret without revealing it (zero-knowledge).
