DeFi Agent Pipeline Docs v1.0 · 2026 · Diataxis


The DeFi Research Agent Pipeline

A structured framework for understanding how autonomous research agents ingest on-chain data, score opportunities, and produce verifiable outputs — without writing the agent yourself.

01 Ingestion — On-chain RPC, DEX feeds, oracle prices, event logs
02 Normalisation — Schema mapping, deduplication, timestamp alignment
03 Scoring — OSF-1 formula, opportunity ranking, risk filters
04 Decision — Strategy selection, threshold gating, confidence checks
05 Output — JSON reports, alerts, dashboards, signed attestations
Concepts

Understanding the DeFi Research Agent Pipeline

The mental models you need before you touch any configuration or code. Start here.

What is a DeFi research agent?

A DeFi research agent is a software system — often AI-augmented — that autonomously monitors decentralised finance protocols, ingests structured data from multiple sources, applies scoring or ranking logic to candidate opportunities, and produces outputs that inform or execute financial decisions.

In 2026, research agents range from simple yield-monitoring bots that compare APRs across lending markets, to multi-modal systems like OpenClawD and AiXT Trading Intelligence that fuse on-chain event streams, sentiment signals, and cross-chain liquidity metrics to surface opportunities within milliseconds of a state change.

The word "research" is deliberate. Unlike execution agents (which transact), a research agent's primary output is information: structured reports, scored opportunity sets, and ranked strategy candidates. A human or an execution layer makes the final call.

Key distinction — A research agent observes and ranks. An execution agent acts. This documentation covers research agents only — the layer that turns raw data into investable intelligence.

Why pipeline architecture matters

DeFi data is fast, fragmented, and adversarial. A yield opportunity on Aave might last 40 seconds before arbitrageurs close it. A governance proposal on Compound can shift protocol risk parameters before your scoring model sees it. Without a structured pipeline, agents fail in predictable ways:

  • Stale data decisions — scoring an opportunity based on a price that was valid 12 blocks ago.
  • Source conflation — treating on-chain oracle prices and CEX spot prices as equivalent without normalisation.
  • Scoring drift — ranking logic that uses different risk parameters than the ones the output consumer expects.
  • Silent failures — a feed goes down; the agent keeps scoring on cached data without signalling degraded confidence.

A pipeline architecture solves this by separating concerns into discrete, observable stages: ingestion → normalisation → scoring → decision → output. Each stage has a defined contract with the next. Failures are localised. Confidence degrades explicitly rather than silently.
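The stage separation described above can be sketched as a chain of plain functions, each consuming the previous stage's output. This is an illustrative sketch, not the pipeline's actual code — the stage names and toy records here are hypothetical:

```python
from typing import Any, Callable

def run_pipeline(raw_records: list[dict],
                 stages: list[Callable[[Any], Any]]) -> Any:
    """Each stage has a defined contract with the next: the output of one
    stage is the input of the following one, so failures stay localised."""
    data = raw_records
    for stage in stages:
        data = stage(data)
    return data

# Toy stages standing in for normalisation, scoring, decision, and output.
def normalise(recs): return [{**r, "normalised": True} for r in recs]
def score(recs):     return [{**r, "score": 0.5} for r in recs]
def decide(recs):    return [r for r in recs if r["score"] >= 0.4]
def output(recs):    return {"status": "ok", "opportunities": recs}

report = run_pipeline([{"id": "r1"}], [normalise, score, decide, output])
```

Because each stage is a separate function, a failure in (say) scoring can be caught, logged, and signalled without corrupting ingestion or output.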

The five-stage model

| Stage | Input | Output | Failure mode |
| --- | --- | --- | --- |
| 1. Ingestion | Raw RPC calls, WebSocket events, REST feeds | Unvalidated raw records | Connection loss, rate limiting, provider outage |
| 2. Normalisation | Raw records | Schema-validated, timestamped, deduplicated records | Schema mismatch, clock skew, duplicate events |
| 3. Scoring | Normalised records | Scored, ranked opportunity set | Missing fields, division by zero in formula, stale input |
| 4. Decision | Scored opportunities | Selected strategy candidates with confidence labels | All candidates below threshold, conflicting signals |
| 5. Output | Strategy candidates | JSON report, alert, dashboard update, signed attestation | Serialisation error, delivery failure, consumer schema mismatch |

Data source taxonomy

DeFi research agents draw from four categories of data. Understanding the latency, reliability, and trust characteristics of each is prerequisite knowledge for designing the ingestion layer.

On-chain data

Data read directly from a blockchain's state or event logs via an RPC endpoint or indexer. The ground-truth layer — cryptographically final after sufficient confirmations.

  • State reads — eth_call to read protocol parameters, liquidity pool reserves, vault balances, governance states.
  • Event logs — eth_getLogs for Swap, Deposit, Borrow, LiquidationCall events.
  • Block metadata — block number, timestamp, base fee, proposer (for MEV context).
  • Mempool data — pending transactions via Blocknative or Flashbots.

Latency: 1–12s (block time). Trust: cryptographic. Cost: scales with RPC call volume.

Off-chain data

Data originating outside the blockchain — faster but requires explicit trust decisions.

  • CEX price feeds — Binance, Coinbase, Kraken REST/WebSocket APIs.
  • Macro/news feeds — Bloomberg, Refinitiv, The Block, DeFiLlama news.
  • Social sentiment — Twitter/X, Reddit, Telegram via LunarCrush or Santiment.
  • Protocol risk scores — DeFi Safety, Gauntlet, Chaos Labs risk parameter APIs.

Oracle data

Decentralised price feeds that sit on-chain but aggregate off-chain price data. On-chain in location, off-chain in origin.

  • Chainlink Data Feeds — aggregated spot prices, updated on deviation threshold or heartbeat.
  • Pyth Network — pull-based oracle with sub-second updates, popular on Solana and EVM L2s.
  • Uniswap v3 TWAP — time-weighted average price from on-chain swap history; manipulation-resistant but slow.
  • Redstone — modular oracle supporting custom data types beyond spot price.

Protocol events and governance feeds

  • Governance proposals — Snapshot, Tally, Governor Bravo APIs for pending votes.
  • Parameter changes — collateral factor updates, interest rate model changes, fee tier adjustments.
  • Liquidation thresholds — near-liquidation position tracking via The Graph or protocol-specific APIs.
  • Protocol upgrades and pauses — proxy upgrade events, circuit breaker triggers.

Decision logic model

The decision logic layer has three components: a scoring engine, a ranking layer, and a strategy selector.

The scoring engine applies OSF-1 to produce a numeric score per candidate. The ranking layer sorts by score and applies filters: minimum score threshold, minimum liquidity floor, maximum protocol concentration. The strategy selector matches top-ranked opportunities against a strategy library (e.g., arb.cex_dex, yield.stable_lp, liquidation.bonus).

The trust problem in autonomous agents

Without a trust framework, agents are black boxes. The pipeline addresses this with three mechanisms:

  1. Confidence labels — every output includes a confidence field derived from data completeness and source reliability.
  2. Data lineage — every output includes a sources array listing which feeds contributed, with freshness timestamps.
  3. Degradation signalling — if any required feed is stale beyond a threshold, output status is set to degraded rather than silently omitting data.
ZKML note — In 2026, some research agents expose ZK proofs of inference alongside their outputs: cryptographic attestations that scoring was performed correctly on the claimed inputs. This is the ZKML pattern. It is not required by this spec, but it is the direction in which high-stakes autonomous decision-making is heading.

Output model

The pipeline supports four output formats, producible simultaneously:

  • JSON report — the canonical output. All scored opportunities, strategy candidates, data lineage, and confidence metadata.
  • Alert — a lightweight notification (webhook, Telegram, Slack) when a scored opportunity exceeds a threshold.
  • Dashboard — a real-time web view backed by a WebSocket connection to the agent's scoring output stream.
  • Signed attestation — a cryptographically signed summary for downstream on-chain verification or audit trails.
Guides

Simulating the Research Agent Pipeline

Step-by-step instructions for configuring and running each stage of a DeFi research agent pipeline, plus a full case study.

Guide 1 — Setting up the ingestion layer

The ingestion layer pulls raw data from every configured source and places it into the normalisation queue. It runs continuously and must handle partial failures gracefully.

01

Configure your RPC providers

Every ingestion layer needs at least two RPC endpoints per chain — a primary and a fallback. Single-provider setups fail silently when the provider rate-limits or goes offline.

# config/providers.yaml
chains:
  ethereum:
    primary:   "wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://rpc.ankr.com/eth/${ANKR_KEY}"
    archive:   "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    block_time_ms: 12000
  arbitrum:
    primary:   "wss://arb-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://rpc.ankr.com/arbitrum/${ANKR_KEY}"
    block_time_ms: 250
  base:
    primary:   "wss://base-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://mainnet.base.org"
    block_time_ms: 2000
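The primary/fallback pattern from the config above translates into a small failover loop. This is a sketch: the transport function is injected so the failover logic can be shown without real network calls, and the endpoint URLs are placeholders:

```python
from typing import Callable

def rpc_call(method: str, params: list,
             endpoints: list[str],
             transport: Callable[[str, dict], dict]) -> dict:
    """Try the primary endpoint first, then each fallback in order.
    `transport` posts the JSON-RPC payload and raises on failure."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    last_err: Exception | None = None
    for url in endpoints:
        try:
            return transport(url, payload)
        except Exception as err:   # 429s, timeouts, provider outages
            last_err = err
    raise RuntimeError(f"all RPC endpoints failed: {last_err}")

# Simulated transport: the primary rate-limits, the fallback answers.
def fake_transport(url: str, payload: dict) -> dict:
    if "primary" in url:
        raise RuntimeError("429 Too Many Requests")
    return {"jsonrpc": "2.0", "id": 1, "result": "0x10d4f"}

result = rpc_call("eth_blockNumber", [],
                  ["https://primary.example", "https://fallback.example"],
                  fake_transport)
```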
02

Define protocol event subscriptions

Use a declarative subscription file to specify which protocol events the ingestion layer should monitor. Each subscription maps a contract address and event signature to a data handler.

# config/subscriptions.yaml
subscriptions:
  - id: "aave_v3_borrow"
    chain: "ethereum"
    address: "0x87870Bca3F3fD6335C3F4ce8392D69350B4fA4E2"
    event:   "Borrow(address,address,address,uint256,uint8,uint256,uint16)"
    handler: "handlers.lending.on_borrow"
    priority: high

  - id: "uniswap_v3_swap"
    chain: "arbitrum"
    address: "*"  # wildcard: all UniV3 pools
    event:   "Swap(address,address,int256,int256,uint160,uint128,int24)"
    handler: "handlers.amm.on_swap"
    priority: medium

  - id: "compound_v3_liquidation"
    chain: "ethereum"
    address: "0xc3d688B66703497DAA19211EEdff47f25384cdc3"
    event:   "AbsorbCollateral(address,address,address,uint256,uint256)"
    handler: "handlers.lending.on_liquidation"
    priority: critical
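Handler strings like handlers.lending.on_borrow are dotted import paths resolved at subscription load time. A sketch of the resolution step — demonstrated against a standard-library function, since the handler modules themselves are project-specific:

```python
import importlib

def resolve_handler(dotted: str):
    """Split 'pkg.module.func' into a module path and an attribute name,
    import the module, and return the callable it names."""
    module_path, func_name = dotted.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

# subscriptions.yaml references e.g. "handlers.lending.on_borrow";
# here we resolve a stdlib function to show the mechanism.
join = resolve_handler("posixpath.join")
```

Resolving handlers once at startup (rather than per event) also surfaces misconfigured paths immediately instead of at dispatch time.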
03

Configure off-chain feeds

Off-chain feeds are polled on a schedule rather than subscribed to. Set a ttl_seconds for each — the normalisation layer uses this to mark data as stale and downgrade confidence scores.

# config/feeds.yaml
feeds:
  - id: "binance_spot"
    type: "websocket"
    url: "wss://stream.binance.com:9443/ws/!ticker@arr"
    ttl_seconds: 30
    normaliser: "normalisers.cex.binance"

  - id: "defillama_tvl"
    type: "rest_poll"
    url: "https://api.llama.fi/v2/protocols"
    poll_interval_seconds: 120
    ttl_seconds: 300
    normaliser: "normalisers.analytics.defillama"

  - id: "chainlink_eth_usd"
    type: "onchain_oracle"
    chain: "ethereum"
    address: "0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419"
    heartbeat_seconds: 3600
    ttl_seconds: 4200
    normaliser: "normalisers.oracle.chainlink"

Guide 2 — Normalising on-chain and off-chain data

Normalisation converts heterogeneous raw records into a common schema. The three most critical operations are timestamp alignment, token amount precision, and asset identifier normalisation.

Timestamp alignment: different sources use different formats. The normaliser converts everything to Unix milliseconds UTC. Use the source event_at for causal ordering; use the agent-stamped ingested_at for latency calculations.

Token amount precision: EVM token amounts are raw integers (e.g., 1000000 for 1 USDC at 6 decimals). Resolve decimals from a cached on-chain call and convert to decimal strings. Store the raw value alongside to prevent rounding errors.
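The raw-to-decimal conversion can be sketched with Python's decimal module, keeping the raw value alongside as recommended above (the return shape is illustrative):

```python
from decimal import Decimal

def to_decimal_amount(raw: str, decimals: int) -> dict:
    """Convert a raw integer token amount into a decimal string, preserving
    the original raw value to avoid rounding ambiguity downstream."""
    scaled = Decimal(raw) / (Decimal(10) ** decimals)
    # normalize() strips trailing zeros: "1.000000" -> "1"
    return {"amount": str(scaled.normalize()), "amount_raw": raw}

usdc = to_decimal_amount("1000000", 6)               # 1 USDC at 6 decimals
weth = to_decimal_amount("2500000000000000000", 18)  # 2.5 WETH
```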

Asset identifier normalisation: the same asset may appear as WETH, ETH, 0xC02aaA…C756Cc2, or ethereum. Map all representations to canonical IDs using a maintained asset registry.
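A minimal sketch of the registry lookup, using the WETH entry from the Reference section's asset registry (the alias set shown is illustrative):

```python
# Maps every known representation (symbol, alias, address) to one canonical ID.
ASSET_ALIASES = {
    "WETH": "WETH",
    "ETH": "WETH",
    "ethereum": "WETH",
    "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2".lower(): "WETH",
}

def canonical_asset_id(identifier: str) -> str:
    """Lowercase addresses before lookup; raise on unknowns rather than
    passing an unmapped identifier downstream."""
    key = identifier.lower() if identifier.startswith("0x") else identifier
    try:
        return ASSET_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown asset identifier: {identifier}")

cid = canonical_asset_id("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
```

Failing loudly on unknown identifiers is deliberate: a silently unmapped asset would split one asset's records across two correlation groups.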

// Example: normalised record output
{
  "record_id":     "evt_arb_uniswap_0x7a3b...91c2_14829401",
  "source_id":     "uniswap_v3_swap",
  "source_type":   "onchain_event",
  "chain":         "arbitrum",
  "event_at":      1719432811000,
  "ingested_at":   1719432811340,
  "asset_in":  { "id": "WETH", "amount": "2.5", "raw": "2500000000000000000" },
  "asset_out": { "id": "USDC", "amount": "8312.44", "raw": "8312440000" },
  "price_usd":     "3324.976",
  "confidence":    0.97,
  "data_freshness":"live"
}

Guide 3 — Running the scoring engine

The scoring engine consumes normalised records, groups them into opportunity candidates via a correlation engine, and applies OSF-1. An opportunity candidate is a group of records that together describe one actionable situation — e.g., a CEX price record + a DEX pool reserve read + a Chainlink oracle price forming a CEX-DEX arbitrage signal.

The correlation window is configurable (default: 5 seconds). Tighter windows reduce false positives but may miss slow-moving opportunities. In pseudocode:

// OSF-1 pseudocode
score = (yield_estimate * urgency_weight * confidence_factor)
        / (risk_score * liquidity_penalty)

yield_estimate    = expected_return_bps / 10000
urgency_weight    = 1 + (1 / max(1, opportunity_age_seconds))
confidence_factor = avg(source_confidences) * data_freshness_factor
risk_score        = protocol_risk + oracle_risk + smart_contract_risk
liquidity_penalty = 1 + max(0, (position_size / pool_liquidity) - 0.01) * 5
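The correlation step that precedes scoring — grouping normalised records about the same asset within the window — can be sketched as follows. Field names follow the NormalisedRecord schema; the single-asset grouping key is a simplification:

```python
from collections import defaultdict

CORRELATION_WINDOW_MS = 5_000   # default 5-second window

def correlate(records: list[dict]) -> list[list[dict]]:
    """Group records by asset id, then split each group wherever the gap
    between consecutive event_at timestamps exceeds the window."""
    by_asset: dict[str, list[dict]] = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["event_at"]):
        by_asset[rec["asset_id"]].append(rec)

    candidates = []
    for recs in by_asset.values():
        group = [recs[0]]
        for rec in recs[1:]:
            if rec["event_at"] - group[-1]["event_at"] <= CORRELATION_WINDOW_MS:
                group.append(rec)
            else:
                candidates.append(group)
                group = [rec]
        candidates.append(group)
    return candidates

groups = correlate([
    {"asset_id": "WETH", "event_at": 1_000},    # within 5s of the next
    {"asset_id": "WETH", "event_at": 3_500},
    {"asset_id": "WETH", "event_at": 20_000},   # far apart: new candidate
])
```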

Guide 4 — Interpreting the decision layer output

Each strategy candidate contains: the matched opportunity, the recommended strategy code, the expected entry parameters, and a confidence tier (A/B/C/D).

| Tier | Score range | Meaning | Suggested action |
| --- | --- | --- | --- |
| A | 0.80 – 1.00 | High confidence, strong signal, low risk | Execute or escalate for human review |
| B | 0.60 – 0.79 | Good signal, moderate risk or data uncertainty | Flag for human analyst review |
| C | 0.40 – 0.59 | Weak signal or elevated risk | Monitor only; do not act |
| D | 0.00 – 0.39 | Below threshold; included for transparency | Discard |
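The tier boundaries map to a simple lookup. One assumption, consistent with the worked OSF-1 examples in the Reference section: scores above 1.00 (possible, since the score range is unbounded) still land in tier A:

```python
def tier_for(score: float) -> str:
    """Map an OSF-1 score onto its confidence tier. Scores can exceed 1.0;
    anything at or above 0.80 is tier A."""
    if score >= 0.80:
        return "A"
    if score >= 0.60:
        return "B"
    if score >= 0.40:
        return "C"
    return "D"
```

For example, `tier_for(0.647)` yields "B" and `tier_for(1.568)` yields "A".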

Guide 5 — Reading and routing outputs

The pipeline produces a primary JSON report each scoring cycle (default: every 60 seconds, or on-demand for critical events). Configure the output router in config/outputs.yaml:

# config/outputs.yaml
outputs:
  - id: "json_file"
    type: "file"
    path: "./reports/latest.json"
    overwrite: true

  - id: "webhook_alert"
    type: "webhook"
    url: "${ALERT_WEBHOOK_URL}"
    filter: "tier == 'A'"
    payload_template: "templates/alert_slim.json"

  - id: "dashboard_ws"
    type: "websocket_server"
    port: 8765
    emit_on: "every_cycle"
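Routing can be sketched as applying each output's filter to the report's opportunity list. The pipeline's real filter strings (e.g. "tier == 'A'") use an expression language not defined here, so plain Python predicates stand in for them:

```python
def route_outputs(report: dict, outputs: list[dict]) -> dict[str, list[dict]]:
    """For each configured output, deliver only the opportunities that pass
    its filter; an output without a filter receives everything."""
    delivered: dict[str, list[dict]] = {}
    for out in outputs:
        pred = out.get("filter", lambda opp: True)
        delivered[out["id"]] = [o for o in report["opportunities"] if pred(o)]
    return delivered

report = {"opportunities": [{"opp_id": "o1", "tier": "A"},
                            {"opp_id": "o2", "tier": "C"}]}
routed = route_outputs(report, [
    {"id": "json_file"},                                     # no filter: all
    {"id": "webhook_alert", "filter": lambda o: o["tier"] == "A"},
])
```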

Case Study: OpenClawD & AiXT Trading Intelligence

A real-world-style walkthrough of how modern AI-augmented research agents feed their pipelines

OpenClawD and AiXT Trading Intelligence represent two distinct but complementary approaches to AI-augmented DeFi research. This case study traces the data path from raw chain state to scored output in both systems.

OpenClawD — event-driven ingestion

OpenClawD's architecture is built around reactive ingestion: it subscribes to a curated set of high-signal on-chain events and assembles opportunity candidates as correlated events arrive.

┌─ OpenClawD Ingestion Layer ──────────────────────────────────────────┐
│                                                                      │
│  WebSocket Subscriptions ──► Event Router ──► Correlation Engine     │
│  (Alchemy / Infura)             │                 │                  │
│                                 │ routes by       │ groups by        │
│                                 │ event type      │ asset+time       │
│  Off-chain Feeds ───────────────┘                 ▼                  │
│  (Binance WS, Pyth, DeFiLlama)          Candidate Assembler          │
│                                                   │                  │
│                                         OSF-1 Scoring Engine         │
└──────────────────────────────────────────────────────────────────────┘
| Source | Type | Data extracted | Refresh |
| --- | --- | --- | --- |
| Alchemy WebSocket | On-chain events | Swap, Borrow, LiquidationCall, Transfer for monitored protocols | Per-block (12s / 250ms) |
| Pyth Network | Oracle | Real-time price updates for 80+ assets | 400ms |
| Binance WebSocket | Off-chain CEX | Best bid/ask, 24h volume, funding rates for perp markets | 100ms |
| Gauntlet Risk API | Protocol risk | Collateral risk scores, utilisation rates, risk parameter changes | 5 min |
| Snapshot API | Governance | Active proposals, vote counts, time-to-close | 15 min |

When a Swap event arrives from a Uniswap v3 pool, OpenClawD's normaliser: decodes the ABI-encoded log → resolves token0/token1 to canonical IDs → converts raw integers to decimal strings → computes implied price from sqrtPriceX96 → cross-references against the most recent Pyth price → stamps timestamps → pushes to the correlation queue. If the price deviation exceeds 0.5%, the record is tagged price_divergence: true, fast-tracking CEX-DEX arbitrage candidate assembly.
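The sqrtPriceX96 step deserves a note: Uniswap v3 stores price as a Q64.96 fixed-point square root of the token1/token0 ratio. A sketch of the conversion, with the decimal adjustment for a generic pair:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough precision for Q64.96 arithmetic

def price_from_sqrt_price_x96(sqrt_price_x96: int,
                              decimals0: int, decimals1: int) -> Decimal:
    """Square the Q64.96 value to recover token1/token0, then scale by the
    decimals gap to get the human-readable price of token0 in token1."""
    ratio = (Decimal(sqrt_price_x96) / Decimal(2**96)) ** 2
    return ratio * Decimal(10) ** (decimals0 - decimals1)

# Sanity check: sqrtPriceX96 == 2**96 with equal decimals implies price 1.
p = price_from_sqrt_price_x96(2**96, 18, 18)
```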

AiXT Trading Intelligence — poll-and-model ingestion

AiXT runs a continuous polling loop against a wider set of sources, feeds all data into a feature store, and applies a gradient-boosted tree model to score the feature vector every 30 seconds. This approach is better suited to longer-horizon signals — macro yield trends, governance risk, protocol health — where event-driven ingestion would produce too much noise.

AiXT's feature vector per (protocol, asset_pair, chain) triplet includes: current APR and 7-day rolling average, utilisation rate and its 1h rate of change, TVL and its 24h rate of change, governance risk score, CEX funding rate, and a 4h rolling sentiment score from LunarCrush. The model is retrained weekly on new outcome data and outputs an OSF-1 compatible score, feeding the same decision layer as OpenClawD.

Key takeaway — Event-driven (OpenClawD) and poll-and-model (AiXT) are complementary, not competing. A production stack runs both: event-driven handles sub-minute opportunities; model-based handles multi-day signals. Both feed the same decision layer and produce the same output format.
Reference

Technical Specifications

Complete API endpoints, JSON schemas, scoring formulas, and configuration options. Use this section as a lookup, not a reading guide.

Opportunity Scoring Formula (OSF-1)

OSF-1 produces a dimensionless score in the range [0, ∞). In practice well-formed opportunities score between 0.0 and 2.0; scores above 1.0 are exceptional.

/**
 * OSF-1 — Opportunity Scoring Formula, version 1
 *
 * score = (Y × U × C) / (R × L)
 *
 * Y  — Yield estimate      (dimensionless, basis points / 10000)
 * U  — Urgency weight      (dimensionless, ≥ 1.0)
 * C  — Confidence factor   (0.0 – 1.0)
 * R  — Risk score          (dimensionless, > 0)
 * L  — Liquidity penalty   (dimensionless, ≥ 1.0)
 */

Y = expected_return_bps / 10000
// For yield: APR difference between protocols
// For arb:   price spread minus estimated gas cost in bps
// For liq:   bonus percentage in bps

U = 1 + (1 / max(1, opportunity_age_seconds))
// 0s old: U=2.0 | 60s old: U≈1.016 | 300s old: U≈1.003

C = mean(source_confidences) × F
// F — data freshness factor:
//   1.0  if all sources within TTL
//   0.7  if any source is 1–2× TTL
//   0.4  if any source is 2–3× TTL
//   0.0  if any source exceeds 3× TTL

R = w_sc×SC + w_or×OR + w_lq×LQ
// SC: smart contract risk  OR: oracle risk  LQ: liquidity risk
// Default weights: w_sc=0.5, w_or=0.3, w_lq=0.2

L = 1 + max(0, (position_size_usd / pool_liquidity_usd - 0.01)) × 5
// Neutral (1.0) at ≤1% of pool liquidity. Rises linearly above.
| Scenario | Y | U | C | R | L | Score | Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CEX-DEX arb, 25bps, 0s old, live feeds | 0.0025 | 2.0 | 0.95 | 0.25 | 1.01 | 0.019 | D |
| Yield arb, 180bps, 30s old, live feeds | 0.018 | 1.033 | 0.92 | 0.40 | 1.0 | 0.043 | D |
| Liquidation, 5% bonus, 0s old, low-risk protocol | 0.05 | 2.0 | 0.97 | 0.15 | 1.0 | 0.647 | B |
| Liquidation, 8% bonus, 0s old, blue-chip | 0.08 | 2.0 | 0.98 | 0.10 | 1.0 | 1.568 | A |
Note on score magnitude — OSF-1 scores are only meaningful relative to each other within a single scoring cycle. Always use the tier label (A/B/C/D) for cross-cycle comparisons.
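The tier-A liquidation row from the scenario table can be reproduced directly. One interpretive note: the worked scores follow from treating L as a multiplicative penalty (neutral at 1.0), so the denominator is computed as R × L here:

```python
def osf1(y: float, u: float, c: float, r: float, l: float) -> float:
    """OSF-1 with the liquidity penalty applied multiplicatively, which is
    the reading consistent with the worked scenario table (L neutral = 1.0)."""
    return (y * u * c) / (r * l)

# Tier-A liquidation scenario: 8% bonus (Y = 0.08), 0s old (U = 2.0),
# C = 0.98, blue-chip risk R = 0.10, no liquidity penalty (L = 1.0).
score = round(osf1(0.08, 2.0, 0.98, 0.10, 1.0), 3)
```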

NormalisedRecord schema

Every record produced by the normalisation stage must conform to this schema. A record missing any required field is dropped.

{
  "record_id":      "string",   // required — unique, deterministic ID
  "source_id":      "string",   // required — subscription or feed id
  "source_type":    "string",   // required — onchain_event | onchain_state | oracle | cex_feed | rest_api
  "chain":          "string",   // required for onchain_* types
  "protocol":       "string",   // optional — e.g. "aave_v3"
  "event_at":       number,     // required — Unix ms UTC (source timestamp)
  "ingested_at":    number,     // required — Unix ms UTC (agent receipt time)
  "block_number":   number,     // optional — for onchain_* types
  "tx_hash":        "string",   // optional — 0x-prefixed hex
  "assets": [{                  // required — at least one
    "id":        "string",      // canonical asset id
    "role":      "string",      // in | out | collateral | debt | supply
    "amount":    "string",      // decimal string
    "amount_raw":"string",      // original integer string
    "amount_usd":"string"       // USD value at event_at, if resolvable
  }],
  "price_usd":      "string",   // optional — implied price if single-pair event
  "metadata":       "object",   // optional — source-specific extra fields
  "confidence":     number,     // required — 0.0–1.0
  "data_freshness": "string",   // required — live | fresh | stale | expired
  "tags":           "string[]"  // optional — e.g. ["price_divergence"]
}

Output JSON schema

The primary output of each scoring cycle, produced as latest.json and optionally emitted over WebSocket.

{
  "report_id":         "string",    // UUID v4
  "generated_at":      number,      // Unix ms UTC
  "cycle_duration_ms": number,
  "status":            "string",    // ok | degraded | error
  "degraded_feeds":    "string[]",
  "opportunities": [{
    "opp_id":               "string",
    "type":                 "string",  // arb.cex_dex | yield | liquidation | governance_risk
    "score":                number,
    "tier":                 "string",  // A | B | C | D
    "strategy":             "string",
    "assets":               "string[]",
    "protocols":            "string[]",
    "chains":               "string[]",
    "estimated_return_bps": number,
    "estimated_return_usd": "string",
    "risk_score":           number,
    "confidence":          number,
    "expires_at":           number,
    "sources": [{ "record_id":"string", "source_id":"string", "ingested_at":number, "confidence":number }]
  }],
  "summary": {
    "total_candidates": number, "tier_a_count": number, "tier_b_count": number,
    "highest_score": number, "feeds_healthy": number, "feeds_degraded": number
  }
}

Data source API endpoints

| Source | Endpoint | Auth | Rate limit | Notes |
| --- | --- | --- | --- | --- |
| Alchemy (ETH) | wss://eth-mainnet.g.alchemy.com/v2/{key} | API key | Compute units / month | Prefer WebSocket for event subs; HTTPS for archive reads |
| Pyth Network | https://hermes.pyth.network/v2/updates/price/latest | None | None (best-effort) | Poll ?ids[]= with price feed IDs. Max 100 ids per request. |
| Binance WebSocket | wss://stream.binance.com:9443/ws/{stream} | None (public) | 300 connections / 5 min per IP | Use combined streams: /stream?streams=ethusdt@bookTicker/btcusdt@bookTicker |
| DeFiLlama | https://api.llama.fi/v2/protocols | None | ~300 req/min | For pool-level yield: https://yields.llama.fi/pools |
| The Graph | https://api.thegraph.com/subgraphs/name/{subgraph} | API key | 1000 req/day (free) | GraphQL POST. Use for historical queries; too slow for real-time. |
| Chainlink Data Feeds | eth_call → latestRoundData() | None (on-chain) | RPC rate limit | Returns (roundId, answer, startedAt, updatedAt, answeredInRound). Check updatedAt vs heartbeat. |
| Gauntlet Risk API | https://risk.gauntlet.network/api/v1/protocols/{id} | API key | Per contract | Returns risk scores, utilisation, recommended parameters. B2B access required. |
| Snapshot | https://hub.snapshot.org/graphql | None | Unlisted, generous | GraphQL. Query proposals with state: "active" for live governance risk. |

Asset registry (common entries)

| Canonical ID | Symbol | Ethereum address | Decimals |
| --- | --- | --- | --- |
| WETH | WETH | 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 | 18 |
| USDC | USDC | 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48 | 6 |
| USDT | USDT | 0xdAC17F958D2ee523a2206206994597C13D831ec7 | 6 |
| WBTC | WBTC | 0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599 | 8 |
| stETH | stETH | 0xae7ab96520DE3A18E5e111B5EaAb095312D7fE84 | 18 |
| DAI | DAI | 0x6B175474E89094C44Da98b954EedeAC495271d0F | 18 |

Strategy codes

| Code | Name | Opportunity type | Typical Y range (bps) |
| --- | --- | --- | --- |
| arb.cex_dex | CEX-DEX arbitrage | Price divergence between CEX spot and DEX pool | 5 – 80 |
| arb.dex_dex | DEX-DEX arbitrage | Price divergence between two DEX pools | 2 – 40 |
| arb.cross_chain | Cross-chain arbitrage | Price divergence across chains via bridge | 10 – 200 |
| yield.stable_lp | Stablecoin LP yield | Superior LP APR vs lending rate for same stablecoin | 20 – 500 |
| yield.lending | Lending rate arb | Supply/borrow rate spread across protocols | 10 – 300 |
| liquidation.bonus | Liquidation | Under-collateralised position with bonus > gas cost | 300 – 1500 |
| governance.risk_flag | Governance risk signal | Active proposal changing protocol risk parameters | N/A (informational) |

Configuration reference

All configuration files live in config/. Hot reload is supported for feeds.yaml, scoring.yaml, and outputs.yaml; restart required for providers.yaml and subscriptions.yaml.

| File | Hot reload | Key fields |
| --- | --- | --- |
| providers.yaml | No | chains[].primary, .fallback, .block_time_ms |
| subscriptions.yaml | No | subscriptions[].id, .chain, .address, .event, .handler, .priority |
| feeds.yaml | Yes | feeds[].id, .type, .url, .poll_interval_seconds, .ttl_seconds, .normaliser |
| scoring.yaml | Yes | osf1.weights.*, thresholds.min_score, thresholds.min_liquidity_usd, correlation_window_seconds |
| outputs.yaml | Yes | outputs[].id, .type, .filter, .payload_template |
Troubleshooting

Diagnosing Pipeline Issues

Common failure modes, their symptoms, and how to fix them. Organised by pipeline stage.

Before diving in — Most pipeline issues produce a status: "degraded" or status: "error" field in the output JSON, with a degraded_feeds array listing affected sources. Start there — it identifies which stage to investigate first.

Ingestion failures

Ingestion — WebSocket connection drops silently

Symptom: The agent is running, the feed shows as live, but no new events have arrived for several minutes. Opportunity scores based on this feed begin degrading.

Cause

WebSocket providers (including Alchemy and Infura) silently drop connections after periods of low activity or when the server rotates. The client does not receive a close frame, so it does not attempt to reconnect.

Fix

Implement a heartbeat check: if no message has been received within 2 × block_time_ms, close and reconnect. For Ethereum mainnet: ~24 seconds. For Arbitrum: 500ms.

// Heartbeat pattern (pseudocode)
lastMessageAt = Date.now()
ws.on('message', () => { lastMessageAt = Date.now() })
setInterval(() => {
  if (Date.now() - lastMessageAt > 2 * BLOCK_TIME_MS) {
    log.warn('Heartbeat timeout — reconnecting')
    ws.close(); reconnect()
  }
}, BLOCK_TIME_MS)
Ingestion — RPC provider rate limit (429 errors)

Symptom: Logs show repeated 429 Too Many Requests. State reads are failing or returning stale data. On-chain confidence values drop sharply.

Cause

Archive node reads are high-compute-unit operations. A burst of event-triggered state reads can exhaust the hourly compute budget quickly.

Fix
  1. Switch to the fallback provider immediately on a 429 — don't wait for a retry interval.
  2. Cache state reads with a 10-minute TTL; only re-fetch on parameter-change events.
  3. Use Multicall3 (0xcA11bde05977b3631167028862bE2a173976CA11) to batch multiple eth_calls into a single RPC call.
Ingestion — Off-chain feed returns stale data without error

Symptom: A REST feed returns a 200 response, but the updatedAt in the payload is several hours old. The normaliser marks the record as fresh because the HTTP response was successful.

Fix

Every normaliser must validate the payload's own timestamp, not just HTTP status. If the payload's updatedAt exceeds the feed's ttl_seconds, mark the record as stale regardless of HTTP status.
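A sketch of that payload-level check. The updatedAt key name and the freshness labels are taken from this guide; the TTL-multiple boundaries mirror the data-freshness factor in the OSF-1 reference ("live" is omitted here since it only applies to push feeds — an assumption of this sketch):

```python
def freshness_label(payload_updated_at_ms: int, now_ms: int,
                    ttl_seconds: int) -> str:
    """Classify a feed payload by its own timestamp, independent of HTTP
    status. A 200 response with an old updatedAt is still stale."""
    age_ms = now_ms - payload_updated_at_ms
    if age_ms <= ttl_seconds * 1000:
        return "fresh"
    if age_ms <= 3 * ttl_seconds * 1000:
        return "stale"
    return "expired"

label = freshness_label(payload_updated_at_ms=0,
                        now_ms=7_200_000,   # payload is 2 hours old
                        ttl_seconds=300)
```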

Normalisation errors

Normalisation — Token decimals resolution failure

Symptom: Records for a specific token arrive with amount: null or wildly incorrect USD values. Normaliser log shows WARN: decimals cache miss for 0x....

Fix
  1. Add a circuit breaker: if resolution fails twice within 5 minutes, add the address to a pending_resolution list and skip records for it rather than passing through with a null amount.
  2. Pre-populate the decimals cache from the asset registry at startup — don't rely solely on on-chain resolution.
  3. For rebasing tokens (stETH, aTokens), decimals() returns 18 but the rebase factor must be fetched separately. Use the protocol-specific normaliser, not the generic ERC-20 one.
Normalisation — Duplicate records in the queue

Symptom: The same on-chain event appears twice, causing duplicate opportunity candidates with identical scores.

Fix

Generate a deterministic record_id using sha256(tx_hash + log_index). Before inserting into the queue, check against a deduplication bloom filter with a 10-minute TTL. Reject records whose record_id is already in the filter.
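A sketch of the deterministic ID plus deduplication. The fix above suggests a bloom filter with a TTL for scale; a plain set with the same accept/reject semantics is shown here:

```python
import hashlib

def record_id(tx_hash: str, log_index: int) -> str:
    """Deterministic ID: identical events always hash to the same value,
    so provider re-deliveries collapse to a single record."""
    return hashlib.sha256(f"{tx_hash}:{log_index}".encode()).hexdigest()

seen: set[str] = set()   # production: bloom filter with a 10-minute TTL

def accept(tx_hash: str, log_index: int) -> bool:
    """Return False (drop) if this event's record_id was already seen."""
    rid = record_id(tx_hash, log_index)
    if rid in seen:
        return False
    seen.add(rid)
    return True

first = accept("0xabc123", 7)
dup   = accept("0xabc123", 7)   # same event delivered twice
```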

Scoring issues

Scoring — All opportunities scoring near zero

Symptom: Scoring engine is running, candidates are being assembled, but all OSF-1 scores are below 0.01. No tier-A or tier-B candidates. Market conditions appear normal.

Cause (most common)

The confidence factor C is collapsing to near-zero because one or more critical feeds have exceeded their TTL. Check the output JSON's degraded_feeds array. Fix the feed, not the formula.

Cause (secondary)

OSF-1 weights changed in scoring.yaml without re-calibrating the minimum score thresholds. After any weight change, run the scoring engine against a historical snapshot and compare score distributions before and after.

Scoring — Scoring engine timeout on large candidate sets

Symptom: During high-volatility periods, the scoring cycle exceeds its configured timeout. Output is delayed or emitted with status: "error".

Fix

Implement a candidate cap: if more than N=500 candidates are assembled in a single correlation window, apply a pre-scoring filter: (1) discard candidates with any expired source; (2) discard candidates below a minimum raw yield estimate; (3) sample uniformly from remaining up to N.
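The three-step pre-scoring filter can be sketched as follows (the field names and the minimum-yield threshold are illustrative):

```python
import random

def cap_candidates(candidates: list[dict], cap: int = 500,
                   min_yield_bps: float = 5.0) -> list[dict]:
    """Pre-scoring filter: (1) drop candidates with an expired source,
    (2) drop candidates below a minimum raw yield estimate,
    (3) sample uniformly from the remainder down to the cap."""
    alive = [c for c in candidates if not c.get("has_expired_source")]
    viable = [c for c in alive if c["raw_yield_bps"] >= min_yield_bps]
    if len(viable) <= cap:
        return viable
    return random.sample(viable, cap)

survivors = cap_candidates(
    [{"raw_yield_bps": 12.0},
     {"raw_yield_bps": 2.0},                              # below yield floor
     {"raw_yield_bps": 40.0, "has_expired_source": True}],  # expired source
    cap=500,
)
```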

Output problems

Output — Webhook alerts not firing for tier-A opportunities

Symptom: The JSON report shows tier-A opportunities, but no webhook alert is delivered. No error in the output router log.

Fix

Test filter expressions using the pipeline's built-in evaluator:

npx agent-pipeline test-filter \
  --filter "tier == 'A'" \
  --sample-report "./reports/latest.json"
// Should output: "Filter matched 3 of 12 opportunities"

If the evaluator returns 0 matches on a report with tier-A entries, the filter syntax is wrong. Refer to the filter expression language reference in config/outputs.yaml comments.

Output — Consumer reports schema validation failure

Symptom: A downstream system rejects the output JSON with an error like required field 'estimated_return_usd' missing.

Fix

The pipeline emits a schema_version field at the top level of every output report. Downstream consumers should check this field and reject with a clear version mismatch error rather than a field-level validation error. Coordinate schema upgrades using the changelog at docs/schema-changelog.md.

First-response checklist

| # | Check | Where to look |
| --- | --- | --- |
| 1 | Is status in the latest output JSON ok, degraded, or error? | ./reports/latest.json → top-level status |
| 2 | Are any feeds listed in degraded_feeds? | ./reports/latest.json → degraded_feeds array |
| 3 | Is the scoring cycle completing within its timeout budget? | Logs → cycle_duration_ms; alert if > 80% of configured timeout |
| 4 | Is the normalised record queue growing (backpressure)? | Metrics dashboard → queue_depth gauge |
| 5 | Is the primary RPC provider responding? | Run: curl -X POST {rpc_url} -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' |
| 6 | Did any config file change in the last deploy? | git log -5 -- config/ |
| 7 | Is the system clock synchronised? | Run: timedatectl status or ntpstat |