Developer Documentation · v1.0 · 2026
The DeFi Research Agent Pipeline
A structured framework for understanding how autonomous research agents ingest on-chain data, score opportunities, and produce verifiable outputs — without writing the agent yourself.
Understanding the Pipeline
Mental models, data taxonomy, trust problem, output model.
Guides · Simulating the Pipeline
Step-by-step setup, normalisation, scoring, case study.
Reference · Technical Specifications
OSF-1 formula, JSON schemas, API endpoints, strategy codes.
Troubleshooting · Diagnosing Issues
Feed failures, scoring bugs, output errors, first-response checklist.
Understanding the DeFi Research Agent Pipeline
The mental models you need before you touch any configuration or code. Start here.
What is a DeFi research agent?
A DeFi research agent is a software system — often AI-augmented — that autonomously monitors decentralised finance protocols, ingests structured data from multiple sources, applies scoring or ranking logic to candidate opportunities, and produces outputs that inform or execute financial decisions.
In 2026, research agents range from simple yield-monitoring bots that compare APRs across lending markets, to multi-modal systems like OpenClawD and AiXT Trading Intelligence that fuse on-chain event streams, sentiment signals, and cross-chain liquidity metrics to surface opportunities within milliseconds of a state change.
The word "research" is deliberate. Unlike execution agents (which transact), a research agent's primary output is information: structured reports, scored opportunity sets, and ranked strategy candidates. A human or an execution layer makes the final call.
Why pipeline architecture matters
DeFi data is fast, fragmented, and adversarial. A yield opportunity on Aave might last 40 seconds before arbitrageurs close it. A governance proposal on Compound can shift protocol risk parameters before your scoring model sees it. Without a structured pipeline, agents fail in predictable ways:
- Stale data decisions — scoring an opportunity based on a price that was valid 12 blocks ago.
- Source conflation — treating on-chain oracle prices and CEX spot prices as equivalent without normalisation.
- Scoring drift — ranking logic that uses different risk parameters than the ones the output consumer expects.
- Silent failures — a feed goes down; the agent keeps scoring on cached data without signalling degraded confidence.
A pipeline architecture solves this by separating concerns into discrete, observable stages: ingestion → normalisation → scoring → decision → output. Each stage has a defined contract with the next. Failures are localised. Confidence degrades explicitly rather than silently.
The five-stage model
| Stage | Input | Output | Failure mode |
|---|---|---|---|
| 1. Ingestion | Raw RPC calls, WebSocket events, REST feeds | Unvalidated raw records | Connection loss, rate limiting, provider outage |
| 2. Normalisation | Raw records | Schema-validated, timestamped, deduplicated records | Schema mismatch, clock skew, duplicate events |
| 3. Scoring | Normalised records | Scored, ranked opportunity set | Missing fields, division by zero in formula, stale input |
| 4. Decision | Scored opportunities | Selected strategy candidates with confidence labels | All candidates below threshold, conflicting signals |
| 5. Output | Strategy candidates | JSON report, alert, dashboard update, signed attestation | Serialisation error, delivery failure, consumer schema mismatch |
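The stage contracts in the table can be sketched as typed interfaces. This is an illustrative sketch: the type and method names are hypothetical, not part of any specific pipeline implementation.

```typescript
// Hypothetical sketch of the five-stage contract model. All type and
// method names are illustrative, not a real pipeline API.
type RawRecord = { sourceId: string; payload: unknown; receivedAt: number };
type NormalisedRecord = { recordId: string; eventAt: number; confidence: number };
type ScoredOpportunity = { oppId: string; score: number };
type StrategyCandidate = { oppId: string; strategy: string; tier: string };
type Report = { generatedAt: number; candidates: StrategyCandidate[] };

// Each stage consumes exactly the previous stage's output type, so a
// failure must surface at a stage boundary rather than leak downstream.
interface Pipeline {
  ingest(): Promise<RawRecord[]>;
  normalise(raw: RawRecord[]): NormalisedRecord[];
  score(records: NormalisedRecord[]): ScoredOpportunity[];
  decide(scored: ScoredOpportunity[]): StrategyCandidate[];
  emit(candidates: StrategyCandidate[]): Report;
}

// One scoring cycle is a straight composition of the five stages.
async function runCycle(p: Pipeline): Promise<Report> {
  return p.emit(p.decide(p.score(p.normalise(await p.ingest()))));
}
```

Because each stage accepts only the previous stage's output type, a fault is localised to the boundary where the contract is violated.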
Data source taxonomy
DeFi research agents draw from four categories of data. Understanding the latency, reliability, and trust characteristics of each is prerequisite knowledge for designing the ingestion layer.
On-chain data
Data read directly from a blockchain's state or event logs via an RPC endpoint or indexer. The ground-truth layer — cryptographically final after sufficient confirmations.
- State reads — `eth_call` to read protocol parameters, liquidity pool reserves, vault balances, governance states.
- Event logs — `eth_getLogs` for `Swap`, `Deposit`, `Borrow`, `LiquidationCall` events.
- Block metadata — block number, timestamp, base fee, proposer (for MEV context).
- Mempool data — pending transactions via Blocknative or Flashbots.
Latency: 1–12s (block time). Trust: cryptographic. Cost: scales with RPC call volume.
Off-chain data
Data originating outside the blockchain — faster but requires explicit trust decisions.
- CEX price feeds — Binance, Coinbase, Kraken REST/WebSocket APIs.
- Macro/news feeds — Bloomberg, Refinitiv, The Block, DeFiLlama news.
- Social sentiment — Twitter/X, Reddit, Telegram via LunarCrush or Santiment.
- Protocol risk scores — DeFi Safety, Gauntlet, Chaos Labs risk parameter APIs.
Oracle data
Decentralised price feeds that sit on-chain but aggregate off-chain price data. On-chain in location, off-chain in origin.
- Chainlink Data Feeds — aggregated spot prices, updated on deviation threshold or heartbeat.
- Pyth Network — pull-based oracle with sub-second updates, popular on Solana and EVM L2s.
- Uniswap v3 TWAP — time-weighted average price from on-chain swap history; manipulation-resistant but slow.
- Redstone — modular oracle supporting custom data types beyond spot price.
Protocol events and governance feeds
- Governance proposals — Snapshot, Tally, Governor Bravo APIs for pending votes.
- Parameter changes — collateral factor updates, interest rate model changes, fee tier adjustments.
- Liquidation thresholds — near-liquidation position tracking via The Graph or protocol-specific APIs.
- Protocol upgrades and pauses — proxy upgrade events, circuit breaker triggers.
Decision logic model
The decision logic layer has three components: a scoring engine, a ranking layer, and a strategy selector.
The scoring engine applies OSF-1 to produce a numeric score per candidate. The ranking layer sorts by score and applies filters: minimum score threshold, minimum liquidity floor, maximum protocol concentration. The strategy selector matches top-ranked opportunities against a strategy library (e.g., arb.cex_dex, yield.stable_lp, liquidation.bonus).
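The ranking layer can be sketched as one pass of filters plus a sort. The field names and default thresholds below are illustrative assumptions, not the pipeline's real configuration values.

```typescript
// Illustrative ranking-layer sketch: score threshold, liquidity floor,
// and a per-protocol concentration cap. Names and defaults are assumed.
type Scored = { oppId: string; protocol: string; score: number; poolLiquidityUsd: number };

function rank(
  candidates: Scored[],
  minScore = 0.4,            // drop tier-D noise (assumed default)
  minLiquidityUsd = 100_000, // liquidity floor (assumed default)
  maxPerProtocol = 2,        // concentration cap (assumed default)
): Scored[] {
  const perProtocol = new Map<string, number>();
  return candidates
    .filter((c) => c.score >= minScore && c.poolLiquidityUsd >= minLiquidityUsd)
    .sort((a, b) => b.score - a.score)
    .filter((c) => {
      const n = perProtocol.get(c.protocol) ?? 0;
      if (n >= maxPerProtocol) return false; // too concentrated in one protocol
      perProtocol.set(c.protocol, n + 1);
      return true;
    });
}
```

The concentration cap runs after the sort so that the highest-scoring candidates per protocol are the ones kept.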
The trust problem in autonomous agents
Without a trust framework, agents are black boxes. The pipeline addresses this with three mechanisms:
- Confidence labels — every output includes a `confidence` field derived from data completeness and source reliability.
- Data lineage — every output includes a `sources` array listing which feeds contributed, with freshness timestamps.
- Degradation signalling — if any required feed is stale beyond a threshold, output `status` is set to `degraded` rather than silently omitting data.
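A minimal sketch of degradation signalling, assuming each feed tracks when it last delivered fresh data (the field names are illustrative):

```typescript
// Sketch: derive the output status from per-feed freshness.
// FeedState fields are assumed names, not the pipeline's real schema.
type FeedState = { id: string; lastFreshAt: number; ttlSeconds: number };

function reportStatus(feeds: FeedState[], nowMs: number) {
  const degraded = feeds
    .filter((f) => nowMs - f.lastFreshAt > f.ttlSeconds * 1000) // past TTL
    .map((f) => f.id);
  return {
    status: degraded.length === 0 ? "ok" : "degraded",
    degraded_feeds: degraded, // matches the output schema's field name
  };
}
```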
Output model
The pipeline supports four output formats, producible simultaneously:
- JSON report — the canonical output. All scored opportunities, strategy candidates, data lineage, and confidence metadata.
- Alert — a lightweight notification (webhook, Telegram, Slack) when a scored opportunity exceeds a threshold.
- Dashboard — a real-time web view backed by a WebSocket connection to the agent's scoring output stream.
- Signed attestation — a cryptographically signed summary for downstream on-chain verification or audit trails.
Simulating the Research Agent Pipeline
Step-by-step instructions for configuring and running each stage of a DeFi research agent pipeline, plus a full case study.
Guide 1 — Setting up the ingestion layer
The ingestion layer pulls raw data from every configured source and places it into the normalisation queue. It runs continuously and must handle partial failures gracefully.
Configure your RPC providers
Every ingestion layer needs at least two RPC endpoints per chain — a primary and a fallback. Single-provider setups fail silently when the provider rate-limits or goes offline.
```yaml
# config/providers.yaml
chains:
  ethereum:
    primary: "wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback: "https://rpc.ankr.com/eth/${ANKR_KEY}"
    archive: "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    block_time_ms: 12000
  arbitrum:
    primary: "wss://arb-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback: "https://rpc.ankr.com/arbitrum/${ANKR_KEY}"
    block_time_ms: 250
  base:
    primary: "wss://base-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback: "https://mainnet.base.org"
    block_time_ms: 2000
```
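The primary/fallback pairing implies failover logic in the ingestion client. A minimal sketch, with the transport injected so the logic stays independent of any particular RPC library:

```typescript
// Failover sketch for the primary/fallback provider layout.
// A Transport is any function that performs one JSON-RPC call.
type Transport = (method: string, params: unknown[]) => Promise<unknown>;

async function readWithFallback(
  transports: Transport[], // [primary, fallback, ...] in priority order
  method: string,
  params: unknown[] = [],
): Promise<unknown> {
  let lastError: unknown = new Error("no transports configured");
  for (const call of transports) {
    try {
      return await call(method, params); // first healthy provider wins
    } catch (err) {
      lastError = err; // e.g. 429 or timeout: move to the next provider now
    }
  }
  throw lastError; // all providers failed: surface the error explicitly
}
```

Note that the loop falls through immediately on any error, matching the troubleshooting guidance later in this document: don't wait out a retry interval on a rate-limited primary.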
Define protocol event subscriptions
Use a declarative subscription file to specify which protocol events the ingestion layer should monitor. Each subscription maps a contract address and event signature to a data handler.
```yaml
# config/subscriptions.yaml
subscriptions:
  - id: "aave_v3_borrow"
    chain: "ethereum"
    address: "0x87870Bca3F3fD6335C3F4ce8392D69350B4fA4E2"
    event: "Borrow(address,address,address,uint256,uint8,uint256,uint16)"
    handler: "handlers.lending.on_borrow"
    priority: high
  - id: "uniswap_v3_swap"
    chain: "arbitrum"
    address: "*"   # wildcard: all UniV3 pools
    event: "Swap(address,address,int256,int256,uint160,uint128,int24)"
    handler: "handlers.amm.on_swap"
    priority: medium
  - id: "compound_v3_liquidation"
    chain: "ethereum"
    address: "0xc3d688B66703497DAA19211EEdff47f25384cdc3"
    event: "AbsorbCollateral(address,address,address,uint256,uint256)"
    handler: "handlers.lending.on_liquidation"
    priority: critical
```
Configure off-chain feeds
Off-chain feeds are polled on a schedule rather than subscribed to. Set a ttl_seconds for each — the normalisation layer uses this to mark data as stale and downgrade confidence scores.
```yaml
# config/feeds.yaml
feeds:
  - id: "binance_spot"
    type: "websocket"
    url: "wss://stream.binance.com:9443/ws/!ticker@arr"
    ttl_seconds: 30
    normaliser: "normalisers.cex.binance"
  - id: "defillama_tvl"
    type: "rest_poll"
    url: "https://api.llama.fi/v2/protocols"
    poll_interval_seconds: 120
    ttl_seconds: 300
    normaliser: "normalisers.analytics.defillama"
  - id: "chainlink_eth_usd"
    type: "onchain_oracle"
    chain: "ethereum"
    address: "0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419"
    heartbeat_seconds: 3600
    ttl_seconds: 4200
    normaliser: "normalisers.oracle.chainlink"
```
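The ttl_seconds values feed directly into confidence scoring. A sketch of the age-versus-TTL mapping, using the freshness-factor thresholds defined in the OSF-1 specification later in this document:

```typescript
// Map a record's age against its feed TTL to the freshness factor F
// used by the confidence calculation (thresholds from the OSF-1 spec).
function freshnessFactor(ageSeconds: number, ttlSeconds: number): number {
  const ratio = ageSeconds / ttlSeconds; // assumes ttlSeconds > 0
  if (ratio <= 1) return 1.0; // within TTL: full confidence
  if (ratio <= 2) return 0.7; // 1–2× TTL: stale
  if (ratio <= 3) return 0.4; // 2–3× TTL: very stale
  return 0.0;                 // beyond 3× TTL: expired, contributes nothing
}
```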
Guide 2 — Normalising on-chain and off-chain data
Normalisation converts heterogeneous raw records into a common schema. The three most critical operations are timestamp alignment, token amount precision, and asset identifier normalisation.
Timestamp alignment: different sources use different formats. The normaliser converts everything to Unix milliseconds UTC. Use the source event_at for causal ordering; use the agent-stamped ingested_at for latency calculations.
Token amount precision: EVM token amounts are raw integers (e.g., 1000000 for 1 USDC at 6 decimals). Resolve decimals from a cached on-chain call and convert to decimal strings. Store the raw value alongside to prevent rounding errors.
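A sketch of the raw-to-decimal conversion using BigInt, so no precision is lost before the decimal string is produced (non-negative amounts assumed):

```typescript
// Lossless raw integer -> decimal string conversion.
// Assumes raw >= 0; the raw value should be stored alongside the result.
function toDecimalString(raw: bigint, decimals: number): string {
  const base = 10n ** BigInt(decimals);
  const whole = raw / base;
  // Pad the remainder to full width, then strip trailing zeros.
  const frac = (raw % base).toString().padStart(decimals, "0").replace(/0+$/, "");
  return frac.length > 0 ? `${whole}.${frac}` : whole.toString();
}
```

For example, toDecimalString(1000000n, 6) yields "1" and toDecimalString(2500000000000000000n, 18) yields "2.5".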
Asset identifier normalisation: the same asset may appear as WETH, ETH, 0xC02aaA...4082, or ethereum. Map all representations to canonical IDs using a maintained asset registry.
Example: normalised record output.

```json
{
  "record_id": "evt_arb_uniswap_0x7a3b...91c2_14829401",
  "source_id": "uniswap_v3_swap",
  "source_type": "onchain_event",
  "chain": "arbitrum",
  "event_at": 1719432811000,
  "ingested_at": 1719432811340,
  "asset_in": { "id": "WETH", "amount": "2.5", "raw": "2500000000000000000" },
  "asset_out": { "id": "USDC", "amount": "8312.44", "raw": "8312440000" },
  "price_usd": "3324.976",
  "confidence": 0.97,
  "data_freshness": "live"
}
```
Guide 3 — Running the scoring engine
The scoring engine consumes normalised records, groups them into opportunity candidates via a correlation engine, and applies OSF-1. An opportunity candidate is a group of records that together describe one actionable situation — e.g., a CEX price record + a DEX pool reserve read + a Chainlink oracle price forming a CEX-DEX arbitrage signal.
The correlation window is configurable (default: 5 seconds). Tighter windows reduce false positives but may miss slow-moving opportunities. In pseudocode:
// OSF-1 pseudocode
score = (yield_estimate * urgency_weight * confidence_factor)
/ (risk_score + liquidity_penalty)
yield_estimate = expected_return_bps / 10000
urgency_weight = 1 + (1 / max(1, opportunity_age_seconds))
confidence_factor = avg(source_confidences) * data_freshness_factor
risk_score = protocol_risk + oracle_risk + smart_contract_risk
liquidity_penalty = max(0, (position_size / pool_liquidity) - 0.01) * 5
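The pseudocode translates directly into a runnable function. The component values are passed in explicitly here so the formula stays separate from candidate assembly; the parameter names are illustrative.

```typescript
// Runnable translation of the OSF-1 pseudocode above.
function urgencyWeight(ageSeconds: number): number {
  return 1 + 1 / Math.max(1, ageSeconds); // 2.0 at 0s, decaying toward 1.0
}

function osf1(args: {
  yieldEstimate: number;      // Y: expected_return_bps / 10000
  ageSeconds: number;         // drives U
  sourceConfidences: number[];// per-source confidence values
  freshnessFactor: number;    // F, 0.0–1.0
  riskScore: number;          // R
  liquidityPenalty: number;   // L, supplied by the liquidity-penalty rule
}): number {
  const U = urgencyWeight(args.ageSeconds);
  const mean =
    args.sourceConfidences.reduce((a, b) => a + b, 0) / args.sourceConfidences.length;
  const C = mean * args.freshnessFactor;
  return (args.yieldEstimate * U * C) / (args.riskScore + args.liquidityPenalty);
}
```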
Guide 4 — Interpreting the decision layer output
Each strategy candidate contains: the matched opportunity, the recommended strategy code, the expected entry parameters, and a confidence tier (A/B/C/D).
| Tier | Score range | Meaning | Suggested action |
|---|---|---|---|
| A | 0.80 – 1.00 | High confidence, strong signal, low risk | Execute or escalate for human review |
| B | 0.60 – 0.79 | Good signal, moderate risk or data uncertainty | Flag for human analyst review |
| C | 0.40 – 0.59 | Weak signal or elevated risk | Monitor only; do not act |
| D | 0.00 – 0.39 | Below threshold; included for transparency | Discard |
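Tier assignment is a direct mapping from the score ranges in the table. Note that OSF-1 is unbounded above, so scores over 1.00 still land in tier A:

```typescript
// Confidence tier from an OSF-1 score, per the ranges above.
function tier(score: number): "A" | "B" | "C" | "D" {
  if (score >= 0.8) return "A"; // execute or escalate for human review
  if (score >= 0.6) return "B"; // flag for analyst review
  if (score >= 0.4) return "C"; // monitor only
  return "D";                   // below threshold; kept for transparency
}
```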
Guide 5 — Reading and routing outputs
The pipeline produces a primary JSON report each scoring cycle (default: every 60 seconds, or on-demand for critical events). Configure the output router in config/outputs.yaml:
```yaml
# config/outputs.yaml
outputs:
  - id: "json_file"
    type: "file"
    path: "./reports/latest.json"
    overwrite: true
  - id: "webhook_alert"
    type: "webhook"
    url: "${ALERT_WEBHOOK_URL}"
    filter: "tier == 'A'"
    payload_template: "templates/alert_slim.json"
  - id: "dashboard_ws"
    type: "websocket_server"
    port: 8765
    emit_on: "every_cycle"
```
Case Study: OpenClawD & AiXT Trading Intelligence
A real-world-style walkthrough of how modern AI-augmented research agents feed their pipelines.
OpenClawD and AiXT Trading Intelligence represent two distinct but complementary approaches to AI-augmented DeFi research. This case study traces the data path from raw chain state to scored output in both systems.
OpenClawD — event-driven ingestion
OpenClawD's architecture is built around reactive ingestion: it subscribes to a curated set of high-signal on-chain events and constructs opportunity candidates as correlated events arrive.
| Source | Type | Data extracted | Refresh |
|---|---|---|---|
| Alchemy WebSocket | On-chain events | Swap, Borrow, LiquidationCall, Transfer for monitored protocols | Per-block (12s / 250ms) |
| Pyth Network | Oracle | Real-time price updates for 80+ assets | 400ms |
| Binance WebSocket | Off-chain CEX | Best bid/ask, 24h volume, funding rates for perp markets | 100ms |
| Gauntlet Risk API | Protocol risk | Collateral risk scores, utilisation rates, risk parameter changes | 5 min |
| Snapshot API | Governance | Active proposals, vote counts, time-to-close | 15 min |
When a Swap event arrives from a Uniswap v3 pool, OpenClawD's normaliser: decodes the ABI-encoded log → resolves token0/token1 to canonical IDs → converts raw integers to decimal strings → computes implied price from sqrtPriceX96 → cross-references against the most recent Pyth price → stamps timestamps → pushes to the correlation queue. If the price deviation exceeds 0.5%, the record is tagged price_divergence: true, fast-tracking CEX-DEX arbitrage candidate assembly.
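The sqrtPriceX96-to-price step is the least obvious part of that sequence. Uniswap v3 encodes sqrt(token1/token0 raw price) as a Q64.96 fixed-point number. A sketch, assuming the token1-per-token0 orientation for the pool in question and using floating point (acceptable for a divergence check, not for settlement math):

```typescript
// Implied price of token0 in units of token1 from a Uniswap v3
// sqrtPriceX96 value, adjusted for the two tokens' decimals.
function impliedPrice(sqrtPriceX96: bigint, decimals0: number, decimals1: number): number {
  const ratio = Number(sqrtPriceX96) / 2 ** 96; // float conversion: fine for magnitude
  return ratio * ratio * 10 ** (decimals0 - decimals1);
}
```

Real pools require checking which asset is token0 and which is token1 before interpreting the result; getting the orientation wrong inverts the price.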
AiXT Trading Intelligence — poll-and-model ingestion
AiXT runs a continuous polling loop against a wider set of sources, feeds all data into a feature store, and applies a gradient-boosted tree model to score the feature vector every 30 seconds. This approach is better suited to longer-horizon signals — macro yield trends, governance risk, protocol health — where event-driven ingestion would produce too much noise.
AiXT's feature vector per (protocol, asset_pair, chain) triplet includes: current APR and 7-day rolling average, utilisation rate and its 1h rate of change, TVL and its 24h rate of change, governance risk score, CEX funding rate, and a 4h rolling sentiment score from LunarCrush. The model is retrained weekly on new outcome data and outputs an OSF-1 compatible score, feeding the same decision layer as OpenClawD.
Technical Specifications
Complete API endpoints, JSON schemas, scoring formulas, and configuration options. Use this section as a lookup, not a reading guide.
Opportunity Scoring Formula (OSF-1)
OSF-1 produces a dimensionless score in the range [0, ∞). In practice well-formed opportunities score between 0.0 and 2.0; scores above 1.0 are exceptional.
```text
/**
 * OSF-1 — Opportunity Scoring Formula, version 1
 *
 * score = (Y × U × C) / (R + L)
 *
 * Y — Yield estimate (dimensionless, basis points / 10000)
 * U — Urgency weight (dimensionless, ≥ 1.0)
 * C — Confidence factor (0.0 – 1.0)
 * R — Risk score (dimensionless, > 0)
 * L — Liquidity penalty (dimensionless, ≥ 0)
 */

Y = expected_return_bps / 10000
    // For yield: APR difference between protocols
    // For arb: price spread minus estimated gas cost in bps
    // For liq: bonus percentage in bps

U = 1 + (1 / max(1, opportunity_age_seconds))
    // 0s old: U=2.0 | 60s old: U≈1.016 | 300s old: U≈1.003

C = mean(source_confidences) × F
    // F — data freshness factor:
    //   1.0 if all sources within TTL
    //   0.7 if any source is 1–2× TTL
    //   0.4 if any source is 2–3× TTL
    //   0.0 if any source exceeds 3× TTL

R = w_sc×SC + w_or×OR + w_lq×LQ
    // SC: smart contract risk  OR: oracle risk  LQ: liquidity risk
    // Default weights: w_sc=0.5, w_or=0.3, w_lq=0.2

L = max(0, (position_size_usd / pool_liquidity_usd - 0.01)) × 5
    // Neutral (0.0) at ≤1% of pool liquidity. Rises linearly above.
```
| Scenario | Y | U | C | R | L | Score | Tier |
|---|---|---|---|---|---|---|---|
| CEX-DEX arb, 25bps, 0s old, live feeds | 0.0025 | 2.0 | 0.95 | 0.25 | 0.01 | 0.018 | D |
| Yield arb, 180bps, 30s old, live feeds | 0.018 | 1.032 | 0.92 | 0.40 | 0.0 | 0.043 | D |
| Liquidation, 5% bonus, 0s old, low-risk protocol | 0.05 | 2.0 | 0.97 | 0.15 | 0.0 | 0.647 | B |
| Liquidation, 8% bonus, 0s old, blue-chip | 0.08 | 2.0 | 0.98 | 0.10 | 0.0 | 1.568 | A |
NormalisedRecord schema
Every record produced by the normalisation stage must conform to this schema. Records missing any required field are dropped.
{
"record_id": "string", // required — unique, deterministic ID
"source_id": "string", // required — subscription or feed id
"source_type": "string", // required — onchain_event | onchain_state | oracle | cex_feed | rest_api
"chain": "string", // required for onchain_* types
"protocol": "string", // optional — e.g. "aave_v3"
"event_at": number, // required — Unix ms UTC (source timestamp)
"ingested_at": number, // required — Unix ms UTC (agent receipt time)
"block_number": number, // optional — for onchain_* types
"tx_hash": "string", // optional — 0x-prefixed hex
"assets": [{ // required — at least one
"id": "string", // canonical asset id
"role": "string", // in | out | collateral | debt | supply
"amount": "string", // decimal string
"amount_raw":"string", // original integer string
"amount_usd":"string" // USD value at event_at, if resolvable
}],
"price_usd": "string", // optional — implied price if single-pair event
"metadata": "object", // optional — source-specific extra fields
"confidence": number, // required — 0.0–1.0
"data_freshness": "string", // required — live | fresh | stale | expired
"tags": "string[]" // optional — e.g. ["price_divergence"]
}
Output JSON schema
The primary output of each scoring cycle, produced as latest.json and optionally emitted over WebSocket.
{
"report_id": "string", // UUID v4
"generated_at": number, // Unix ms UTC
"cycle_duration_ms": number,
"status": "string", // ok | degraded | error
"degraded_feeds": "string[]",
"opportunities": [{
"opp_id": "string",
"type": "string", // arb.cex_dex | yield | liquidation | governance_risk
"score": number,
"tier": "string", // A | B | C | D
"strategy": "string",
"assets": "string[]",
"protocols": "string[]",
"chains": "string[]",
"estimated_return_bps": number,
"estimated_return_usd": "string",
"risk_score": number,
"confidence": number,
"expires_at": number,
"sources": [{ "record_id":"string", "source_id":"string", "ingested_at":number, "confidence":number }]
}],
"summary": {
"total_candidates": number, "tier_a_count": number, "tier_b_count": number,
"highest_score": number, "feeds_healthy": number, "feeds_degraded": number
}
}
Data source API endpoints
| Source | Endpoint | Auth | Rate limit | Notes |
|---|---|---|---|---|
| Alchemy (ETH) | wss://eth-mainnet.g.alchemy.com/v2/{key} | API key | Compute units / month | Prefer WebSocket for event subs; HTTPS for archive reads |
| Pyth Network | https://hermes.pyth.network/v2/updates/price/latest | None | None (best-effort) | Poll ?ids[]= with price feed IDs. Max 100 ids per request. |
| Binance WebSocket | wss://stream.binance.com:9443/ws/{stream} | None (public) | 300 connections / 5 min per IP | Use combined streams: /stream?streams=ethusdt@bookTicker/btcusdt@bookTicker |
| DeFiLlama | https://api.llama.fi/v2/protocols | None | ~300 req/min | For pool-level yield: https://yields.llama.fi/pools |
| The Graph | https://api.thegraph.com/subgraphs/name/{subgraph} | API key | 1000 req/day (free) | GraphQL POST. Use for historical queries; too slow for real-time. |
| Chainlink Data Feeds | eth_call → latestRoundData() | None (on-chain) | RPC rate limit | Returns (roundId, answer, startedAt, updatedAt, answeredInRound). Check updatedAt vs heartbeat. |
| Gauntlet Risk API | https://risk.gauntlet.network/api/v1/protocols/{id} | API key | Per contract | Returns risk scores, utilisation, recommended parameters. B2B access required. |
| Snapshot | https://hub.snapshot.org/graphql | None | Unlisted, generous | GraphQL. Query proposals with state: "active" for live governance risk. |
Asset registry (common entries)
| Canonical ID | Symbol | Ethereum address | Decimals |
|---|---|---|---|
| WETH | WETH | 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 | 18 |
| USDC | USDC | 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48 | 6 |
| USDT | USDT | 0xdAC17F958D2ee523a2206206994597C13D831ec7 | 6 |
| WBTC | WBTC | 0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599 | 8 |
| stETH | stETH | 0xae7ab96520DE3A18E5e111B5EaAb095312D7fE84 | 18 |
| DAI | DAI | 0x6B175474E89094C44Da98b954EedeAC495271d0F | 18 |
Strategy codes
| Code | Name | Opportunity type | Typical Y range (bps) |
|---|---|---|---|
| arb.cex_dex | CEX-DEX arbitrage | Price divergence between CEX spot and DEX pool | 5 – 80 |
| arb.dex_dex | DEX-DEX arbitrage | Price divergence between two DEX pools | 2 – 40 |
| arb.cross_chain | Cross-chain arbitrage | Price divergence across chains via bridge | 10 – 200 |
| yield.stable_lp | Stablecoin LP yield | Superior LP APR vs lending rate for same stablecoin | 20 – 500 |
| yield.lending | Lending rate arb | Supply/borrow rate spread across protocols | 10 – 300 |
| liquidation.bonus | Liquidation | Under-collateralised position with bonus > gas cost | 300 – 1500 |
| governance.risk_flag | Governance risk signal | Active proposal changing protocol risk parameters | N/A (informational) |
Configuration reference
All configuration files live in config/. Hot reload is supported for feeds.yaml, scoring.yaml, and outputs.yaml; restart required for providers.yaml and subscriptions.yaml.
| File | Hot reload | Key fields |
|---|---|---|
| providers.yaml | No | chains[].primary, .fallback, .block_time_ms |
| subscriptions.yaml | No | subscriptions[].id, .chain, .address, .event, .handler, .priority |
| feeds.yaml | Yes | feeds[].id, .type, .url, .poll_interval_seconds, .ttl_seconds, .normaliser |
| scoring.yaml | Yes | osf1.weights.*, thresholds.min_score, thresholds.min_liquidity_usd, correlation_window_seconds |
| outputs.yaml | Yes | outputs[].id, .type, .filter, .payload_template |
Diagnosing Pipeline Issues
Common failure modes, their symptoms, and how to fix them. Organised by pipeline stage.
The first place to look is the output JSON: a status of "degraded" or "error", together with the degraded_feeds array listing affected sources. Start there — it identifies which stage to investigate first.
Ingestion failures
Symptom: The agent is running, the feed shows as live, but no new events have arrived for several minutes. Opportunity scores based on this feed begin degrading.
WebSocket providers (including Alchemy and Infura) silently drop connections after periods of low activity or when the server rotates. The client does not receive a close frame, so it does not attempt to reconnect.
Fix: Implement a heartbeat check: if no message has been received within 2 × block_time_ms, close and reconnect. For Ethereum mainnet that is ~24 seconds; for Arbitrum, 500ms.
// Heartbeat pattern (pseudocode)
lastMessageAt = Date.now()
ws.on('message', () => { lastMessageAt = Date.now() })
setInterval(() => {
if (Date.now() - lastMessageAt > 2 * BLOCK_TIME_MS) {
log.warn('Heartbeat timeout — reconnecting')
ws.close(); reconnect()
}
}, BLOCK_TIME_MS)
Symptom: Logs show repeated 429 Too Many Requests. State reads are failing or returning stale data. On-chain confidence values drop sharply.
Archive node reads are high-compute-unit operations. A burst of event-triggered state reads can exhaust the hourly compute budget quickly.
Fix:
- Switch to the fallback provider immediately on a 429 — don't wait for a retry interval.
- Cache state reads with a 10-minute TTL; only re-fetch on parameter-change events.
- Use Multicall3 (`0xcA11bde05977b3631167028862bE2a173976CA11`) to batch multiple `eth_call`s into a single RPC call.
Symptom: A REST feed returns a 200 response, but the updatedAt in the payload is several hours old. The normaliser marks the record as fresh because the HTTP response was successful.
Fix: Every normaliser must validate the payload's own timestamp, not just the HTTP status. If the payload's updatedAt exceeds the feed's ttl_seconds, mark the record as stale regardless of HTTP status.
Normalisation errors
Symptom: Records for a specific token arrive with amount: null or wildly incorrect USD values. Normaliser log shows WARN: decimals cache miss for 0x....
Fix:
- Add a circuit breaker: if resolution fails twice within 5 minutes, add the address to a `pending_resolution` list and skip records for it rather than passing through with a null amount.
- Pre-populate the decimals cache from the asset registry at startup — don't rely solely on on-chain resolution.
- For rebasing tokens (stETH, aTokens), `decimals()` returns 18 but the rebase factor must be fetched separately. Use the protocol-specific normaliser, not the generic ERC-20 one.
Symptom: The same on-chain event appears twice, causing duplicate opportunity candidates with identical scores.
Fix: Generate a deterministic record_id using sha256(tx_hash + log_index). Before inserting into the queue, check against a deduplication bloom filter with a 10-minute TTL. Reject records whose record_id is already in the filter.
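A sketch of the deterministic ID using Node's crypto module. One detail added here beyond the text: a separator between the two inputs, since bare concatenation can collide (e.g. "0xa1" + "2" vs "0xa" + "12").

```typescript
import { createHash } from "node:crypto";

// Deterministic record_id: same (tx_hash, log_index) always hashes to
// the same id. Lowercasing and the ":" separator are additions to make
// the id case-insensitive and collision-safe.
function recordId(txHash: string, logIndex: number): string {
  return createHash("sha256")
    .update(`${txHash.toLowerCase()}:${logIndex}`)
    .digest("hex");
}
```

Note that a bloom filter can return false positives, so a very small fraction of legitimate records may be rejected; that trade-off is usually acceptable for deduplication.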
Scoring issues
Symptom: Scoring engine is running, candidates are being assembled, but all OSF-1 scores are below 0.01. No tier-A or tier-B candidates. Market conditions appear normal.
Cause (most common): The confidence factor C is collapsing to near-zero because one or more critical feeds have exceeded their TTL. Check the output JSON's degraded_feeds array. Fix the feed, not the formula.
Cause (less common): OSF-1 weights changed in scoring.yaml without re-calibrating the minimum score thresholds. After any weight change, run the scoring engine against a historical snapshot and compare score distributions before and after.
Symptom: During high-volatility periods, the scoring cycle exceeds its configured timeout. Output is delayed or emitted with status: "error".
Fix: Implement a candidate cap: if more than N=500 candidates are assembled in a single correlation window, apply a pre-scoring filter: (1) discard candidates with any expired source; (2) discard candidates below a minimum raw yield estimate; (3) sample uniformly from the remainder up to N.
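The three-step cap can be sketched as follows; the field names and the minimum-yield default are illustrative assumptions:

```typescript
// Pre-scoring filter sketch: drop expired/low-yield candidates, then
// uniformly sample the remainder down to the cap.
type Candidate = { hasExpiredSource: boolean; rawYieldBps: number };

function capCandidates(cands: Candidate[], cap = 500, minYieldBps = 5): Candidate[] {
  // Steps 1 and 2: discard expired-source and below-minimum-yield candidates.
  const kept = cands.filter((c) => !c.hasExpiredSource && c.rawYieldBps >= minYieldBps);
  if (kept.length <= cap) return kept;
  // Step 3: uniform sample down to the cap (stride > 1 guarantees
  // strictly increasing indices, so no candidate is taken twice).
  const stride = kept.length / cap;
  return Array.from({ length: cap }, (_, i) => kept[Math.floor(i * stride)]);
}
```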
Output problems
Symptom: The JSON report shows tier-A opportunities, but no webhook alert is delivered. No error in the output router log.
Fix: Test filter expressions using the pipeline's built-in evaluator:
```shell
npx agent-pipeline test-filter \
  --filter "tier == 'A'" \
  --sample-report "./reports/latest.json"

# Should output: "Filter matched 3 of 12 opportunities"
```
If the evaluator returns 0 matches on a report with tier-A entries, the filter syntax is wrong. Refer to the filter expression language reference in config/outputs.yaml comments.
Symptom: A downstream system rejects the output JSON with an error like required field 'estimated_return_usd' missing.
Fix: The pipeline emits a schema_version field at the top level of every output report. Downstream consumers should check this field and reject with a clear version-mismatch error rather than a field-level validation error. Coordinate schema upgrades using the changelog at docs/schema-changelog.md.
First-response checklist
| # | Check | Where to look |
|---|---|---|
| 1 | Is status in the latest output JSON ok, degraded, or error? | ./reports/latest.json → top-level status |
| 2 | Are any feeds listed in degraded_feeds? | ./reports/latest.json → degraded_feeds array |
| 3 | Is the scoring cycle completing within its timeout budget? | Logs → cycle_duration_ms; alert if > 80% of configured timeout |
| 4 | Is the normalised record queue growing (backpressure)? | Metrics dashboard → queue_depth gauge |
| 5 | Is the primary RPC provider responding? | Run: curl -X POST {rpc_url} -d '{"method":"eth_blockNumber","params":[],"id":1}' |
| 6 | Did any config file change in the last deploy? | git log -5 -- config/ |
| 7 | Is the system clock synchronised? | Run: timedatectl status or ntpstat |