DeFi Agent Pipeline Docs v1.0 · 2026 · Diataxis


The DeFi Research Agent Pipeline

A structured framework for understanding how autonomous research agents ingest on-chain data, score opportunities, and produce verifiable outputs — without writing the agent yourself.

01 Ingestion — On-chain RPC, DEX feeds, oracle prices, event logs
02 Normalisation — Schema mapping, deduplication, timestamp alignment
03 Scoring — OSF-1 formula, opportunity ranking, risk filters
04 Decision — Strategy selection, threshold gating, confidence checks
05 Output — JSON reports, alerts, dashboards, signed attestations
Concepts

Understanding the DeFi Research Agent Pipeline

The mental models you need before you touch any configuration or code. Start here.

What is a DeFi research agent?

A DeFi research agent is a software system — often AI-augmented — that autonomously monitors decentralised finance protocols, ingests structured data from multiple sources, applies scoring or ranking logic to candidate opportunities, and produces outputs that inform or execute financial decisions.

In 2026, research agents range from simple yield-monitoring bots that compare APRs across lending markets, to multi-modal systems like OpenClawD and AiXT Trading Intelligence that fuse on-chain event streams, sentiment signals, and cross-chain liquidity metrics to surface opportunities within milliseconds of a state change.

The word "research" is deliberate. Unlike execution agents (which transact), a research agent's primary output is information: structured reports, scored opportunity sets, and ranked strategy candidates. A human or an execution layer makes the final call.

Key distinction — A research agent observes and ranks. An execution agent acts. This documentation covers research agents only — the layer that turns raw data into investable intelligence.

Why pipeline architecture matters

DeFi data is fast, fragmented, and adversarial. A yield opportunity on Aave might last 40 seconds before arbitrageurs close it. A governance proposal on Compound can shift protocol risk parameters before your scoring model sees it. Without a structured pipeline, agents fail in predictable ways:

  • Stale data decisions — scoring an opportunity based on a price that was valid 12 blocks ago.
  • Source conflation — treating on-chain oracle prices and CEX spot prices as equivalent without normalisation.
  • Scoring drift — ranking logic that uses different risk parameters than the ones the output consumer expects.
  • Silent failures — a feed goes down; the agent keeps scoring on cached data without signalling degraded confidence.

A pipeline architecture solves this by separating concerns into discrete, observable stages: ingestion → normalisation → scoring → decision → output. Each stage has a defined contract with the next. Failures are localised. Confidence degrades explicitly rather than silently.
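The stage separation described above can be sketched as a chain of plain functions, each consuming the previous stage's output. This is an illustrative sketch, not the pipeline's actual code — the stage names and toy records here are hypothetical:

```python
from typing import Any, Callable

def run_pipeline(raw_records: list[dict],
                 stages: list[Callable[[Any], Any]]) -> Any:
    """Each stage has a defined contract with the next: the output of one
    stage is the input of the following one, so failures stay localised."""
    data = raw_records
    for stage in stages:
        data = stage(data)
    return data

# Toy stages standing in for normalisation, scoring, decision, and output.
def normalise(recs): return [{**r, "normalised": True} for r in recs]
def score(recs):     return [{**r, "score": 0.5} for r in recs]
def decide(recs):    return [r for r in recs if r["score"] >= 0.4]
def output(recs):    return {"status": "ok", "opportunities": recs}

report = run_pipeline([{"id": "r1"}], [normalise, score, decide, output])
```

Because each stage is a separate function, a failure in (say) scoring can be caught, logged, and signalled without corrupting ingestion or output.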

The five-stage model

| Stage | Input | Output | Failure mode |
| --- | --- | --- | --- |
| 1. Ingestion | Raw RPC calls, WebSocket events, REST feeds | Unvalidated raw records | Connection loss, rate limiting, provider outage |
| 2. Normalisation | Raw records | Schema-validated, timestamped, deduplicated records | Schema mismatch, clock skew, duplicate events |
| 3. Scoring | Normalised records | Scored, ranked opportunity set | Missing fields, division by zero in formula, stale input |
| 4. Decision | Scored opportunities | Selected strategy candidates with confidence labels | All candidates below threshold, conflicting signals |
| 5. Output | Strategy candidates | JSON report, alert, dashboard update, signed attestation | Serialisation error, delivery failure, consumer schema mismatch |

Data source taxonomy

DeFi research agents draw from four categories of data. Understanding the latency, reliability, and trust characteristics of each is prerequisite knowledge for designing the ingestion layer.

On-chain data

Data read directly from a blockchain's state or event logs via an RPC endpoint or indexer. The ground-truth layer — cryptographically final after sufficient confirmations.

  • State reads — eth_call to read protocol parameters, liquidity pool reserves, vault balances, governance states.
  • Event logs — eth_getLogs for Swap, Deposit, Borrow, LiquidationCall events.
  • Block metadata — block number, timestamp, base fee, proposer (for MEV context).
  • Mempool data — pending transactions via Blocknative or Flashbots.

Latency: 1–12s (block time). Trust: cryptographic. Cost: scales with RPC call volume.

Off-chain data

Data originating outside the blockchain — faster but requires explicit trust decisions.

  • CEX price feeds — Binance, Coinbase, Kraken REST/WebSocket APIs.
  • Macro/news feeds — Bloomberg, Refinitiv, The Block, DeFiLlama news.
  • Social sentiment — Twitter/X, Reddit, Telegram via LunarCrush or Santiment.
  • Protocol risk scores — DeFi Safety, Gauntlet, Chaos Labs risk parameter APIs.

Oracle data

Decentralised price feeds that sit on-chain but aggregate off-chain price data. On-chain in location, off-chain in origin.

  • Chainlink Data Feeds — aggregated spot prices, updated on deviation threshold or heartbeat.
  • Pyth Network — pull-based oracle with sub-second updates, popular on Solana and EVM L2s.
  • Uniswap v3 TWAP — time-weighted average price from on-chain swap history; manipulation-resistant but slow.
  • Redstone — modular oracle supporting custom data types beyond spot price.

Protocol events and governance feeds

  • Governance proposals — Snapshot, Tally, Governor Bravo APIs for pending votes.
  • Parameter changes — collateral factor updates, interest rate model changes, fee tier adjustments.
  • Liquidation thresholds — near-liquidation position tracking via The Graph or protocol-specific APIs.
  • Protocol upgrades and pauses — proxy upgrade events, circuit breaker triggers.

Decision logic model

The decision logic layer has three components: a scoring engine, a ranking layer, and a strategy selector.

The scoring engine applies OSF-1 to produce a numeric score per candidate. The ranking layer sorts by score and applies filters: minimum score threshold, minimum liquidity floor, maximum protocol concentration. The strategy selector matches top-ranked opportunities against a strategy library (e.g., arb.cex_dex, yield.stable_lp, liquidation.bonus).

The trust problem in autonomous agents

Without a trust framework, agents are black boxes. The pipeline addresses this with three mechanisms:

  1. Confidence labels — every output includes a confidence field derived from data completeness and source reliability.
  2. Data lineage — every output includes a sources array listing which feeds contributed, with freshness timestamps.
  3. Degradation signalling — if any required feed is stale beyond a threshold, output status is set to degraded rather than silently omitting data.
ZKML note — In 2026, some research agents expose ZK proofs of inference alongside their outputs: cryptographic attestations that scoring was performed correctly on the claimed inputs. This is the ZKML pattern. It is not required by this spec, but it is the direction in which high-stakes autonomous decision-making is heading.

Output model

The pipeline supports four output formats, producible simultaneously:

  • JSON report — the canonical output. All scored opportunities, strategy candidates, data lineage, and confidence metadata.
  • Alert — a lightweight notification (webhook, Telegram, Slack) when a scored opportunity exceeds a threshold.
  • Dashboard — a real-time web view backed by a WebSocket connection to the agent's scoring output stream.
  • Signed attestation — a cryptographically signed summary for downstream on-chain verification or audit trails.
Guides

Simulating the Research Agent Pipeline

Step-by-step instructions for configuring and running each stage of a DeFi research agent pipeline, plus a full case study.

Guide 1 — Setting up the ingestion layer

The ingestion layer pulls raw data from every configured source and places it into the normalisation queue. It runs continuously and must handle partial failures gracefully.

01

Configure your RPC providers

Every ingestion layer needs at least two RPC endpoints per chain — a primary and a fallback. Single-provider setups fail silently when the provider rate-limits or goes offline.

# config/providers.yaml
chains:
  ethereum:
    primary:   "wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://rpc.ankr.com/eth/${ANKR_KEY}"
    archive:   "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    block_time_ms: 12000
  arbitrum:
    primary:   "wss://arb-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://rpc.ankr.com/arbitrum/${ANKR_KEY}"
    block_time_ms: 250
  base:
    primary:   "wss://base-mainnet.g.alchemy.com/v2/${ALCHEMY_KEY}"
    fallback:  "https://mainnet.base.org"
    block_time_ms: 2000
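The primary/fallback pattern from the config above translates into a small failover loop. This is a sketch: the transport function is injected so the failover logic can be shown without real network calls, and the endpoint URLs are placeholders:

```python
from typing import Callable

def rpc_call(method: str, params: list,
             endpoints: list[str],
             transport: Callable[[str, dict], dict]) -> dict:
    """Try the primary endpoint first, then each fallback in order.
    `transport` posts the JSON-RPC payload and raises on failure."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    last_err: Exception | None = None
    for url in endpoints:
        try:
            return transport(url, payload)
        except Exception as err:   # 429s, timeouts, provider outages
            last_err = err
    raise RuntimeError(f"all RPC endpoints failed: {last_err}")

# Simulated transport: the primary rate-limits, the fallback answers.
def fake_transport(url: str, payload: dict) -> dict:
    if "primary" in url:
        raise RuntimeError("429 Too Many Requests")
    return {"jsonrpc": "2.0", "id": 1, "result": "0x10d4f"}

result = rpc_call("eth_blockNumber", [],
                  ["https://primary.example", "https://fallback.example"],
                  fake_transport)
```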
02

Define protocol event subscriptions

Use a declarative subscription file to specify which protocol events the ingestion layer should monitor. Each subscription maps a contract address and event signature to a data handler.

# config/subscriptions.yaml
subscriptions:
  - id: "aave_v3_borrow"
    chain: "ethereum"
    address: "0x87870Bca3F3fD6335C3F4ce8392D69350B4fA4E2"
    event:   "Borrow(address,address,address,uint256,uint8,uint256,uint16)"
    handler: "handlers.lending.on_borrow"
    priority: high

  - id: "uniswap_v3_swap"
    chain: "arbitrum"
    address: "*"  # wildcard: all UniV3 pools
    event:   "Swap(address,address,int256,int256,uint160,uint128,int24)"
    handler: "handlers.amm.on_swap"
    priority: medium

  - id: "compound_v3_liquidation"
    chain: "ethereum"
    address: "0xc3d688B66703497DAA19211EEdff47f25384cdc3"
    event:   "AbsorbCollateral(address,address,address,uint256,uint256)"
    handler: "handlers.lending.on_liquidation"
    priority: critical
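Handler strings like handlers.lending.on_borrow are dotted import paths resolved at subscription load time. A sketch of the resolution step — demonstrated against a standard-library function, since the handler modules themselves are project-specific:

```python
import importlib

def resolve_handler(dotted: str):
    """Split 'pkg.module.func' into a module path and an attribute name,
    import the module, and return the callable it names."""
    module_path, func_name = dotted.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

# subscriptions.yaml references e.g. "handlers.lending.on_borrow";
# here we resolve a stdlib function to show the mechanism.
join = resolve_handler("posixpath.join")
```

Resolving handlers once at startup (rather than per event) also surfaces misconfigured paths immediately instead of at dispatch time.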
03

Configure off-chain feeds

Off-chain feeds are polled on a schedule rather than subscribed to. Set a ttl_seconds for each — the normalisation layer uses this to mark data as stale and downgrade confidence scores.

# config/feeds.yaml
feeds:
  - id: "binance_spot"
    type: "websocket"
    url: "wss://stream.binance.com:9443/ws/!ticker@arr"
    ttl_seconds: 30
    normaliser: "normalisers.cex.binance"

  - id: "defillama_tvl"
    type: "rest_poll"
    url: "https://api.llama.fi/v2/protocols"
    poll_interval_seconds: 120
    ttl_seconds: 300
    normaliser: "normalisers.analytics.defillama"

  - id: "chainlink_eth_usd"
    type: "onchain_oracle"
    chain: "ethereum"
    address: "0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419"
    heartbeat_seconds: 3600
    ttl_seconds: 4200
    normaliser: "normalisers.oracle.chainlink"

Guide 2 — Normalising on-chain and off-chain data

Normalisation converts heterogeneous raw records into a common schema. The three most critical operations are timestamp alignment, token amount precision, and asset identifier normalisation.

Timestamp alignment: different sources use different formats. The normaliser converts everything to Unix milliseconds UTC. Use the source event_at for causal ordering; use the agent-stamped ingested_at for latency calculations.

Token amount precision: EVM token amounts are raw integers (e.g., 1000000 for 1 USDC at 6 decimals). Resolve decimals from a cached on-chain call and convert to decimal strings. Store the raw value alongside to prevent rounding errors.
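The raw-to-decimal conversion can be sketched with Python's decimal module, keeping the raw value alongside as recommended above (the return shape is illustrative):

```python
from decimal import Decimal

def to_decimal_amount(raw: str, decimals: int) -> dict:
    """Convert a raw integer token amount into a decimal string, preserving
    the original raw value to avoid rounding ambiguity downstream."""
    scaled = Decimal(raw) / (Decimal(10) ** decimals)
    # normalize() strips trailing zeros: "1.000000" -> "1"
    return {"amount": str(scaled.normalize()), "amount_raw": raw}

usdc = to_decimal_amount("1000000", 6)               # 1 USDC at 6 decimals
weth = to_decimal_amount("2500000000000000000", 18)  # 2.5 WETH
```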

Asset identifier normalisation: the same asset may appear as WETH, ETH, 0xC02aaA…C756Cc2, or ethereum. Map all representations to canonical IDs using a maintained asset registry.
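A minimal sketch of the registry lookup, using the WETH entry from the Reference section's asset registry (the alias set shown is illustrative):

```python
# Maps every known representation (symbol, alias, address) to one canonical ID.
ASSET_ALIASES = {
    "WETH": "WETH",
    "ETH": "WETH",
    "ethereum": "WETH",
    "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2".lower(): "WETH",
}

def canonical_asset_id(identifier: str) -> str:
    """Lowercase addresses before lookup; raise on unknowns rather than
    passing an unmapped identifier downstream."""
    key = identifier.lower() if identifier.startswith("0x") else identifier
    try:
        return ASSET_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown asset identifier: {identifier}")

cid = canonical_asset_id("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2")
```

Failing loudly on unknown identifiers is deliberate: a silently unmapped asset would split one asset's records across two correlation groups.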

// Example: normalised record output
{
  "record_id":     "evt_arb_uniswap_0x7a3b...91c2_14829401",
  "source_id":     "uniswap_v3_swap",
  "source_type":   "onchain_event",
  "chain":         "arbitrum",
  "event_at":      1719432811000,
  "ingested_at":   1719432811340,
  "asset_in":  { "id": "WETH", "amount": "2.5", "raw": "2500000000000000000" },
  "asset_out": { "id": "USDC", "amount": "8312.44", "raw": "8312440000" },
  "price_usd":     "3324.976",
  "confidence":    0.97,
  "data_freshness":"live"
}

Guide 3 — Running the scoring engine

The scoring engine consumes normalised records, groups them into opportunity candidates via a correlation engine, and applies OSF-1. An opportunity candidate is a group of records that together describe one actionable situation — e.g., a CEX price record + a DEX pool reserve read + a Chainlink oracle price forming a CEX-DEX arbitrage signal.

The correlation window is configurable (default: 5 seconds). Tighter windows reduce false positives but may miss slow-moving opportunities. In pseudocode:

// OSF-1 pseudocode
score = (yield_estimate * urgency_weight * confidence_factor)
        / (risk_score * liquidity_penalty)

yield_estimate    = expected_return_bps / 10000
urgency_weight    = 1 + (1 / max(1, opportunity_age_seconds))
confidence_factor = avg(source_confidences) * data_freshness_factor
risk_score        = protocol_risk + oracle_risk + smart_contract_risk
liquidity_penalty = 1 + max(0, (position_size / pool_liquidity) - 0.01) * 5
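The correlation step that precedes scoring — grouping normalised records about the same asset within the window — can be sketched as follows. Field names follow the NormalisedRecord schema; the single-asset grouping key is a simplification:

```python
from collections import defaultdict

CORRELATION_WINDOW_MS = 5_000   # default 5-second window

def correlate(records: list[dict]) -> list[list[dict]]:
    """Group records by asset id, then split each group wherever the gap
    between consecutive event_at timestamps exceeds the window."""
    by_asset: dict[str, list[dict]] = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["event_at"]):
        by_asset[rec["asset_id"]].append(rec)

    candidates = []
    for recs in by_asset.values():
        group = [recs[0]]
        for rec in recs[1:]:
            if rec["event_at"] - group[-1]["event_at"] <= CORRELATION_WINDOW_MS:
                group.append(rec)
            else:
                candidates.append(group)
                group = [rec]
        candidates.append(group)
    return candidates

groups = correlate([
    {"asset_id": "WETH", "event_at": 1_000},    # within 5s of the next
    {"asset_id": "WETH", "event_at": 3_500},
    {"asset_id": "WETH", "event_at": 20_000},   # far apart: new candidate
])
```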

Guide 4 — Interpreting the decision layer output

Each strategy candidate contains: the matched opportunity, the recommended strategy code, the expected entry parameters, and a confidence tier (A/B/C/D).

| Tier | Score range | Meaning | Suggested action |
| --- | --- | --- | --- |
| A | 0.80 – 1.00 | High confidence, strong signal, low risk | Execute or escalate for human review |
| B | 0.60 – 0.79 | Good signal, moderate risk or data uncertainty | Flag for human analyst review |
| C | 0.40 – 0.59 | Weak signal or elevated risk | Monitor only; do not act |
| D | 0.00 – 0.39 | Below threshold; included for transparency | Discard |
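The tier boundaries map to a simple lookup. One assumption, consistent with the worked OSF-1 examples in the Reference section: scores above 1.00 (possible, since the score range is unbounded) still land in tier A:

```python
def tier_for(score: float) -> str:
    """Map an OSF-1 score onto its confidence tier. Scores can exceed 1.0;
    anything at or above 0.80 is tier A."""
    if score >= 0.80:
        return "A"
    if score >= 0.60:
        return "B"
    if score >= 0.40:
        return "C"
    return "D"
```

For example, `tier_for(0.647)` yields "B" and `tier_for(1.568)` yields "A".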

Guide 5 — Reading and routing outputs

The pipeline produces a primary JSON report each scoring cycle (default: every 60 seconds, or on-demand for critical events). Configure the output router in config/outputs.yaml:

# config/outputs.yaml
outputs:
  - id: "json_file"
    type: "file"
    path: "./reports/latest.json"
    overwrite: true

  - id: "webhook_alert"
    type: "webhook"
    url: "${ALERT_WEBHOOK_URL}"
    filter: "tier == 'A'"
    payload_template: "templates/alert_slim.json"

  - id: "dashboard_ws"
    type: "websocket_server"
    port: 8765
    emit_on: "every_cycle"
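Routing can be sketched as applying each output's filter to the report's opportunity list. The pipeline's real filter strings (e.g. "tier == 'A'") use an expression language not defined here, so plain Python predicates stand in for them:

```python
def route_outputs(report: dict, outputs: list[dict]) -> dict[str, list[dict]]:
    """For each configured output, deliver only the opportunities that pass
    its filter; an output without a filter receives everything."""
    delivered: dict[str, list[dict]] = {}
    for out in outputs:
        pred = out.get("filter", lambda opp: True)
        delivered[out["id"]] = [o for o in report["opportunities"] if pred(o)]
    return delivered

report = {"opportunities": [{"opp_id": "o1", "tier": "A"},
                            {"opp_id": "o2", "tier": "C"}]}
routed = route_outputs(report, [
    {"id": "json_file"},                                     # no filter: all
    {"id": "webhook_alert", "filter": lambda o: o["tier"] == "A"},
])
```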

Case Study: OpenClawD & AiXT Trading Intelligence

A real-world-style walkthrough of how modern AI-augmented research agents feed their pipelines

OpenClawD and AiXT Trading Intelligence represent two distinct but complementary approaches to AI-augmented DeFi research. This case study traces the data path from raw chain state to scored output in both systems.

OpenClawD — event-driven ingestion

OpenClawD's architecture is built around reactive ingestion: it subscribes to a curated set of high-signal on-chain events and assembles opportunity candidates as correlated events arrive.

┌─ OpenClawD Ingestion Layer ──────────────────────────────────────────┐
│                                                                      │
│  WebSocket Subscriptions ──► Event Router ──► Correlation Engine     │
│  (Alchemy / Infura)             │                 │                  │
│                                 │ routes by       │ groups by        │
│                                 │ event type      │ asset+time       │
│  Off-chain Feeds ───────────────┘                 ▼                  │
│  (Binance WS, Pyth, DeFiLlama)          Candidate Assembler          │
│                                                   │                  │
│                                         OSF-1 Scoring Engine         │
└──────────────────────────────────────────────────────────────────────┘
| Source | Type | Data extracted | Refresh |
| --- | --- | --- | --- |
| Alchemy WebSocket | On-chain events | Swap, Borrow, LiquidationCall, Transfer for monitored protocols | Per-block (12s / 250ms) |
| Pyth Network | Oracle | Real-time price updates for 80+ assets | 400ms |
| Binance WebSocket | Off-chain CEX | Best bid/ask, 24h volume, funding rates for perp markets | 100ms |
| Gauntlet Risk API | Protocol risk | Collateral risk scores, utilisation rates, risk parameter changes | 5 min |
| Snapshot API | Governance | Active proposals, vote counts, time-to-close | 15 min |

When a Swap event arrives from a Uniswap v3 pool, OpenClawD's normaliser: decodes the ABI-encoded log → resolves token0/token1 to canonical IDs → converts raw integers to decimal strings → computes implied price from sqrtPriceX96 → cross-references against the most recent Pyth price → stamps timestamps → pushes to the correlation queue. If the price deviation exceeds 0.5%, the record is tagged price_divergence: true, fast-tracking CEX-DEX arbitrage candidate assembly.
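The sqrtPriceX96 step deserves a note: Uniswap v3 stores price as a Q64.96 fixed-point square root of the token1/token0 ratio. A sketch of the conversion, with the decimal adjustment for a generic pair:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough precision for Q64.96 arithmetic

def price_from_sqrt_price_x96(sqrt_price_x96: int,
                              decimals0: int, decimals1: int) -> Decimal:
    """Square the Q64.96 value to recover token1/token0, then scale by the
    decimals gap to get the human-readable price of token0 in token1."""
    ratio = (Decimal(sqrt_price_x96) / Decimal(2**96)) ** 2
    return ratio * Decimal(10) ** (decimals0 - decimals1)

# Sanity check: sqrtPriceX96 == 2**96 with equal decimals implies price 1.
p = price_from_sqrt_price_x96(2**96, 18, 18)
```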

AiXT Trading Intelligence — poll-and-model ingestion

AiXT runs a continuous polling loop against a wider set of sources, feeds all data into a feature store, and applies a gradient-boosted tree model to score the feature vector every 30 seconds. This approach is better suited to longer-horizon signals — macro yield trends, governance risk, protocol health — where event-driven ingestion would produce too much noise.

AiXT's feature vector per (protocol, asset_pair, chain) triplet includes: current APR and 7-day rolling average, utilisation rate and its 1h rate of change, TVL and its 24h rate of change, governance risk score, CEX funding rate, and a 4h rolling sentiment score from LunarCrush. The model is retrained weekly on new outcome data and outputs an OSF-1 compatible score, feeding the same decision layer as OpenClawD.

Key takeaway — Event-driven (OpenClawD) and poll-and-model (AiXT) are complementary, not competing. A production stack runs both: event-driven handles sub-minute opportunities; model-based handles multi-day signals. Both feed the same decision layer and produce the same output format.
Reference

Technical Specifications

Complete API endpoints, JSON schemas, scoring formulas, and configuration options. Use this section as a lookup, not a reading guide.

Opportunity Scoring Formula (OSF-1)

OSF-1 produces a dimensionless score in the range [0, ∞). In practice well-formed opportunities score between 0.0 and 2.0; scores above 1.0 are exceptional.

/**
 * OSF-1 — Opportunity Scoring Formula, version 1
 *
 * score = (Y × U × C) / (R × L)
 *
 * Y  — Yield estimate      (dimensionless, basis points / 10000)
 * U  — Urgency weight      (dimensionless, ≥ 1.0)
 * C  — Confidence factor   (0.0 – 1.0)
 * R  — Risk score          (dimensionless, > 0)
 * L  — Liquidity penalty   (dimensionless, ≥ 1.0)
 */

Y = expected_return_bps / 10000
// For yield: APR difference between protocols
// For arb:   price spread minus estimated gas cost in bps
// For liq:   bonus percentage in bps

U = 1 + (1 / max(1, opportunity_age_seconds))
// 0s old: U=2.0 | 60s old: U≈1.016 | 300s old: U≈1.003

C = mean(source_confidences) × F
// F — data freshness factor:
//   1.0  if all sources within TTL
//   0.7  if any source is 1–2× TTL
//   0.4  if any source is 2–3× TTL
//   0.0  if any source exceeds 3× TTL

R = w_sc×SC + w_or×OR + w_lq×LQ
// SC: smart contract risk  OR: oracle risk  LQ: liquidity risk
// Default weights: w_sc=0.5, w_or=0.3, w_lq=0.2

L = 1 + max(0, (position_size_usd / pool_liquidity_usd - 0.01)) × 5
// Neutral (1.0) at ≤1% of pool liquidity. Rises linearly above.
| Scenario | Y | U | C | R | L | Score | Tier |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CEX-DEX arb, 25bps, 0s old, live feeds | 0.0025 | 2.0 | 0.95 | 0.25 | 1.01 | 0.019 | D |
| Yield arb, 180bps, 30s old, live feeds | 0.018 | 1.033 | 0.92 | 0.40 | 1.0 | 0.043 | D |
| Liquidation, 5% bonus, 0s old, low-risk protocol | 0.05 | 2.0 | 0.97 | 0.15 | 1.0 | 0.647 | B |
| Liquidation, 8% bonus, 0s old, blue-chip | 0.08 | 2.0 | 0.98 | 0.10 | 1.0 | 1.568 | A |
Note on score magnitude — OSF-1 scores are only meaningful relative to each other within a single scoring cycle. Always use the tier label (A/B/C/D) for cross-cycle comparisons.
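The tier-A liquidation row from the scenario table can be reproduced directly. One interpretive note: the worked scores follow from treating L as a multiplicative penalty (neutral at 1.0), so the denominator is computed as R × L here:

```python
def osf1(y: float, u: float, c: float, r: float, l: float) -> float:
    """OSF-1 with the liquidity penalty applied multiplicatively, which is
    the reading consistent with the worked scenario table (L neutral = 1.0)."""
    return (y * u * c) / (r * l)

# Tier-A liquidation scenario: 8% bonus (Y = 0.08), 0s old (U = 2.0),
# C = 0.98, blue-chip risk R = 0.10, no liquidity penalty (L = 1.0).
score = round(osf1(0.08, 2.0, 0.98, 0.10, 1.0), 3)
```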

NormalisedRecord schema

Every record produced by the normalisation stage must conform to this schema. A record missing any required field is dropped.

{
  "record_id":      "string",   // required — unique, deterministic ID
  "source_id":      "string",   // required — subscription or feed id
  "source_type":    "string",   // required — onchain_event | onchain_state | oracle | cex_feed | rest_api
  "chain":          "string",   // required for onchain_* types
  "protocol":       "string",   // optional — e.g. "aave_v3"
  "event_at":       number,     // required — Unix ms UTC (source timestamp)
  "ingested_at":    number,     // required — Unix ms UTC (agent receipt time)
  "block_number":   number,     // optional — for onchain_* types
  "tx_hash":        "string",   // optional — 0x-prefixed hex
  "assets": [{                  // required — at least one
    "id":        "string",      // canonical asset id
    "role":      "string",      // in | out | collateral | debt | supply
    "amount":    "string",      // decimal string
    "amount_raw":"string",      // original integer string
    "amount_usd":"string"       // USD value at event_at, if resolvable
  }],
  "price_usd":      "string",   // optional — implied price if single-pair event
  "metadata":       "object",   // optional — source-specific extra fields
  "confidence":     number,     // required — 0.0–1.0
  "data_freshness": "string",   // required — live | fresh | stale | expired
  "tags":           "string[]"  // optional — e.g. ["price_divergence"]
}

Output JSON schema

The primary output of each scoring cycle, produced as latest.json and optionally emitted over WebSocket.

{
  "report_id":         "string",    // UUID v4
  "generated_at":      number,      // Unix ms UTC
  "cycle_duration_ms": number,
  "status":            "string",    // ok | degraded | error
  "degraded_feeds":    "string[]",
  "opportunities": [{
    "opp_id":               "string",
    "type":                 "string",  // arb.cex_dex | yield | liquidation | governance_risk
    "score":                number,
    "tier":                 "string",  // A | B | C | D
    "strategy":             "string",
    "assets":               "string[]",
    "protocols":            "string[]",
    "chains":               "string[]",
    "estimated_return_bps": number,
    "estimated_return_usd": "string",
    "risk_score":           number,
    "confidence":          number,
    "expires_at":           number,
    "sources": [{ "record_id":"string", "source_id":"string", "ingested_at":number, "confidence":number }]
  }],
  "summary": {
    "total_candidates": number, "tier_a_count": number, "tier_b_count": number,
    "highest_score": number, "feeds_healthy": number, "feeds_degraded": number
  }
}

Data source API endpoints

| Source | Endpoint | Auth | Rate limit | Notes |
| --- | --- | --- | --- | --- |
| Alchemy (ETH) | wss://eth-mainnet.g.alchemy.com/v2/{key} | API key | Compute units / month | Prefer WebSocket for event subs; HTTPS for archive reads |
| Pyth Network | https://hermes.pyth.network/v2/updates/price/latest | None | None (best-effort) | Poll ?ids[]= with price feed IDs. Max 100 ids per request. |
| Binance WebSocket | wss://stream.binance.com:9443/ws/{stream} | None (public) | 300 connections / 5 min per IP | Use combined streams: /stream?streams=ethusdt@bookTicker/btcusdt@bookTicker |
| DeFiLlama | https://api.llama.fi/v2/protocols | None | ~300 req/min | For pool-level yield: https://yields.llama.fi/pools |
| The Graph | https://api.thegraph.com/subgraphs/name/{subgraph} | API key | 1000 req/day (free) | GraphQL POST. Use for historical queries; too slow for real-time. |
| Chainlink Data Feeds | eth_call → latestRoundData() | None (on-chain) | RPC rate limit | Returns (roundId, answer, startedAt, updatedAt, answeredInRound). Check updatedAt vs heartbeat. |
| Gauntlet Risk API | https://risk.gauntlet.network/api/v1/protocols/{id} | API key | Per contract | Returns risk scores, utilisation, recommended parameters. B2B access required. |
| Snapshot | https://hub.snapshot.org/graphql | None | Unlisted, generous | GraphQL. Query proposals with state: "active" for live governance risk. |

Asset registry (common entries)

| Canonical ID | Symbol | Ethereum address | Decimals |
| --- | --- | --- | --- |
| WETH | WETH | 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 | 18 |
| USDC | USDC | 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48 | 6 |
| USDT | USDT | 0xdAC17F958D2ee523a2206206994597C13D831ec7 | 6 |
| WBTC | WBTC | 0x2260FAC5E5542a773Aa44fBCfeDf7C193bc2C599 | 8 |
| stETH | stETH | 0xae7ab96520DE3A18E5e111B5EaAb095312D7fE84 | 18 |
| DAI | DAI | 0x6B175474E89094C44Da98b954EedeAC495271d0F | 18 |

Strategy codes

| Code | Name | Opportunity type | Typical Y range (bps) |
| --- | --- | --- | --- |
| arb.cex_dex | CEX-DEX arbitrage | Price divergence between CEX spot and DEX pool | 5 – 80 |
| arb.dex_dex | DEX-DEX arbitrage | Price divergence between two DEX pools | 2 – 40 |
| arb.cross_chain | Cross-chain arbitrage | Price divergence across chains via bridge | 10 – 200 |
| yield.stable_lp | Stablecoin LP yield | Superior LP APR vs lending rate for same stablecoin | 20 – 500 |
| yield.lending | Lending rate arb | Supply/borrow rate spread across protocols | 10 – 300 |
| liquidation.bonus | Liquidation | Under-collateralised position with bonus > gas cost | 300 – 1500 |
| governance.risk_flag | Governance risk signal | Active proposal changing protocol risk parameters | N/A (informational) |

Configuration reference

All configuration files live in config/. Hot reload is supported for feeds.yaml, scoring.yaml, and outputs.yaml; restart required for providers.yaml and subscriptions.yaml.

| File | Hot reload | Key fields |
| --- | --- | --- |
| providers.yaml | No | chains[].primary, .fallback, .block_time_ms |
| subscriptions.yaml | No | subscriptions[].id, .chain, .address, .event, .handler, .priority |
| feeds.yaml | Yes | feeds[].id, .type, .url, .poll_interval_seconds, .ttl_seconds, .normaliser |
| scoring.yaml | Yes | osf1.weights.*, thresholds.min_score, thresholds.min_liquidity_usd, correlation_window_seconds |
| outputs.yaml | Yes | outputs[].id, .type, .filter, .payload_template |
Troubleshooting

Diagnosing Pipeline Issues

Common failure modes, their symptoms, and how to fix them. Organised by pipeline stage.

Before diving in — Most pipeline issues produce a status: "degraded" or status: "error" field in the output JSON, with a degraded_feeds array listing affected sources. Start there — it identifies which stage to investigate first.

Ingestion failures

Ingestion — WebSocket connection drops silently

Symptom: The agent is running, the feed shows as live, but no new events have arrived for several minutes. Opportunity scores based on this feed begin degrading.

Cause

WebSocket providers (including Alchemy and Infura) silently drop connections after periods of low activity or when the server rotates. The client does not receive a close frame, so it does not attempt to reconnect.

Fix

Implement a heartbeat check: if no message has been received within 2 × block_time_ms, close and reconnect. For Ethereum mainnet: ~24 seconds. For Arbitrum: 500ms.

// Heartbeat pattern (pseudocode)
lastMessageAt = Date.now()
ws.on('message', () => { lastMessageAt = Date.now() })
setInterval(() => {
  if (Date.now() - lastMessageAt > 2 * BLOCK_TIME_MS) {
    log.warn('Heartbeat timeout — reconnecting')
    ws.close(); reconnect()
  }
}, BLOCK_TIME_MS)
Ingestion — RPC provider rate limit (429 errors)

Symptom: Logs show repeated 429 Too Many Requests. State reads are failing or returning stale data. On-chain confidence values drop sharply.

Cause

Archive node reads are high-compute-unit operations. A burst of event-triggered state reads can exhaust the hourly compute budget quickly.

Fix
  1. Switch to the fallback provider immediately on a 429 — don't wait for a retry interval.
  2. Cache state reads with a 10-minute TTL; only re-fetch on parameter-change events.
  3. Use Multicall3 (0xcA11bde05977b3631167028862bE2a173976CA11) to batch multiple eth_calls into a single RPC call.
Ingestion — Off-chain feed returns stale data without error

Symptom: A REST feed returns a 200 response, but the updatedAt in the payload is several hours old. The normaliser marks the record as fresh because the HTTP response was successful.

Fix

Every normaliser must validate the payload's own timestamp, not just HTTP status. If the payload's updatedAt exceeds the feed's ttl_seconds, mark the record as stale regardless of HTTP status.
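A sketch of that payload-level check. The updatedAt key name and the freshness labels are taken from this guide; the TTL-multiple boundaries mirror the data-freshness factor in the OSF-1 reference ("live" is omitted here since it only applies to push feeds — an assumption of this sketch):

```python
def freshness_label(payload_updated_at_ms: int, now_ms: int,
                    ttl_seconds: int) -> str:
    """Classify a feed payload by its own timestamp, independent of HTTP
    status. A 200 response with an old updatedAt is still stale."""
    age_ms = now_ms - payload_updated_at_ms
    if age_ms <= ttl_seconds * 1000:
        return "fresh"
    if age_ms <= 3 * ttl_seconds * 1000:
        return "stale"
    return "expired"

label = freshness_label(payload_updated_at_ms=0,
                        now_ms=7_200_000,   # payload is 2 hours old
                        ttl_seconds=300)
```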

Normalisation errors

Normalisation — Token decimals resolution failure

Symptom: Records for a specific token arrive with amount: null or wildly incorrect USD values. Normaliser log shows WARN: decimals cache miss for 0x....

Fix
  1. Add a circuit breaker: if resolution fails twice within 5 minutes, add the address to a pending_resolution list and skip records for it rather than passing through with a null amount.
  2. Pre-populate the decimals cache from the asset registry at startup — don't rely solely on on-chain resolution.
  3. For rebasing tokens (stETH, aTokens), decimals() returns 18 but the rebase factor must be fetched separately. Use the protocol-specific normaliser, not the generic ERC-20 one.
Normalisation — Duplicate records in the queue

Symptom: The same on-chain event appears twice, causing duplicate opportunity candidates with identical scores.

Fix

Generate a deterministic record_id using sha256(tx_hash + log_index). Before inserting into the queue, check against a deduplication bloom filter with a 10-minute TTL. Reject records whose record_id is already in the filter.
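A sketch of the deterministic ID plus deduplication. The fix above suggests a bloom filter with a TTL for scale; a plain set with the same accept/reject semantics is shown here:

```python
import hashlib

def record_id(tx_hash: str, log_index: int) -> str:
    """Deterministic ID: identical events always hash to the same value,
    so provider re-deliveries collapse to a single record."""
    return hashlib.sha256(f"{tx_hash}:{log_index}".encode()).hexdigest()

seen: set[str] = set()   # production: bloom filter with a 10-minute TTL

def accept(tx_hash: str, log_index: int) -> bool:
    """Return False (drop) if this event's record_id was already seen."""
    rid = record_id(tx_hash, log_index)
    if rid in seen:
        return False
    seen.add(rid)
    return True

first = accept("0xabc123", 7)
dup   = accept("0xabc123", 7)   # same event delivered twice
```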

Scoring issues

Scoring — All opportunities scoring near zero

Symptom: Scoring engine is running, candidates are being assembled, but all OSF-1 scores are below 0.01. No tier-A or tier-B candidates. Market conditions appear normal.

Cause (most common)

The confidence factor C is collapsing to near-zero because one or more critical feeds have exceeded their TTL. Check the output JSON's degraded_feeds array. Fix the feed, not the formula.

Cause (secondary)

OSF-1 weights changed in scoring.yaml without re-calibrating the minimum score thresholds. After any weight change, run the scoring engine against a historical snapshot and compare score distributions before and after.

Scoring — Scoring engine timeout on large candidate sets

Symptom: During high-volatility periods, the scoring cycle exceeds its configured timeout. Output is delayed or emitted with status: "error".

Fix

Implement a candidate cap: if more than N=500 candidates are assembled in a single correlation window, apply a pre-scoring filter: (1) discard candidates with any expired source; (2) discard candidates below a minimum raw yield estimate; (3) sample uniformly from remaining up to N.
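The three-step pre-scoring filter can be sketched as follows (the field names and the minimum-yield threshold are illustrative):

```python
import random

def cap_candidates(candidates: list[dict], cap: int = 500,
                   min_yield_bps: float = 5.0) -> list[dict]:
    """Pre-scoring filter: (1) drop candidates with an expired source,
    (2) drop candidates below a minimum raw yield estimate,
    (3) sample uniformly from the remainder down to the cap."""
    alive = [c for c in candidates if not c.get("has_expired_source")]
    viable = [c for c in alive if c["raw_yield_bps"] >= min_yield_bps]
    if len(viable) <= cap:
        return viable
    return random.sample(viable, cap)

survivors = cap_candidates(
    [{"raw_yield_bps": 12.0},
     {"raw_yield_bps": 2.0},                              # below yield floor
     {"raw_yield_bps": 40.0, "has_expired_source": True}],  # expired source
    cap=500,
)
```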

Output problems

Output — Webhook alerts not firing for tier-A opportunities

Symptom: The JSON report shows tier-A opportunities, but no webhook alert is delivered. No error in the output router log.

Fix

Test filter expressions using the pipeline's built-in evaluator:

npx agent-pipeline test-filter \
  --filter "tier == 'A'" \
  --sample-report "./reports/latest.json"
// Should output: "Filter matched 3 of 12 opportunities"

If the evaluator returns 0 matches on a report with tier-A entries, the filter syntax is wrong. Refer to the filter expression language reference in config/outputs.yaml comments.

Output — Consumer reports schema validation failure

Symptom: A downstream system rejects the output JSON with an error like required field 'estimated_return_usd' missing.

Fix

The pipeline emits a schema_version field at the top level of every output report. Downstream consumers should check this field and reject with a clear version mismatch error rather than a field-level validation error. Coordinate schema upgrades using the changelog at docs/schema-changelog.md.

First-response checklist

| # | Check | Where to look |
| --- | --- | --- |
| 1 | Is status in the latest output JSON ok, degraded, or error? | ./reports/latest.json → top-level status |
| 2 | Are any feeds listed in degraded_feeds? | ./reports/latest.json → degraded_feeds array |
| 3 | Is the scoring cycle completing within its timeout budget? | Logs → cycle_duration_ms; alert if > 80% of configured timeout |
| 4 | Is the normalised record queue growing (backpressure)? | Metrics dashboard → queue_depth gauge |
| 5 | Is the primary RPC provider responding? | Run: curl -X POST {rpc_url} -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' |
| 6 | Did any config file change in the last deploy? | git log -5 -- config/ |
| 7 | Is the system clock synchronised? | Run: timedatectl status or ntpstat |