MERCURIUS
Strategy Changelog

A record of every strategic decision, what the data showed, what we changed, and why. The self-improvement loop made visible.

Mercurius v7.1: Self-Improving ML Pipeline & Shadow P&L

Closed the loop from data collection to model improvement. The ML filter now retrains weekly on all historical data, tracks what would have happened on blocked trades, and reports results via Telegram. Pair confluence features from combo analysis add 5 new predictive signals.

ML Features: 15 → 20 (+5 pair confluence)
AUC: 0.707 → 0.761 (+0.054 from pairs)
Blocked WR: 2.6% (ML correctly rejects)
Boosted WR: 72.0% (ML correctly boosts)

1. ML Threshold Calibration

  • Block threshold calibrated to base rate — Original thresholds (block <40%, boost >70%) would have blocked every trade since the base win rate is 30% (system profits via 3.25:1 payoff ratio, not win rate). Recalibrated: block <15% (half base rate), boost >40%.
  • Validation — Blocked trades had 4.3% actual WR, boosted trades had 72.2% WR. Strong discrimination.
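The arithmetic behind the recalibration can be checked directly. The sketch below is illustrative only: the function names are invented for this example, and the only inputs taken from the changelog are the 30% base win rate, the 3.25:1 payoff ratio, and the old and new thresholds.

```python
def expected_value_per_unit_risk(win_rate: float, payoff_ratio: float) -> float:
    """Expected profit per 1 unit risked: a win pays payoff_ratio, a loss costs 1."""
    return win_rate * payoff_ratio - (1.0 - win_rate)


def breakeven_win_rate(payoff_ratio: float) -> float:
    """Win rate at which the expected value is exactly zero."""
    return 1.0 / (1.0 + payoff_ratio)


def ml_decision(win_prob: float, block_below: float = 0.15, boost_above: float = 0.40) -> str:
    """Map a predicted win probability to block / boost / pass."""
    if win_prob < block_below:
        return "block"
    if win_prob > boost_above:
        return "boost"
    return "pass"


BASE_WIN_RATE = 0.30
PAYOFF_RATIO = 3.25

ev = expected_value_per_unit_risk(BASE_WIN_RATE, PAYOFF_RATIO)  # about 0.275
breakeven = breakeven_win_rate(PAYOFF_RATIO)                    # about 0.235

# A trade predicted at the base rate is blocked by the old thresholds
# (block <40%, boost >70%) but passes the recalibrated ones.
old_result = ml_decision(BASE_WIN_RATE, block_below=0.40, boost_above=0.70)
new_result = ml_decision(BASE_WIN_RATE)
```

At a 3.25:1 payoff the break-even win rate is roughly 23.5%, so a 30% win rate is profitable even though it sits below the original 40% block threshold.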

2. Self-Improving ML Pipeline

  • Weekly retrain with model comparison — Sunday 23:00 UTC retrain compares accuracy and AUC delta against previous model. Feature importance tracking identifies which signals matter most.
  • Live prediction audit — Joins ml_predictions with closed positions to verify ML discrimination is maintained in production (blocked vs boosted actual win rates).
  • Telegram retrain report — Comprehensive report sent after each retrain: training stats, model comparison, threshold calibration, top 5 features, live audit, shadow P&L summary.
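A minimal sketch of the two checks described above, assuming the joined audit rows are available as simple (decision, won) pairs. The function names and row shape are hypothetical; the real job reads from ml_predictions and the positions data, whose schemas are not shown here.

```python
from collections import defaultdict


def compare_models(previous: dict, candidate: dict) -> dict:
    """Metric deltas (candidate minus previous) for metrics present in both models."""
    return {k: round(candidate[k] - previous[k], 4) for k in previous if k in candidate}


def audit_win_rates(rows) -> dict:
    """Actual win rate per ML decision.

    rows: iterable of (decision, won) pairs, e.g. ("boosted", True).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for decision, won in rows:
        totals[decision] += 1
        wins[decision] += int(won)
    return {decision: wins[decision] / totals[decision] for decision in totals}


# AUC values are the ones reported in this release.
delta = compare_models({"auc": 0.707}, {"auc": 0.761})

# Toy audit rows, for illustration only.
sample = [("boosted", True), ("boosted", True), ("boosted", False), ("blocked", False)]
rates = audit_win_rates(sample)
```

Discrimination holds in production as long as the boosted win rate stays well above the blocked win rate.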

3. Shadow P&L for Blocked Trades

  • Counterfactual tracking — Every blocked trade (ML filter, event guard, anti-stack, regime gate, etc.) logs entry price to shadow_pnl table. Price checked 4h later to determine if the trade would have won.
  • Guard effectiveness validation — Shadow P&L by block reason shows whether each guard is filtering good or bad trades. Reported weekly in ML retrain Telegram report.
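The counterfactual check can be sketched as below. This assumes the simplest possible rule, that a blocked trade "would have won" if the price 4h later moved in the trade's direction; the production logic may also account for stops and limits. The dictionary keys are hypothetical and do not reflect the actual shadow_pnl columns.

```python
from collections import defaultdict


def shadow_outcome(direction: str, entry_price: float, price_after_4h: float) -> bool:
    """True if the blocked trade would have been in profit 4h after entry."""
    if direction == "BUY":
        return price_after_4h > entry_price
    if direction == "SELL":
        return price_after_4h < entry_price
    raise ValueError(f"unknown direction: {direction!r}")


def win_rate_by_block_reason(shadow_rows) -> dict:
    """Shadow win rate per guard, from rows with hypothetical keys."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for row in shadow_rows:
        won = shadow_outcome(row["direction"], row["entry_price"], row["price_after_4h"])
        totals[row["block_reason"]] += 1
        wins[row["block_reason"]] += int(won)
    return {reason: wins[reason] / totals[reason] for reason in totals}


rows = [
    {"block_reason": "ml_filter", "direction": "BUY", "entry_price": 100.0, "price_after_4h": 99.0},
    {"block_reason": "ml_filter", "direction": "SELL", "entry_price": 100.0, "price_after_4h": 101.0},
    {"block_reason": "regime_gate", "direction": "BUY", "entry_price": 50.0, "price_after_4h": 51.0},
]
by_reason = win_rate_by_block_reason(rows)
```

A guard whose blocked trades show a high shadow win rate is filtering out good trades and is a candidate for relaxation.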

4. Pair Confluence Features

  • Combo analysis on 652 historical trades — Atlas+Pulse pair: 62.2% WR (+£1,017, 45 trades). Oracle+Pulse: 8.3% WR. Cipher+Oracle: 7.7% WR. Strong predictive signal in which agents agree.
  • 5 binary pair features in ML model — has_atlas_pulse, has_cipher_oracle, has_oracle_pulse, has_atlas_cipher, has_atlas_sentinel. Top feature after retrain: has_atlas_sentinel at 33.2% importance.
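One plausible way to derive the five binary features from the set of agents that agree on a trade is sketched here. The feature names come from the changelog; the mapping table and the function name are assumptions made for illustration.

```python
# Feature name -> the two agents that must both be in the agreeing set.
PAIR_FEATURES = {
    "has_atlas_pulse": ("atlas", "pulse"),
    "has_cipher_oracle": ("cipher", "oracle"),
    "has_oracle_pulse": ("oracle", "pulse"),
    "has_atlas_cipher": ("atlas", "cipher"),
    "has_atlas_sentinel": ("atlas", "sentinel"),
}


def pair_confluence_features(agreeing_agents) -> dict:
    """Return a 0/1 value per pair feature for the given agreeing agents."""
    agents = {name.lower() for name in agreeing_agents}
    return {
        feature: int(first in agents and second in agents)
        for feature, (first, second) in PAIR_FEATURES.items()
    }


features = pair_confluence_features(["Atlas", "Pulse", "Sentinel"])
```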

5. Weekly Anomaly Flags

  • 7 automated flags in Telegram digest — Trade drought (5+ days), win rate drop (>15% vs prior week), agent accuracy below 30%, concentration risk (single instrument >60% of P&L), high ML block rate (>50%), shadow P&L showing over-aggressive guards, large weekly drawdown (>£50).
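The seven checks can be expressed as a small rule table. This is a sketch under assumptions: the field names are hypothetical, the win rate drop is read as 15 percentage points, and "over-aggressive guards" is interpreted as blocked trades winning more often than the 30% base rate. The changelog does not define either of the last two precisely.

```python
from dataclasses import dataclass


@dataclass
class WeeklyStats:
    days_since_last_trade: int
    win_rate: float                  # this week, 0-1
    prior_win_rate: float            # previous week, 0-1
    lowest_agent_accuracy: float     # worst agent, 0-1
    top_instrument_pnl_share: float  # share of P&L from one instrument, 0-1
    ml_block_rate: float             # share of decisions blocked by ML, 0-1
    shadow_win_rate: float           # win rate of blocked trades, 0-1
    weekly_drawdown_gbp: float


def anomaly_flags(s: WeeklyStats, base_win_rate: float = 0.30) -> list:
    """Names of the flags that fire for one week of stats."""
    checks = [
        ("trade_drought", s.days_since_last_trade >= 5),
        ("win_rate_drop", s.prior_win_rate - s.win_rate > 0.15),
        ("agent_accuracy_low", s.lowest_agent_accuracy < 0.30),
        ("concentration_risk", s.top_instrument_pnl_share > 0.60),
        ("high_ml_block_rate", s.ml_block_rate > 0.50),
        ("guards_over_aggressive", s.shadow_win_rate > base_win_rate),  # assumed rule
        ("large_drawdown", s.weekly_drawdown_gbp > 50.0),
    ]
    return [name for name, fired in checks if fired]


quiet_week = WeeklyStats(1, 0.32, 0.30, 0.45, 0.40, 0.20, 0.05, 10.0)
rough_week = WeeklyStats(6, 0.30, 0.50, 0.25, 0.70, 0.55, 0.35, 80.0)
```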
Mercurius v7.0: Edge, ML & Market Structure Upgrade

Comprehensive upgrade across six areas: execution quality, mathematical edge and position sizing, agent weighting, market structure analysis, congressional trading data, and ML-based trade filtering. Built on the +£1,102 all-time paper performance (671 trades, 70% win rate last week).

Voting Agents: 5 → 6 (+Structure agent)
Guard Layers: 7 → 9 (+Market Hours, R:R, ML)
Edge Formula: Kelly (proper (p×b - q)/b)
Data Sources: +2 (Congress trades, Wyckoff)

1. Market Hours & Execution Fixes

  • Market hours guard (Layer 0) — Skips instruments where the market is closed. Parses all IG config formats (24/7, 24/5, 08-21, 09:30-18:30). Prevents wasteful API calls and IG MARKET_CLOSED rejections.
  • Extended rejection memory — MARKET_CLOSED and MARKET_ROLLED rejections now tracked with longer cooldown. Instruments aren't retried until is_market_open() returns True.
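A rough sketch of parsing the four config formats listed above. The signature is an assumption (the changelog names is_market_open() but not its parameters), and the rules are simplified: "24/5" is treated as Monday to Friday, windowed hours are assumed to apply on weekdays only, and the window is compared against the clock time of the datetime passed in.

```python
from datetime import datetime, time


def _parse_clock(text: str) -> time:
    """'08' -> 08:00, '09:30' -> 09:30."""
    hour, _, minute = text.partition(":")
    return time(int(hour), int(minute or 0))


def is_market_open(hours: str, now: datetime) -> bool:
    """Hypothetical check for '24/7', '24/5', 'HH-HH' and 'HH:MM-HH:MM' strings."""
    if hours == "24/7":
        return True
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    if hours == "24/5":
        return is_weekday
    start_text, end_text = hours.split("-")
    start, end = _parse_clock(start_text), _parse_clock(end_text)
    return is_weekday and start <= now.time() < end


wednesday_10am = datetime(2025, 5, 21, 10, 0)
saturday_10am = datetime(2025, 5, 24, 10, 0)
```

Running the check before building an order is what avoids the API call and the MARKET_CLOSED rejection.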

2. Kelly Criterion Edge & Position Sizing

  • Edge formula replaced — Old: expected_value / avg_loss × conviction_ratio. New: (p × b - q) / b × conviction_ratio × regime_mult. Proper Kelly criterion with regime multipliers (trending=1.0, volatile=0.7, ranging=0.8).
  • Position sizing from edge — kelly_raw = edge instead of avg_confidence × 0.5. Size now scales with actual mathematical edge, not just confidence.
  • R:R enforcement (Layer 8) — Rejects trades where limit_distance / stop_distance < 1.5. Every trade must have asymmetric payoff.
  • Pre-trade margin check — Skips if estimated margin exceeds 90% of available balance.
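The formula and the two pre-trade checks above translate into a few lines. This is a sketch, not the production code: the function names are invented, and how p, b, and conviction_ratio are estimated is not specified in this changelog.

```python
REGIME_MULT = {"trending": 1.0, "volatile": 0.7, "ranging": 0.8}


def kelly_edge(p: float, b: float, conviction_ratio: float, regime: str) -> float:
    """(p*b - q)/b scaled by conviction and regime; p = win prob, b = payoff ratio."""
    q = 1.0 - p
    return (p * b - q) / b * conviction_ratio * REGIME_MULT[regime]


def passes_rr(limit_distance: float, stop_distance: float, min_rr: float = 1.5) -> bool:
    """Layer 8: reward-to-risk must be at least 1.5."""
    return stop_distance > 0 and limit_distance / stop_distance >= min_rr


def passes_margin(estimated_margin: float, available_balance: float,
                  max_fraction: float = 0.90) -> bool:
    """Skip the trade if margin would exceed 90% of the available balance."""
    return estimated_margin <= max_fraction * available_balance


# With the 30% win rate and 3.25:1 payoff quoted in v7.1, and a neutral
# conviction_ratio of 1.0 chosen purely for illustration:
edge = kelly_edge(p=0.30, b=3.25, conviction_ratio=1.0, regime="trending")
```

The raw Kelly fraction here is about 0.085, and the same inputs in a volatile regime would be scaled down to 70% of that.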

3. Non-Linear Agent Weights

  • Weight formula — Old: accuracy × 2 (range 0.3–2.5). New: accuracy^1.5 × 3 (range 0.2–3.0). Cipher (83.7% acc) now gets 2.30 weight vs Sentinel (36.6%) at 0.66 — a 3.5x gap.
  • EMA decay alpha — 0.10 → 0.15 for faster adaptation to recent performance.
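The quoted weights can be reproduced from the formula. In the sketch below the function names are hypothetical, and the EMA update is assumed to smooth a per-agent accuracy estimate, which the changelog implies but does not state.

```python
def agent_weight(accuracy: float, low: float = 0.2, high: float = 3.0) -> float:
    """accuracy^1.5 * 3, clamped to the 0.2-3.0 range."""
    return min(high, max(low, accuracy ** 1.5 * 3))


def ema_update(previous: float, observation: float, alpha: float = 0.15) -> float:
    """Exponential moving average; a higher alpha reacts faster to new results."""
    return alpha * observation + (1 - alpha) * previous


cipher = agent_weight(0.837)    # about 2.30
sentinel = agent_weight(0.366)  # about 0.66
gap = cipher / sentinel         # about 3.5
```

The exponent is what widens the gap: under the old linear formula the same two accuracies would differ by about 2.3x rather than 3.5x.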

4. Structure Agent (6th Voting Agent)

  • Wyckoff phase detection — Identifies accumulation, distribution, markup, and markdown phases from 20-bar price/volume windows.
  • Volume Profile (POC/Value Area) — Point of Control and Value Area computed from candle data. Signal: price above/below POC.
  • Market structure (swing analysis) — Detects HH+HL (uptrend) and LH+LL (downtrend) patterns from swing highs and lows.
  • Support/Resistance levels — Key levels from price clustering. Signal from proximity and reaction.
  • Conviction sizing updated — {3: 1.0, 4: 1.5, 5: 2.5, 6: 3.5} to accommodate the 6th agent.
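As an illustration of the swing analysis bullet, the sketch below classifies structure from the two most recent swing highs and lows. How the Structure agent actually detects swing points and combines this with the Wyckoff and volume profile signals is not described here, so the function is an assumption. The sizing table is copied from the changelog.

```python
CONVICTION_SIZE_MULT = {3: 1.0, 4: 1.5, 5: 2.5, 6: 3.5}


def classify_structure(swing_highs, swing_lows) -> str:
    """Uptrend on HH+HL, downtrend on LH+LL, otherwise mixed.

    Both inputs are price lists ordered oldest to newest.
    """
    if len(swing_highs) < 2 or len(swing_lows) < 2:
        return "undetermined"
    higher_high = swing_highs[-1] > swing_highs[-2]
    higher_low = swing_lows[-1] > swing_lows[-2]
    lower_high = swing_highs[-1] < swing_highs[-2]
    lower_low = swing_lows[-1] < swing_lows[-2]
    if higher_high and higher_low:
        return "uptrend"
    if lower_high and lower_low:
        return "downtrend"
    return "mixed"


trend = classify_structure([101.0, 104.0], [98.0, 100.5])
```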

5. Congressional Trading Data

  • House Stock Watcher collector — Fetches congressional stock transactions from public S3 data. Maps tickers to instruments (NVDA→NAS100, XOM→OIL_BRENT, GLD→XAU/USD). Daily at 10:00 UTC.
  • Oracle integration — Congressional purchase/sale signals feed into Oracle's fundamental convergence logic for SP500, NAS100, FTSE100, OIL_BRENT, XAU/USD, coffee.

6. XGBoost ML Trade Filter (Layer 9)

  • 15-feature binary classifier — Features: conviction, avg_confidence, num_dissenting, RSI, BB %B, ATR ratio, composite signal, Fear & Greed, ADX, BB width pctl, regime one-hot (3), hour of day, day of week.
  • Win probability gate — Block if win_prob < 15% (well below 30% base rate). Boost conviction +1 if win_prob > 40%. Thresholds calibrated to base rate since system profits via R:R, not win rate. Graceful degradation: returns neutral 0.5 until 50+ training samples.
  • Weekly retrain — Walk-forward split (80/20 chronological). Sunday 23:00 UTC. Model saved to data/xgb_trade_filter.pkl.
  • Audit trail — Every ML prediction logged to ml_predictions table with features, win probability, and decision.
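The chronological split and the graceful-degradation rule might look like the following. This is a sketch with hypothetical function names; it assumes the model exposes a scikit-learn style predict_proba, as XGBoost's XGBClassifier does, and uses a stub model so the example runs without XGBoost installed.

```python
MIN_TRAINING_SAMPLES = 50
NEUTRAL_PROBABILITY = 0.5


def walk_forward_split(rows, train_fraction: float = 0.8):
    """Split rows (sorted oldest first) so the test set is strictly later in time."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def win_probability(model, features, n_training_samples: int) -> float:
    """Return the model's win probability, or neutral 0.5 when under-trained."""
    if model is None or n_training_samples < MIN_TRAINING_SAMPLES:
        return NEUTRAL_PROBABILITY
    # predict_proba returns one [p_loss, p_win] row per input sample.
    return float(model.predict_proba([features])[0][1])


class StubModel:
    """Stand-in for a trained classifier, for illustration only."""

    def predict_proba(self, batch):
        return [[0.8, 0.2] for _ in batch]


train, test = walk_forward_split(list(range(10)))
```

Splitting by time rather than at random keeps future trades out of the training set, which is the point of a walk-forward evaluation.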
Mercurius v2.1: £100K Virtual Bankroll + CFD-Only Operation

Account consolidation. Disabled the spreadbet account entirely — Mercurius now operates CFD-only. All UI simplified: removed the account selector dropdown, hardcoded to the CFD account. One account, one mode, less surface area for confusion.

Virtual bankroll. Set the virtual bankroll to £100,000 for all position sizing calculations. The IG demo account maintains ~£10K actual balance, but all sizing, risk parameters, and P&L tracking use the £100K virtual figure. This decouples position sizing from the demo balance constraint and lets the system trade at the scale it was designed for.

Risk parameters scaled to match:

bankroll: £8,000 → £100,000
max_per_trade: £50 → £500
max_daily_loss: £500 → £5,000
asset max_gbp: £150 → £1,500

Strategy presets scaled. Per-trade sizing across all presets updated to reflect the new bankroll:

Aggressive: £500 per trade
Balanced: £1,000 per trade
Conservative: £1,500 per trade
Sniper: £3,000 per trade

P&L tracking decoupled from IG balance. The dashboard now uses DB-based P&L (sum of realized + unrealized from the positions table) instead of comparing the IG balance against starting capital. The header shows virtual portfolio value (bankroll + DB P&L), making performance reporting independent of IG account fluctuations.

Trade history archived. All pre-v2 data (638 positions, 1,868 trades) moved to archive tables (positions_archive, trades_archive). V2_CUTOFF set to May 20, 2025 — all stats, benchmarks, and performance tracking start fresh from this date.

Dashboard visual uplift. Refined CSS across all pages — improved card layouts, table styling, navigation consistency, typography spacing, and custom scrollbars. The system looks as serious as its ambitions.

Dynamic Governance + Trading Config API
  • New /api/trading-config endpoint — Exposes all trading parameters, conviction sizing multipliers, hold periods, and governance configuration dynamically. No more hardcoded values scattered across the frontend.
  • Governance page fetches params from API — The governance page now pulls all trading rules, thresholds, and guard configurations from the live API instead of hardcoding them in HTML. Changes to config propagate instantly.
  • Strategy presets visible and switchable — All four strategy presets (aggressive, balanced, conservative, sniper) are now displayed on the governance page with their full parameter sets, and can be switched from the UI.
Regime Gate Exemption & Volatile Guard Relaxation

Post-deployment analysis of the first hotfix revealed FTSE100 SELL consensus was still being blocked (59 blocked decisions in 7 days) by the regime gate. Despite being restricted to SELL-only (which historically worked in all regimes), the asset-class-level gate for ("indices", "ranging") blocked every signal. Additionally, XAG/USD in volatile regime required conviction ≥4, which is effectively 80% of all agents agreeing — too high for the current market environment.

  • FTSE100 exempt from regime gate — Added REGIME_GATE_EXEMPTIONS set. FTSE100 passes through regardless of regime since it already has the SELL-only directional restriction as a guardrail.
  • Volatile regime guard relaxed — Was: conviction ≥4 required. Now: conviction 3 allowed through if avg_confidence ≥ 0.55. This means strong consensus at lower conviction can still trade in volatile markets.
  • Governance docs updated to v5.0 — Full rewrite: 5-agent table, 7-layer guard system documented, regime gate rules, conviction sizing updated, instrument restrictions table, historical edge formula, trading parameters corrected.
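The exemption and the relaxed volatile rule can be sketched as two small predicates. REGIME_GATE_EXEMPTIONS is the name used above; the blocked-regime table and function names are assumptions based on the rule that ranging markets are blocked for indices, forex, and crypto.

```python
REGIME_GATE_EXEMPTIONS = {"FTSE100"}
BLOCKED_REGIMES = {("indices", "ranging"), ("forex", "ranging"), ("crypto", "ranging")}


def passes_regime_gate(instrument: str, asset_class: str, regime: str) -> bool:
    """Exempt instruments pass regardless of regime."""
    if instrument in REGIME_GATE_EXEMPTIONS:
        return True
    return (asset_class, regime) not in BLOCKED_REGIMES


def passes_volatile_guard(regime: str, conviction: int, avg_confidence: float) -> bool:
    """In volatile regimes allow conviction >= 4, or conviction 3 with confidence >= 0.55."""
    if regime != "volatile":
        return True
    if conviction >= 4:
        return True
    return conviction == 3 and avg_confidence >= 0.55


ftse_ok = passes_regime_gate("FTSE100", "indices", "ranging")
sp500_ok = passes_regime_gate("SP500", "indices", "ranging")
```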
Unblock Trading: Confidence Threshold & Sizing Corrections

Four-day post-overhaul review revealed the system was completely paralyzed — zero new trades executed since May 12. The guards worked too well: every instrument was blocked by at least one gate. The system went from over-trading (31/day) to not trading at all (0/day).

Trades Since Overhaul: 0 (target: 5-8/day)
Consensus Formed: 2,585 (all blocked by gates)
Guards Blocked: 373 (regime + restriction + stacking)
SP500 Rejected: INSUF FUNDS (IG margin too low at £150/trade)

Root cause analysis:

  • MIN_VOTE_CONFIDENCE too high (0.35) — Sentinel votes at 0.30 and Pulse at 0.10-0.29. Both were filtered on every single vote, leaving only Atlas (0.40) + Oracle (0.71) + sometimes Cipher as eligible voters. With only 2-3 qualifying agents, consensus rarely formed.
  • FTSE100 min_conviction=4 unreachable — With 5 agents, getting 4 to agree on SELL is extremely rare. Every FTSE consensus was SELL with conviction=3, then blocked by the restriction. Hundreds of legitimate SELL signals were discarded.
  • SP500 INSUFFICIENT_FUNDS — The only instrument that passed all guards (trending regime, BUY consensus). But £150/trade at 20:1 leverage exceeded remaining margin on the demo account (balance ~£8K after £2K in losses). IG rejected every order.
  • Commodities not blocked, but no consensus — OIL_BRENT, coffee, XAU, XAG all had regime gate set to ALLOWED for commodities. The issue was upstream: too few agents passed the confidence filter to form consensus.
Before (Paralyzed):
  • MIN_VOTE_CONFIDENCE = 0.35
  • FTSE100 min_conviction = 4
  • max_per_trade = £150
  • bankroll_gbp = £10,000
  • Result: 0 trades in 4 days
After (Fixed):
  • MIN_VOTE_CONFIDENCE = 0.25
  • FTSE100 min_conviction = 3
  • max_per_trade = £50
  • bankroll_gbp = £8,000
  • All 5 agents can participate

Lesson: When multiple independent guards each have a 60-80% pass rate, the combined pass rate is multiplicative. Seven guards at 80% each = 0.8^7 ≈ 21% pass rate. The system needs each individual gate to be permissive enough that the combination still allows quality trades through.
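The multiplication is easy to verify, and it can be inverted to ask how permissive each gate must be for a given overall pass rate. The sketch assumes the guards are statistically independent, which is a simplification; the 50% target is an arbitrary illustration, not a system parameter.

```python
import math


def combined_pass_rate(pass_rates) -> float:
    """Overall pass rate when every guard must pass independently."""
    return math.prod(pass_rates)


def required_per_guard_rate(target_combined: float, n_guards: int) -> float:
    """Per-guard pass rate needed for n equal guards to reach the target."""
    return target_combined ** (1.0 / n_guards)


seven_at_80 = combined_pass_rate([0.8] * 7)       # about 0.21
needed_for_half = required_per_guard_rate(0.5, 7)  # about 0.906
```

To let even half of all consensus decisions through seven equal gates, each gate would need to pass roughly 91% of what it sees.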

Mercurius v2: Quality Over Quantity

After 21 days of live demo trading (651 closed positions), a comprehensive analysis revealed that the system was massively over-trading with inverted conviction signals. This overhaul restructures the entire trading pipeline.

Win Rate: 29.6% (target: 50%+)
Trades/Day: 31 (target: 5-8)
Total P&L: +$530 (21 days)
BUY Bias: 88% (target: <65%)
Agent Council Pruned: 8 → 5 Voting Agents

Three agents consistently produced noise rather than signal. Removing them raises the consensus bar from 37.5% (3/8) to 60% (3/5), meaning every trade now requires genuine majority agreement.

  • Astral — 20-30% accuracy. Moon phases and seasonal patterns had no predictive value.
  • Contrarian — 4-29% accuracy. Overlapped with Pulse's contrarian logic, added confusion.
  • Correlation — 11-27% accuracy. Intermarket divergence signals were consistently wrong.
Before:
  • 8 voting agents
  • 3/8 = 37.5% consensus
  • Conviction 3-7 sizing
  • 5-agent consensus = 0% WR
After:
  • 5 voting agents
  • 3/5 = 60% consensus
  • Conviction 3-5 sizing
  • Higher bar = higher quality
Instrument Set Refocused: Forex Eliminated, Commodities Core

Forex pairs consumed 47% of all trades but produced no meaningful P&L. The system's genuine edge is in commodities where CFTC positioning, weather data, and fundamental analysis provide information advantages.

  • GBP/USD — 114 trades, exactly $0 P&L. Every single trade was breakeven.
  • AUD/USD — 191 trades, $2 P&L. 9 stacked positions at time of removal.
  • EUR/USD, USD/JPY, USD/CHF — No edge, low conviction, removed.
  • BTC/USD, ETH/USD — 1:1 leverage, no data advantage over crypto-native traders.
  • DAX40, Sugar, NATGAS — Low conviction or too volatile.
Before (9 instruments):
  • 5 forex pairs
  • 3 indices
  • 1 commodity
After (7 instruments):
  • 4 commodities: Oil, Coffee, Gold, Silver
  • 3 indices: FTSE, S&P, NASDAQ
  • 0 forex, 0 crypto
Seven-Layer Guard System in Arbiter

The Arbiter now runs every consensus decision through seven sequential guards before creating a trade opportunity. Each blocked decision is stored in the database with full reasoning for audit and analysis.

  • Event Guard — Block trades near high-impact economic events (existing)
  • Anti-Stacking — No duplicate positions in same direction. Root cause of 9 stacked AUD/USD positions.
  • Daily Trade Cap — Max 8 trades/day (was averaging 31/day)
  • Instrument Cooldown — 4-6 hour cooldown per instrument after any position
  • Regime Gate — Block ranging markets for indices/forex/crypto. Ranging regime had 8.3% WR vs 57.5% for trending.
  • Volatile Regime — Require conviction ≥ 4 in volatile markets
  • Instrument Restrictions — FTSE100 SELL-only (BUY lost heavily), S&P/NASDAQ trending-only
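Sequential guards with a recorded block reason can be sketched as a list of small functions run in order. Everything here is illustrative: the decision keys, guard names, and return shape are assumptions, and only two of the seven guards are shown.

```python
from typing import Callable, Optional

# A guard inspects a decision and returns a block reason, or None to allow it.
Guard = Callable[[dict], Optional[str]]


def anti_stacking(decision: dict) -> Optional[str]:
    if decision["direction"] in decision["open_directions"]:
        return "position already open in this direction"
    return None


def daily_trade_cap(decision: dict) -> Optional[str]:
    if decision["trades_today"] >= 8:
        return "daily cap of 8 trades reached"
    return None


def run_guards(decision: dict, guards) -> dict:
    """Stop at the first guard that blocks and report which one and why."""
    for name, guard in guards:
        reason = guard(decision)
        if reason is not None:
            return {"allowed": False, "blocked_by": name, "reason": reason}
    return {"allowed": True, "blocked_by": None, "reason": None}


GUARDS = [("anti_stacking", anti_stacking), ("daily_trade_cap", daily_trade_cap)]

result = run_guards(
    {"direction": "BUY", "open_directions": {"BUY"}, "trades_today": 2}, GUARDS
)
```

Returning the guard name alongside the reason is what makes per-guard analysis possible later.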
Historical Performance-Based Edge Calculation

The old edge formula (conviction / N) * avg_confidence was synthetic — it always produced "tradeable" edges regardless of whether the system actually made money on that instrument. A 3/5 consensus at 0.50 confidence gave 0.30 edge, well above the 0.05 threshold.

Old Formula:
  • edge = (conviction / N) * confidence
  • Always produces positive edge
  • 5% threshold (too low)
  • No connection to actual P&L
New Formula:
  • 30-day historical win rate + avg P&L
  • Losing instruments get negative edge
  • 8% threshold (raised)
  • Cold start: 50% haircut + BUY penalty

This means instruments that actually lose money will self-correct — their edge drops below threshold and trading halts until performance improves.
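A quick check shows why the old formula could never fall below the threshold. The sketch covers only the old formula, since the exact form of the new one is not given here. It assumes a minimum conviction of 3 and that every counted vote met the 0.25 confidence floor in force at the time; the function name is hypothetical.

```python
OLD_THRESHOLD = 0.05


def old_edge(conviction: int, n_agents: int, avg_confidence: float) -> float:
    """Old synthetic edge: (conviction / N) * avg_confidence."""
    return (conviction / n_agents) * avg_confidence


example = old_edge(3, 5, 0.50)         # 0.30, the case quoted above
worst_of_five = old_edge(3, 5, 0.25)   # 0.15
worst_of_eight = old_edge(3, 8, 0.25)  # about 0.094
```

Even the weakest consensus the system could form cleared 0.05, so the threshold filtered nothing.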

Closing the Self-Improvement Loop

Previously, the strategy review system generated 24 recommendations ("halt ranging trading", "blacklist GBP/USD") but none were actually implemented. The system was read-only.

  • Auto-cap underperformers — Daily 06:00 UTC job finds agent-instrument combos with 20+ evaluations and <30% accuracy, caps their weight to 0.3
  • Combo performance tracking — New agent_combo_performance table records which agent combinations produce winning trades
  • Performance benchmarks API — /api/benchmarks endpoint tracks win rate, trades/day, weekly P&L against targets
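The auto-cap rule amounts to a filter over agent-instrument combinations. The sketch below uses hypothetical record keys and a hypothetical function name; the thresholds (20+ evaluations, below 30% accuracy, weight capped to 0.3) are the ones stated above.

```python
def cap_underperformers(combos, min_evals: int = 20, min_accuracy: float = 0.30,
                        capped_weight: float = 0.3) -> list:
    """Return new combo records with underperformers' weights capped."""
    result = []
    for combo in combos:
        underperforming = (
            combo["evaluations"] >= min_evals and combo["accuracy"] < min_accuracy
        )
        if underperforming:
            # Copy the record so the input list is not mutated.
            combo = {**combo, "weight": min(combo["weight"], capped_weight)}
        result.append(combo)
    return result


combos = [
    {"agent": "sentinel", "instrument": "FTSE100", "evaluations": 40, "accuracy": 0.22, "weight": 1.0},
    {"agent": "atlas", "instrument": "OIL_BRENT", "evaluations": 12, "accuracy": 0.10, "weight": 1.0},
    {"agent": "cipher", "instrument": "XAU/USD", "evaluations": 60, "accuracy": 0.80, "weight": 2.3},
]
capped = cap_underperformers(combos)
```

The minimum evaluation count keeps a combination from being capped on a handful of unlucky trades.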
Hardcoded Agent Count & Confidence Threshold
  • Position manager edge formula — Was hardcoded conviction / 8, now uses conviction / len(VOTING_AGENTS)
  • MIN_VOTE_CONFIDENCE — Raised from 0.25 to 0.35 to filter low-conviction agent noise
Edge Evaporation Killing Positions Instantly

Positions were being closed immediately after opening because the edge decay formula was too aggressive. Fixed the decay curve to use quadratic rather than linear decay.

Trade Drought & Instant Position Closure

Four critical issues were causing a trade drought and instant position closures. Fixed consensus edge calculation, position sizing, and stale thesis detection.

Pulse Base Confidence Too Low

Raised Pulse agent base confidence from 0.30 to 0.35 so moderate IG sentiment signals could pass the MIN_VOTE_CONFIDENCE filter and participate in consensus.

IG Sentiment Market IDs Corrected

Fixed incorrect IG API market IDs for sentiment data collection. Updated to use correct IG format (e.g., UK100, USTEC) instead of generic identifiers.