May 13, 2026 · 9 min read · JeTech Lab

Agent Training Note: The Gap Between Synthetic Training and Real 5Y Gates

Why synthetic-data RL training and real five-year backtest gates diverge for AI trading models, covering TradeFi, the crypto expansion, and rolling Sharpe.

Tags: Agent Model, AI Trading, TradeFi, Crypto, Backtest, Reinforcement Learning, Registry Gate

Summary

Recent TradeFi agent training has produced fewer candidate models than expected. At first glance this can look like a problem with the RL algorithm (PPO, SAC, CrossQ) or with the model backbone. The stronger diagnosis is different: the training objective and the candidate registry gate are not aligned tightly enough.

Training optimizes risk-adjusted reward inside synthetic episodes. Candidate registration, however, requires the model to reach an average 30-day rolling Sharpe of at least 1.0 (`5Y 30D Avg Sharpe >= 1.0`) on a continuous five-year backtest over real daily market data. This gate rewards consistency across a long real period, not just a few strong windows.
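Written out, the gate metric looks roughly like this. It is a sketch under assumptions the note does not state explicitly: the 30-day window is 30 daily bars, the risk-free rate is zero, each window's Sharpe is annualized with the factor 252, and the average runs over every full window. Here $r_t$ is the daily strategy return and $T$ is the number of daily bars (about 5 × 252 = 1,260). A window with $\sigma_t = 0$ (no position at all) leaves $S_t$ undefined, so an implementation has to pick a convention for it.

```latex
S_t = \sqrt{252}\,\frac{\bar r_t}{\sigma_t},\qquad
\bar r_t = \frac{1}{30}\sum_{i=t-29}^{t} r_i,\qquad
\sigma_t = \sqrt{\frac{1}{29}\sum_{i=t-29}^{t}\left(r_i-\bar r_t\right)^2}

G = \frac{1}{T-29}\sum_{t=30}^{T} S_t,\qquad \text{pass} \iff G \ge 1.0
```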

What We Observed

As of May 13, 2026 KST, the VM training results show several recurring patterns.

Observation	Interpretation
Synthetic Sharpe and Calmar can be high while the real gate is low	Good behavior in the synthetic task does not fully transfer to the real five-year gate
Many results have an exact gate value of `0.0`	The model often converges to a flat policy with almost no position
Some runs peak early after only a few thousand steps, then degrade	Selection is catching short stochastic real-eval peaks rather than stable learning
Performance differs strongly by symbol	SPY/QQQ/XAU behave differently from country ETFs, metals, and energy contracts

A few runs make the mismatch concrete:

Run (symbol / model)	Synthetic metrics	Real 5Y gate (30D avg Sharpe)
`XAGUSDT / ppo_tcn_v1`	Sharpe `4.81`, Calmar `12.15`	`0.912`
`COPPERUSDT / ppo_itransformer_v1`	Sharpe `10.97`, Calmar `10.78`	`0.882`
`SPYUSDT / ppo_itransformer_v1`	Calmar `4.11`	`0.139`

Problem 1: Objective-Gate Mismatch

Field	Current gate
Evaluation data	Five years of real daily market data
Evaluation mode	Continuous five-year strategy return series
Main metric	Average of 30-day rolling Sharpe
Pass condition	`5Y 30D Avg Sharpe >= 1.0`
TradeFi annualization	`252` periods/year
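
A minimal NumPy sketch of the gate check follows, assuming 30 daily bars per window, a zero risk-free rate, and √252 annualization. Scoring a zero-variance window as `0.0` is our assumption, but it shows how a policy that never takes a position would land on an exact `0.0` gate value. The function names are hypothetical, not the registry's actual code.

```python
import numpy as np

ANNUALIZATION = 252  # TradeFi periods per year (from the gate table)
WINDOW = 30          # assumption: "30D" means 30 daily bars
THRESHOLD = 1.0      # pass condition: 5Y 30D avg Sharpe >= 1.0


def rolling_sharpe(returns, window: int = WINDOW,
                   periods: int = ANNUALIZATION) -> np.ndarray:
    """Annualized Sharpe of every full rolling window (zero risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    if r.size < window:
        return np.empty(0)
    # Shape (n_windows, window): each row is one rolling window of returns.
    windows = np.lib.stride_tricks.sliding_window_view(r, window)
    mean = windows.mean(axis=1)
    std = windows.std(axis=1, ddof=1)
    sharpe = np.zeros_like(mean)
    ok = std > 1e-12
    # Assumption: a window with no return variation (e.g. flat) scores 0.0.
    sharpe[ok] = mean[ok] / std[ok] * np.sqrt(periods)
    return sharpe


def gate_value(returns) -> float:
    """Average of the rolling Sharpe series over the whole backtest."""
    s = rolling_sharpe(returns)
    return float(s.mean()) if s.size else 0.0


def passes_gate(returns) -> bool:
    return gate_value(returns) >= THRESHOLD
```

On a five-year daily series (about 1,260 bars) this averages roughly 1,231 overlapping windows, so a handful of strong windows cannot carry a long stretch of weak ones.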

What We Changed

Change	Purpose
Added exposure diagnostics	Record whether the model actually takes positions
Added flat real-eval early stop	Stop runs that remain flat across consecutive real-eval rounds
Added a TradeFi flat-position penalty	Make zero exposure during meaningful market moves non-free
Expanded gate metadata	Make registry failures easier to explain in JeTech Lab
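
The flat real-eval early stop and the flat-position penalty can be sketched as below. Everything here is hypothetical: the class and function names, the flatness and move thresholds, the patience of three rounds, and the penalty size are placeholders, since the note does not say how the real penalty is scaled or which moves count as meaningful.

```python
from dataclasses import dataclass


@dataclass
class FlatEvalEarlyStop:
    """Stop a run once real-eval rounds stay flat for `patience` rounds in a row.

    Hypothetical parameters: thresholds are illustrative, not production values.
    """
    flat_threshold: float = 0.01  # avg |exposure| below this counts as flat
    patience: int = 3             # consecutive flat rounds before stopping
    flat_rounds: int = 0          # running count of consecutive flat rounds

    def update(self, avg_abs_exposure: float) -> bool:
        """Record one real-eval round; return True if the run should stop."""
        if avg_abs_exposure < self.flat_threshold:
            self.flat_rounds += 1
        else:
            self.flat_rounds = 0  # any real exposure resets the streak
        return self.flat_rounds >= self.patience


def flat_position_penalty(position: float, market_return: float,
                          move_threshold: float = 0.01,
                          exposure_eps: float = 0.05,
                          penalty: float = 0.001) -> float:
    """Hypothetical reward term: a small cost for staying flat on a large move."""
    is_flat = abs(position) < exposure_eps
    big_move = abs(market_return) >= move_threshold
    return -penalty if (is_flat and big_move) else 0.0
```

The penalty only fires when the market actually moves, so staying flat through quiet bars remains free while sitting out a large move no longer is.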

The new exposure diagnostics record the following metrics:

Metric	Meaning
`avg_abs_realized_exposure`	Average realized position strength
`flat_realized_exposure_ratio`	Share of bars where realized exposure was near zero
`mean_turnover`	Average position change
`total_turnover`	Total position change
`nonzero_strategy_return_ratio`	Share of bars with non-zero strategy return
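
A minimal sketch of how these diagnostics could be computed from one evaluation run. It assumes `positions` holds the realized position per bar, that turnover is the absolute change in position starting from a flat book, and that `flat_eps` (a hypothetical cutoff) defines "near zero".

```python
import numpy as np


def exposure_diagnostics(positions, strategy_returns,
                         flat_eps: float = 1e-3) -> dict:
    """Compute the exposure metrics for one evaluation run (illustrative sketch)."""
    pos = np.asarray(positions, dtype=float)
    ret = np.asarray(strategy_returns, dtype=float)
    # Per-bar position change; the book is assumed flat before the first bar.
    turnover = np.abs(np.diff(pos, prepend=0.0))
    return {
        "avg_abs_realized_exposure": float(np.mean(np.abs(pos))),
        "flat_realized_exposure_ratio": float(np.mean(np.abs(pos) < flat_eps)),
        "mean_turnover": float(turnover.mean()),
        "total_turnover": float(turnover.sum()),
        "nonzero_strategy_return_ratio": float(np.mean(ret != 0.0)),
    }
```

A policy that collapses to flat shows up here as `flat_realized_exposure_ratio` near 1.0 together with `nonzero_strategy_return_ratio` near 0.0, which is the pattern behind the exact `0.0` gate values.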

Next Improvements

Workstream	Description
Stronger real-window validation	Use real historical windows more directly during training-time selection
Symbol-specific presets	Separate deadbands and reward scales for SPY/QQQ, EWY, metals, and energy
Multi-seed retries	Compare multiple stochastic runs instead of judging a symbol from one seed
Supervised warm start	Imitate simple baselines such as momentum or volatility targeting before RL fine-tuning
Gate-aligned reward	Bring 30-day rolling Sharpe and exposure quality closer to the reward itself
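
To illustrate the supervised warm-start idea, here is one simple baseline an agent could imitate before RL fine-tuning: the sign of a lookback return (momentum), sized toward a target volatility. The function name, lookback, volatility window, 10% volatility target, and leverage cap are hypothetical; the note names the baseline families but not their parameters.

```python
import numpy as np


def baseline_positions(prices, lookback: int = 20, vol_window: int = 20,
                       target_vol: float = 0.10, periods: int = 252,
                       max_leverage: float = 1.0) -> np.ndarray:
    """Momentum direction scaled by volatility targeting (hypothetical baseline).

    The position at bar t uses only data up to t. Bars without enough
    history get position 0.
    """
    p = np.asarray(prices, dtype=float)
    # Log return per bar; rets[0] is nan because there is no prior price.
    rets = np.diff(np.log(p), prepend=np.nan)
    pos = np.zeros_like(p)
    start = max(lookback, vol_window)
    for t in range(start, len(p)):
        direction = np.sign(p[t] - p[t - lookback])
        # Annualized realized volatility over the trailing window.
        vol = np.std(rets[t - vol_window + 1 : t + 1], ddof=1) * np.sqrt(periods)
        size = min(target_vol / vol, max_leverage) if vol > 0 else 0.0
        pos[t] = direction * size
    return pos
```

The resulting positions could serve as regression targets for the policy before RL fine-tuning; the loop favors readability over speed.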
