Research
May 13, 2026 | 9 min read | JeTech Lab

Agent Training Note: The Gap Between Synthetic Training and Real 5Y Gates

Why synthetic-data RL training and real five-year backtest gates diverge for AI trading models, including TradeFi, crypto expansion, and rolling Sharpe.

Agent Model
AI Trading
TradeFi
Crypto
Backtest
Reinforcement Learning
Registry Gate

Summary

Recent TradeFi agent training has produced fewer candidate models than expected. At first glance this can look like a problem with the algorithm (PPO, SAC, CrossQ) or the model backbone. The stronger diagnosis is different: the training objective and the candidate registry gate are not aligned tightly enough.

Training improves risk-adjusted reward inside synthetic episodes. Candidate registration, however, requires a 30-day rolling average Sharpe of at least 1.0 (5Y 30D Avg Sharpe >= 1.0) on a continuous five-year daily backtest over real market data. This gate rewards consistency across a long real period, not just a few strong windows.

What We Observed

As of May 13, 2026 (KST), the training results on the VM show several recurring patterns.

Observation | Interpretation
Synthetic Sharpe and Calmar can be high while the real gate is low | Good behavior in the synthetic task does not fully transfer to the real five-year gate
Many results have a gate value of exactly 0.0 | The model often converges to a flat policy with almost no position
Some runs peak early, after only a few thousand steps, and then degrade | Best-model selection is catching short-lived stochastic peaks in real eval, not stable learning
Performance differs strongly by symbol | SPY/QQQ/XAU behave differently from country ETFs, metals, and energy contracts

Examples make the mismatch clear.

Run | Synthetic best | Real 5Y 30D Avg Sharpe
XAGUSDT / ppo_tcn_v1 | Sharpe 4.81, Calmar 12.15 | 0.912
COPPERUSDT / ppo_itransformer_v1 | Sharpe 10.97, Calmar 10.78 | 0.882
SPYUSDT / ppo_itransformer_v1 | Calmar 4.11 | 0.139

These numbers do not prove that the algorithms are useless. They show that the synthetic task can be solved in ways that do not satisfy the real registry gate.

Problem 1: Objective-Gate Mismatch

The current reward is built from per-bar return, transaction cost, drawdown, downside risk, and selected risk-adjusted terms. That is useful for teaching a policy to survive an episode and avoid large losses.
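A minimal per-bar reward in this spirit might look like the sketch below. The note does not publish its exact terms or weights, so `RewardWeights`, `per_bar_reward`, and every coefficient here are hypothetical. The sketch also omits the selected risk-adjusted terms. It only illustrates one structural point: a flat bar (no exposure, no trade, no drawdown) scores exactly zero, while an active bar pays costs and risk penalties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients, for illustration only.
    cost_rate: float = 0.0005      # transaction cost per unit of turnover
    drawdown_weight: float = 0.1   # penalty per unit of current drawdown
    downside_weight: float = 0.5   # penalty on squared negative net return


def per_bar_reward(prev_exposure: float, exposure: float, asset_return: float,
                   drawdown: float, w: RewardWeights = RewardWeights()) -> float:
    """Hypothetical sketch: net strategy return minus drawdown and downside penalties.

    drawdown is the current fractional drawdown from peak equity (0.0 at a new peak).
    """
    strategy_return = exposure * asset_return
    cost = w.cost_rate * abs(exposure - prev_exposure)
    net = strategy_return - cost
    downside = min(net, 0.0) ** 2  # only losing bars contribute
    return net - w.drawdown_weight * drawdown - w.downside_weight * downside


# A flat bar with no drawdown is exactly free, whatever the market does.
print(per_bar_reward(0.0, 0.0, asset_return=0.02, drawdown=0.0))  # 0.0
```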

The registry gate is more specific.

Field | Current gate
Evaluation data | Five years of real daily market data
Evaluation mode | Continuous five-year strategy return series
Main metric | Average of 30-day rolling Sharpe
Pass condition | 5Y 30D Avg Sharpe >= 1.0
TradeFi annualization | 252 periods/year

The model optimizes synthetic episode reward, but it must pass a continuous real-market rolling Sharpe gate. If those two are not aligned, high synthetic scores will not reliably translate into valid candidates.

Problem 2: Flat Policy Is Too Easy

If a trading agent holds no position, it earns no return, but it also avoids most transaction cost and drawdown penalties. The deadbands reinforce this. A target exposure inside the target exposure deadband is collapsed to zero, and a position change inside the rebalance deadband is collapsed back to the existing position.
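The deadband behavior can be sketched as below. The note does not give the pipeline's values, so the thresholds, the function name, and the order in which the two deadbands are applied are assumptions. The sketch shows why a hesitant policy drifts to flat: any target smaller than the target deadband is snapped to exactly zero.

```python
def apply_deadbands(target: float, current: float,
                    target_deadband: float = 0.05,
                    rebalance_deadband: float = 0.02) -> float:
    """Map a policy's target exposure to the exposure actually held (hypothetical values)."""
    target = max(-1.0, min(1.0, target))      # clip to [-1, 1]
    if abs(target) < target_deadband:
        target = 0.0                          # small targets collapse to flat
    if abs(target - current) < rebalance_deadband:
        return current                        # small changes keep the existing position
    return target


print(apply_deadbands(0.03, current=0.0))  # 0.0: small target collapses to flat
print(apply_deadbands(0.51, current=0.5))  # 0.5: change too small to rebalance
print(apply_deadbands(0.30, current=0.0))  # 0.3
```

A policy whose raw outputs merely hover near zero is mapped to a position of exactly zero, so the flat optimum is reachable without the network ever committing to it.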

That makes a zero-position policy an attractive local optimum when the model is uncertain. On the VM, repeated 5Y 30D Avg Sharpe = 0.0 results are a strong sign that the model did not create meaningful exposure.

A flat policy may look safe, but it is not a useful operating candidate. A JeTech candidate model must produce a real signal while controlling risk.

Problem 3: Synthetic Data Is Not the Same as TradeFi Reality

For some TradeFi symbols, the Binance listing history is not long enough for the maximum inference window, so the training pipeline uses yfinance-backed substitute symbols. That data follows exchange calendars, with gaps for weekends, holidays, and exchange closures, unlike the continuous crypto calendar.

Even when synthetic episodes are calibrated from the recent five-year reference set and Temporal GAN outputs, they cannot fully reproduce each symbol's micro-regime. SPY, QQQ, EWY, gold, silver, copper, crude oil, and natural gas have different trend, gap, volatility clustering, and mean-reversion structures.

One shared reward function and one shared preset family will not fit every symbol equally well.

What We Changed

The goal of this change is not to inflate candidate pass rates. The goal is to identify failures more clearly, stop wasting compute on obvious flat policies, and move the TradeFi reward slightly closer to the real gate.

Change | Purpose
Added exposure diagnostics | Record whether the model actually takes positions
Added flat real-eval early stop | Stop runs that remain flat across consecutive real-eval rounds
Added a TradeFi flat-position penalty | Make zero exposure during meaningful market moves no longer free
Expanded gate metadata | Make registry failures easier to explain in JeTech Lab
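A flat early stop can be expressed as a small stateful check. This is a hypothetical sketch. The class name, the 0.95 flat-ratio threshold, and the patience of three rounds are assumptions. It presumes each real-eval round reports the flat_realized_exposure_ratio diagnostic listed below.

```python
class FlatEarlyStop:
    """Hypothetical sketch: stop when real eval stays flat for `patience` consecutive rounds."""

    def __init__(self, flat_ratio_threshold: float = 0.95, patience: int = 3):
        self.flat_ratio_threshold = flat_ratio_threshold
        self.patience = patience
        self.consecutive_flat = 0

    def update(self, flat_realized_exposure_ratio: float) -> bool:
        """Record one real-eval round; return True when training should stop."""
        if flat_realized_exposure_ratio >= self.flat_ratio_threshold:
            self.consecutive_flat += 1
        else:
            self.consecutive_flat = 0  # any round with real exposure resets the count
        return self.consecutive_flat >= self.patience


stopper = FlatEarlyStop(flat_ratio_threshold=0.95, patience=3)
for ratio in [1.0, 0.99, 0.40, 1.0, 1.0, 1.0]:
    print(ratio, stopper.update(ratio))  # only the final round returns True
```

Resetting on any non-flat round keeps a run alive as long as it occasionally takes positions. A stricter variant could also require a minimum avg_abs_realized_exposure.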

The new diagnostics include:

Metric | Meaning
avg_abs_realized_exposure | Average realized position strength
flat_realized_exposure_ratio | Share of bars where realized exposure was near zero
mean_turnover | Average position change
total_turnover | Total position change
nonzero_strategy_return_ratio | Share of bars with non-zero strategy return
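The five diagnostics can be computed from per-bar realized exposure and strategy return series. The function name is hypothetical. So is the near-zero threshold `flat_eps`, and so is the assumption that the episode starts flat, which makes the first bar's turnover equal to its absolute exposure. The pipeline's exact definitions may differ.

```python
from statistics import fmean
from typing import Sequence


def exposure_diagnostics(exposures: Sequence[float], strategy_returns: Sequence[float],
                         flat_eps: float = 1e-3) -> dict[str, float]:
    """Hypothetical sketch of the exposure diagnostics for one evaluation run."""
    if not exposures or len(exposures) != len(strategy_returns):
        raise ValueError("need equal-length, non-empty series")
    n = len(exposures)
    # Assumption: the run starts flat, so bar 0 turnover is |exposure[0]|.
    turnover = [abs(exposures[0])] + [abs(b - a) for a, b in zip(exposures, exposures[1:])]
    return {
        "avg_abs_realized_exposure": fmean(abs(x) for x in exposures),
        "flat_realized_exposure_ratio": sum(abs(x) < flat_eps for x in exposures) / n,
        "mean_turnover": fmean(turnover),
        "total_turnover": sum(turnover),
        "nonzero_strategy_return_ratio": sum(r != 0.0 for r in strategy_returns) / n,
    }


diag = exposure_diagnostics([0.0, 0.0, 0.5, 0.5, -0.5], [0.0, 0.0, 0.01, -0.005, 0.002])
print(diag["flat_realized_exposure_ratio"], diag["total_turnover"])  # 0.4 1.5
```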

For TradeFi training, the script now applies a small flat-position penalty by default.

flat penalty = penalty_scale
  * max(abs(asset_return) - return_threshold, 0)
  * (1 - abs(realized_exposure))

The default values are penalty_scale=0.02 and return_threshold=0.001. This does not force the agent to trade every bar. It only prevents meaningful market moves with zero exposure from being completely free.
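The penalty formula translates directly into Python. The function name is hypothetical. The defaults are the values stated above, and realized exposure is assumed to lie in [-1, 1]. Because the formula uses abs(realized_exposure), a full short position avoids the penalty on a large move just as a full long does.

```python
def flat_penalty(asset_return: float, realized_exposure: float,
                 penalty_scale: float = 0.02, return_threshold: float = 0.001) -> float:
    """Penalty subtracted from the per-bar reward for sitting out a meaningful move (sketch)."""
    excess_move = max(abs(asset_return) - return_threshold, 0.0)  # only moves above 0.1% count
    idle_fraction = 1.0 - abs(realized_exposure)                  # 1.0 when fully flat
    return penalty_scale * excess_move * idle_fraction


print(flat_penalty(0.0005, 0.0))  # 0.0: a 0.05% move is below the threshold
print(flat_penalty(0.01, 1.0))    # 0.0: fully exposed
print(flat_penalty(0.01, 0.0))    # about 0.00018 for a 1% move while flat
```

For scale, a fully long position on that same 1% move earns 1% before costs, while the flat penalty is about 0.018%. The penalty is a nudge away from the flat optimum, not a force to trade.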

Next Improvements

This is the first defensive layer. The deeper improvements are:

Workstream | Description
Stronger real-window validation | Use real historical windows more directly during training-time selection
Symbol-specific presets | Separate deadbands and reward scales for SPY/QQQ, EWY, metals, and energy
Multi-seed retries | Compare multiple stochastic runs instead of judging a symbol from one seed
Supervised warm start | Imitate simple baselines such as momentum or volatility targeting before RL fine-tuning
Gate-aligned reward | Bring 30-day rolling Sharpe and exposure quality closer to the reward itself
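The note does not specify which baselines a supervised warm start would imitate. One hypothetical example combines momentum, which takes the sign of the trailing return, with volatility targeting, which sizes the position inversely to realized volatility. The function name and all parameter values below are assumptions.

```python
import math
import statistics
from typing import Sequence


def momentum_vol_target_exposure(prices: Sequence[float], lookback: int = 20,
                                 vol_window: int = 20, target_vol: float = 0.10,
                                 periods_per_year: int = 252,
                                 max_exposure: float = 1.0) -> float:
    """Hypothetical baseline: trade the trailing-return sign, sized toward a target volatility."""
    if len(prices) < max(lookback, vol_window) + 1:
        return 0.0  # not enough history yet
    trailing = prices[-1] / prices[-1 - lookback] - 1.0  # momentum signal
    daily = [b / a - 1.0 for a, b in zip(prices[-vol_window - 1:-1], prices[-vol_window:])]
    vol = statistics.stdev(daily) * math.sqrt(periods_per_year)  # annualized realized vol
    if vol == 0.0 or trailing == 0.0:
        return 0.0
    direction = 1.0 if trailing > 0.0 else -1.0
    return max(-max_exposure, min(max_exposure, direction * target_vol / vol))


prices = [100 * 1.001 ** i * (1.01 if i % 2 else 1.0) for i in range(60)]
print(momentum_vol_target_exposure(prices))  # roughly 0.6 (long, below the 1.0 cap)
```

A policy network could first be fit to reproduce such exposures on historical windows and then fine-tuned with RL. Because a baseline like this is rarely exactly zero, the warm-started policy would begin away from the flat optimum.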

What This Means for Subscribers

A low candidate-generation rate does not mean JeTech Lab is accepting weak models. It mostly means the real five-year gate is filtering them out.

That said, a strict gate only works well when the training system is designed around it. This research note and the accompanying code changes are part of that alignment work. Future agent model updates will make it easier to review not only returns, but also exposure quality, flat ratio, and rolling Sharpe stability.