Quickstart — pip install to verdict in 5 minutes

1. Install + sign up

pip install sablier-flow

# Optional engine extras
pip install 'sablier-flow[adapters-backtrader]'
pip install 'sablier-flow[adapters-vectorbt]'

Sign up at sablier.ai with email/password or Google OAuth. If you sign up with a password, verify your email before continuing. New accounts get free credits, enough to run the full loop against the bundled demo dataset.

2. Authenticate

import sablier_flow as sf
sf.login()        # opens browser, prompts Authorize, writes ~/.sablier/credentials

For CI / containers: export SABLIER_FLOW_API_KEY=sk_live_... and skip sf.login().
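For a script that runs both on a laptop and in CI, one option is to branch on whether the key is set. This is a minimal sketch: the helper name needs_browser_login is hypothetical and not part of sablier_flow. Only the SABLIER_FLOW_API_KEY variable and sf.login() come from this page.

```python
import os

def needs_browser_login(env=None):
    """Return True when no API key is configured, so the browser flow is needed.

    Hypothetical helper for illustration; not part of sablier_flow.
    """
    env = os.environ if env is None else env
    # An empty or whitespace-only value is treated as "not set".
    return not env.get("SABLIER_FLOW_API_KEY", "").strip()

# Usage (requires sablier_flow to be installed):
#   import sablier_flow as sf
#   if needs_browser_login():
#       sf.login()
```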

3. The end-to-end loop

import numpy as np
import sablier_flow as sf

df              = sf.demo_data()                       # SPY/QQQ/IWM/TLT + macro, 2010-2023
backtest_window = df.iloc[-252:]                       # the slice your strategy will evaluate

def my_backtest(prices):                               # YOUR backtest, unchanged
    rets = prices['SPY'].pct_change().dropna()
    return {'sharpe': float(rets.mean() / rets.std() * np.sqrt(252)) if rets.std() > 0 else 0.0}

fit     = sf.fit(df, features=list(df.columns), data_types=df.attrs['data_types'], horizon=252)
report  = sf.validate(fit.model_id)                    # cheap OOS structural check
paths   = sf.generate(fit.model_id, n_paths=1000, like=backtest_window)
verdict = sf.robustness(
    my_backtest(backtest_window),
    [my_backtest(d) for d in paths.as_dataframes()],
    primary_metric='sharpe',
)
print(verdict.summary())

Symmetric window. my_backtest(backtest_window) and each my_backtest(d) must evaluate windows of the same length. Comparing the real Sharpe on the full history against synthetic Sharpes on a 252-bar slice is asymmetric and mechanically produces 'highly_overfit'.
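A cheap guard against the asymmetric-window mistake is to compare lengths before running any backtest. The sketch below assumes only that the real window and each synthetic path support len(), as DataFrames do. The function name assert_same_length is hypothetical, not an SDK call.

```python
def assert_same_length(real_window, synthetic_windows):
    """Raise ValueError if any synthetic window differs in length from the real one.

    Returns the shared length when every window matches.
    """
    expected = len(real_window)
    mismatched = [
        (i, len(w)) for i, w in enumerate(synthetic_windows) if len(w) != expected
    ]
    if mismatched:
        index, got = mismatched[0]
        raise ValueError(
            f"{len(mismatched)} synthetic window(s) differ from the real window "
            f"({expected} bars); first mismatch: path {index} has {got} bars"
        )
    return expected

# Usage with the objects from the loop above:
#   assert_same_length(backtest_window, paths.as_dataframes())
```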

4. Reading the verdict

print(verdict.verdict)                   # 'robust' | 'borderline' | 'overfit' | 'highly_overfit'
print(verdict.overfit_score)             # 0.04 = real beat only 4% of alt-histories
print(verdict.synthetic_p5, verdict.synthetic_p95)
print(report.overall, report.memorization_risk)

Band            overfit_score   Meaning
robust          < 0.70          No overfit signal. Read the Sharpe sign separately: robust ≠ profitable.
borderline      0.70 – 0.85     Real in top quartile of synth; defensible but not unambiguous.
overfit         0.85 – 0.95     Likely curve-fit.
highly_overfit  > 0.95          Don't deploy without OOS re-validation.

If report.memorization_risk == 'high', the verdict above isn't reliable — see SDK reference.
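To make the bands concrete, here is a sketch of how a score could map to a verdict. It rests on two assumptions that this page does not confirm: that overfit_score is the fraction of synthetic metrics the real metric strictly beats (a reading of the "real beat only 4% of alt-histories" comment), and that a score of exactly 0.85 falls into 'overfit'. Both function names are hypothetical. The SDK's own computation may differ.

```python
def overfit_score(real_metric, synthetic_metrics):
    """Fraction of synthetic metrics that the real metric strictly beats (assumed definition)."""
    if not synthetic_metrics:
        raise ValueError("need at least one synthetic metric")
    beaten = sum(1 for s in synthetic_metrics if real_metric > s)
    return beaten / len(synthetic_metrics)

def classify(score):
    """Map a score in [0, 1] to a band, using the thresholds from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.70:
        return "robust"
    if score < 0.85:          # assumption: 0.85 itself counts as 'overfit'
        return "borderline"
    if score <= 0.95:
        return "overfit"
    return "highly_overfit"

# Example: the real Sharpe beats one of four synthetic Sharpes.
example_score = overfit_score(1.0, [0.0, 2.0, 3.0, 4.0])
example_band = classify(example_score)
```

A low score means the real result sits in the lower part of the synthetic distribution. That says nothing about whether the strategy makes money, which is why the Sharpe sign is read separately.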

5. Forward forecasting

Same generator, anchored at "today":

forward = sf.generate(fit.model_id, n_paths=1000, horizon=60, anchor_data=df.iloc[-200:])
forward_sharpes = np.array([my_backtest(d)['sharpe'] for d in forward.as_dataframes()])
print(f"median: {np.median(forward_sharpes):+.2f}, 90% CI: "
      f"[{np.percentile(forward_sharpes, 5):+.2f}, {np.percentile(forward_sharpes, 95):+.2f}]")

To validate that synth rankings predict real rankings across a strategy family, use sf.predictive_rank_score(real_sharpes, synth_sharpes) — see notebook 02.
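For intuition about what "synth rankings predict real rankings" means, one common rank-agreement measure is the Spearman correlation. The sketch below is a plain-Python illustration of that general idea, assuming no tied values. This page does not state what sf.predictive_rank_score computes, so do not treat this as its implementation.

```python
def _ranks(values):
    """Return 0-based ranks of values (smallest gets rank 0); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

def spearman(real_scores, synth_scores):
    """Spearman rank correlation between two equal-length sequences without ties."""
    n = len(real_scores)
    if n != len(synth_scores) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ra, rb = _ranks(real_scores), _ranks(synth_scores)
    mean = (n - 1) / 2                      # mean of the ranks 0..n-1
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)  # identical for both sides without ties
    return cov / var

# One entry per strategy variant: the real Sharpe and, say, the median synthetic Sharpe.
agreement = spearman([0.2, 0.9, 0.5], [0.1, 0.7, 0.4])
```

A value near +1 means the strategies that rank best on synthetic paths also rank best on real data. A value near 0 means the synthetic ranking carries little information about the real one.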

6. Async + cross-process

handle = sf.fit_async(df, features=list(df.columns), data_types=df.attrs['data_types'], horizon=252)
# ... walk away, restart kernel, whatever
fit = sf.fetch_result(handle)         # blocks until done
sf.list_jobs(status='running')

handle.to_dict() and JobHandle.from_dict(...) let you persist a handle across processes. Treat the handle as a bearer secret.
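Because the handle is a bearer secret, it makes sense to store it with owner-only permissions. The sketch below assumes that handle.to_dict() returns a JSON-serializable dict, which this page does not state. The helper names save_handle and load_handle are hypothetical.

```python
import json
import os

def save_handle(handle_dict, path):
    """Write the handle dict as JSON; a newly created file gets mode 0o600 on POSIX."""
    # The mode argument only applies when the file is created, not to an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(handle_dict, f)

def load_handle(path):
    """Read the handle dict back from disk."""
    with open(path) as f:
        return json.load(f)

# Usage (requires sablier_flow; JobHandle's import path is not shown on this page):
#   save_handle(handle.to_dict(), "job_handle.json")
#   ... later, in another process ...
#   handle = JobHandle.from_dict(load_handle("job_handle.json"))
#   fit = sf.fetch_result(handle)
```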

Next

  • Examples — full tutorial + three value-proof notebooks
  • SDK reference — every method, kwarg, return type