sablier-flow SDK — Full Reference¶

Canonical reference for the sablier-flow Python SDK. Type signatures and code examples are copy-paste runnable against the current PyPI release (sablier_flow.__version__). If a behavior or API you expect is not documented here, it does not exist in the SDK yet — don't hallucinate features.

What sablier-flow does¶

The customer has a backtest function f(prices) -> {"sharpe": ...}. They run it on their real history and want to know whether the result is genuine signal or overfit to the specific realization their data took. sablier-flow answers that by:

Training a generative model on the customer's history on a remote GPU worker (see Security posture for the current and target deployment specifics).
Generating N synthetic alternative versions of the same history — different paths, same statistical fingerprint.
Running the customer's backtest on every synthetic alt-history.
Comparing the real result to the distribution of synthetic results.

If the real result sits at the extreme tail of the synthetic distribution, the strategy is exploiting realization-specific noise — overfit. If it sits in the bulk, the strategy is robust.
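The four steps above can be illustrated end to end with a toy stand-in. This is a minimal sketch, not the SDK: the "generator" here is a plain i.i.d. bootstrap of the real returns (an assumption for illustration; sablier-flow trains a generative model remotely), the "backtest" is a buy-and-hold Sharpe, and all function names are hypothetical.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio (mean / sample stdev); 0.0 if returns are flat."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def resample_history(returns, rng):
    """Stand-in generator: i.i.d. bootstrap of the real returns.
    Placeholder only; the SDK's generative model is not reproduced here."""
    return [rng.choice(returns) for _ in returns]

def fraction_below(real_value, synthetic_values):
    """Share of synthetic results strictly below the real result."""
    return sum(s < real_value for s in synthetic_values) / len(synthetic_values)

rng = random.Random(42)
real_returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]  # hypothetical history
real_sharpe = sharpe(real_returns)                             # backtest on the real data
synth_sharpes = [sharpe(resample_history(real_returns, rng)) for _ in range(200)]
score = fraction_below(real_sharpe, synth_sharpes)             # compare real vs. synthetic
print(f"real Sharpe {real_sharpe:+.3f} exceeds {score:.0%} of synthetic runs")
```

A score near 1.0 puts the real result in the upper tail of the synthetic distribution; a score near 0.5 puts it in the bulk.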

Two additional outputs surface for serious quants:

Deflated Sharpe Ratio (DSR) under two nulls (empirical synthetic-best-of-N + analytical Bailey–LdP IID-Gaussian).
Probability of Backtest Overfitting (PBO) via Combinatorially Symmetric Cross-Validation on the real history alone.

Contents¶

Installation
Authentication
Security posture today (alpha)
The workflow: fit → generate → validate
Forward generation
Strategy families
Interpreting the output
Demo datasets
Adapters + model management
Full API reference
Common errors
Versioning

Installation¶

pip install sablier-flow                            # thin client only (~30 MB)
pip install 'sablier-flow[adapters-backtrader]'     # + backtrader integration
pip install 'sablier-flow[adapters-vectorbt]'       # + vectorbt integration

Python 3.10 or newer.

Authentication¶

Sign up at https://sablier.ai (email/password or Google OAuth — both work; verify your email if you used password). Then authenticate one of three ways, resolved in this order:

Explicit kwarg — sf.fit(real, api_key="sk_live_...") or sf.Client(api_key=...). Always wins.
SABLIER_FLOW_API_KEY env var — set in CI, containers, headless scripts.
~/.sablier/credentials file — written by sf.login() for interactive use.

import sablier_flow as sf
sf.login()              # opens browser, prompts Authorize, writes ~/.sablier/credentials (mode 0600)
sf.Client()             # auto-picks the stored credential from then on
sf.logout()             # drops the local credential (does NOT revoke server-side; use dashboard)

For non-interactive use: export SABLIER_FLOW_API_KEY=sk_live_... and skip sf.login() entirely.
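The resolution order described above (explicit kwarg, then env var, then credentials file) can be sketched as a small pure function. This is an illustration of the precedence only, not the SDK's code: `resolve_api_key` and `read_credentials` are hypothetical names, and the credentials file is abstracted behind a callable because its on-disk format is not documented here.

```python
import os
from typing import Callable, Mapping, Optional

def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    read_credentials: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Return the first key found: explicit kwarg, env var, then stored credential."""
    if explicit:
        return explicit                        # 1. explicit kwarg always wins
    from_env = env.get("SABLIER_FLOW_API_KEY")
    if from_env:
        return from_env                        # 2. environment variable
    if read_credentials is not None:
        stored = read_credentials()            # 3. hypothetical reader for the credentials file
        if stored:
            return stored
    raise ValueError("api_key is required")

# Each source is consulted only when every higher-priority source is empty.
print(resolve_api_key("sk_a", {"SABLIER_FLOW_API_KEY": "sk_b"}, lambda: "sk_c"))
print(resolve_api_key(None, {"SABLIER_FLOW_API_KEY": "sk_b"}, lambda: "sk_c"))
print(resolve_api_key(None, {}, lambda: "sk_c"))
```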

Security posture today (alpha)¶

TLS 1.3 in transit, KMS-encrypted at rest, one-shot per-job symmetric keys, image-digest pinning on every request. Hardware memory encryption (AMD SEV-SNP + NVIDIA H100 CC mode) is on the roadmap; until that ships, plaintext customer data exists in worker RAM during the minutes-long training job.

The workflow: `fit` → `generate` → `validate`¶

The SDK splits the lifecycle into three explicit calls so you train once and reuse the trained model across as many windows / strategies as you want.

# 1. Train once (~minutes, scales with data size). The server splits 80/20 with
#    a 21-bar embargo by default and keeps the held-out OOS slice encrypted
#    alongside the model so sf.validate(model_id) picks it up automatically.
fit = sf.fit(
    real,
    features=list(real.columns),         # all columns are co-generated jointly
    data_types=real.attrs["data_types"], # per-column transform annotation
    horizon=252,                         # training-window length (bars, not days)
    train_split=0.8,                     # 80% train, 20% OOS held out for validate()
    embargo_days=21,                     # bar gap between train end + OOS start
    seed=42,
)
print(fit.model_id)                 # opaque handle; pass to generate / validate / get_model
print(fit.training_loss, fit.loss_source)
# loss_source ∈ {'validation', 'training_proxy'} — the latter means the inner
# val split was too small to form a single (obs_length + horizon) window, so
# the loss reported is the training-loss proxy; the real OOS check still
# happens via sf.validate(...) on the persisted holdout.

# 2. Validate the model on the held-out OOS slice (zero-config — no holdout
#    DataFrame argument needed). Returns a ValidationReport with `overall`,
#    `memorization_risk`, and ~20 per-metric entries.
report = sf.validate(fit.model_id)

# 3. Generate N synthetic paths shaped like any window you want. `like=df`
#    derives length + index + price anchor from the window — synth paths
#    overlay your real series directly.
paths = sf.generate(fit.model_id, n_paths=1000, like=backtest_window,
                    data_types=real.attrs["data_types"], seed=42)
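To make the default 80/20 split with a 21-bar embargo concrete, here is a rough sketch of the row arithmetic. The split happens server-side and its exact placement is not documented here, so this assumes the training slice is the first `train_split` fraction of rows and the embargo bars are dropped immediately after it; `split_with_embargo` is a hypothetical helper.

```python
def split_with_embargo(n_rows: int, train_split: float = 0.8, embargo: int = 21):
    """Return (train_rows, oos_rows) as half-open row ranges.

    Assumption: embargo bars are discarded between the two slices.
    """
    if not 0.0 < train_split < 1.0:
        raise ValueError("train_split must be in (0, 1)")
    train_end = int(n_rows * train_split)
    oos_start = train_end + embargo            # skip the embargo bars
    if oos_start >= n_rows:
        raise ValueError("embargo leaves no out-of-sample rows")
    return range(0, train_end), range(oos_start, n_rows)

train, oos = split_with_embargo(1000)
print(len(train), len(oos))                    # 800 training rows, 179 OOS rows
```

Under these assumptions a 1000-row history yields 800 training rows and 179 held-out rows, with 21 rows used by neither.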

Schema contract — what `real_data` must look like¶

Field	Requirement
`df.index`	`pd.DatetimeIndex`, monotonic increasing, no duplicates (tz-naive or tz-aware)
`df.columns`	numeric dtype on every column listed in `features=`; NaNs masked, columns with post-ffill NaN fraction > 0.7 rejected
`data_types=` (kwarg)	required dict mapping every `features=` column to one of `{'price', 'level', 'return'}`. Bundled demos attach the canonical map on `df.attrs['data_types']`.
Row cadence	auto-detected from median Δt; any uniform cadence accepted (daily, intraday, weekly, monthly, quarterly); irregular indices raise
Length	≥ 200 rows on `fit`; shorter slices allowed for `like=` / `anchor_data=` / `holdout_data=`
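The index rules in the table can be approximated locally before calling `sf.validate_data` or `sf.fit`. The sketch below works on a plain list of datetimes so it runs without pandas. The regularity rule (at least half of the gaps must equal the median gap) is an assumption chosen for illustration, not the SDK's actual criterion, and `check_index` is a hypothetical name.

```python
from datetime import datetime, timedelta
from statistics import median

def check_index(timestamps, min_rows=200, min_regular_share=0.5):
    """Validate a list of datetimes and return the detected cadence (median gap).

    min_regular_share is an assumed threshold, not the SDK's rule.
    """
    if len(timestamps) < min_rows:
        raise ValueError(f"real_data has {len(timestamps)} rows; need at least {min_rows}")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if any(g <= timedelta(0) for g in gaps):
        raise ValueError("index must be strictly increasing with no duplicates")
    cadence = median(gaps)
    regular = sum(g == cadence for g in gaps) / len(gaps)
    if regular < min_regular_share:
        raise ValueError(f"irregular index: only {regular:.0%} of gaps match {cadence}")
    return cadence

start = datetime(2020, 1, 1)
calendar = [start + timedelta(days=i) for i in range(300)]
business_days = [t for t in calendar if t.weekday() < 5]   # weekends dropped
print(check_index(business_days))                          # median gap of a daily series
```

Weekend gaps do not trip the check here because most gaps still equal the one-day median.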

Async path¶

Every sync method has an async sibling returning a JobHandle (carries job_id, kind, and a one-shot result key). The handle survives process restarts via handle.to_dict() / JobHandle.from_dict(...). Treat it as a bearer secret.

handle = sf.fit_async(real, features=list(real.columns), data_types=real.attrs["data_types"], horizon=252)
result = sf.fetch_result(handle)        # blocks until done; FitResult/GenerationResult/ValidationReport by kind
sf.list_jobs(status="running"); sf.cancel_job(handle)
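Because the handle carries the result key, persisting it deserves the same care as a credential. The sketch below shows one way to write the `to_dict()` payload to a file readable by the owner only and restore it later. `HandleStandIn` mirrors the documented JobHandle fields so the example runs without the SDK; the helper names and file layout are assumptions.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class HandleStandIn:
    """Stand-in with the same fields as the documented JobHandle."""
    job_id: str
    kind: str
    result_key_b64: str

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

def save_handle(handle, path):
    """Write the handle as JSON, creating the file with mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(handle.to_dict(), fh)

def load_handle(path, cls=HandleStandIn):
    """Rebuild a handle from the JSON written by save_handle."""
    with open(path) as fh:
        return cls.from_dict(json.load(fh))

path = os.path.join(tempfile.mkdtemp(), "job.json")
original = HandleStandIn("job_123", "fit", "not-a-real-key")   # placeholder values
save_handle(original, path)
restored = load_handle(path)
print(restored == original)
```

With the real SDK, the restored object would then be passed to `sf.fetch_result(handle)`.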

Catalog models — generate without fitting¶

Sablier hosts pre-trained catalog models for common universes. They skip fit entirely: browse with sf.catalog(), then generate against a recent window of the model's features.

# Browse the pre-trained, shared catalog models.
for m in sf.catalog():
    print(m.model_id, m.display_name, m.scorecard)   # scorecard: finval scores keyed by horizon

cat = sf.catalog()[0]
print(cat.feature_data_types)     # the input schema: {feature -> 'price'|'level'|'return'}

# Bring a recent window of exactly those features. data_types is auto-filled
# from the model's registered schema, so you don't pass it.
recent = my_prices[list(cat.feature_data_types)].iloc[-252:]
paths  = sf.generate(cat.model_id, anchor_data=recent, n_paths=200, horizon=252)

Catalog models surface four extra fields on the Model object (also on get_model / list_models): visibility ('catalog' vs 'private'), display_name, feature_data_types (input schema), and scorecard (finval validation scores per horizon). sf.catalog() is just list_models() filtered to visibility == 'catalog'.

Forward generation — deployment forecasting¶

Same generator, different anchor: instead of paralleling a past window, project forward from your most recent bar.

Use case	Call	Anchor
Alt-history (overfit audit)	`sf.generate(model_id, like=backtest_window)`	`like.iloc[0]`
Forward forecast (deployment)	`sf.generate(model_id, horizon=N, anchor_data=real.iloc[-200:])`	`anchor_data.iloc[-1]` ("today")

import numpy as np

# my_backtest is your own backtest function, f(prices) -> {"sharpe": ...}.
forward = sf.generate(fit.model_id, n_paths=1000, horizon=60,
                      anchor_data=real.iloc[-200:],
                      data_types=real.attrs["data_types"])
forward_sharpes = np.array([my_backtest(df)["sharpe"] for df in forward.as_dataframes()])
print(f"median: {np.median(forward_sharpes):+.2f}, 90% CI: "
      f"[{np.percentile(forward_sharpes, 5):+.2f}, {np.percentile(forward_sharpes, 95):+.2f}]")

sf.predictive_rank_score(real_sharpes, synth_sharpes) returns a Spearman ρ + bootstrap CI + verdict (well_calibrated / weakly_calibrated / uncalibrated / inverted) testing whether your strategy ranking on synth forwards predicts the ranking on real OOS data. See notebook 02 for the worked example.
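For intuition about what a Spearman ρ with a bootstrap percentile CI measures, here is a pure-Python sketch. It is not the SDK's implementation: it takes two aligned lists instead of the SDK's inputs, uses fewer resamples than the documented default, omits the p-value and verdict, and all function names are hypothetical.

```python
import random

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation, or None when either side has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return None if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def bootstrap_ci(x, y, n_bootstrap=2000, seed=0, level=0.95):
    """Percentile CI from resampling (real, synth) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    rhos = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho is not None:                    # skip degenerate resamples
            rhos.append(rho)
    if not rhos:
        raise ValueError("every resample was degenerate")
    rhos.sort()
    return (rhos[int((1 - level) / 2 * (len(rhos) - 1))],
            rhos[int((1 + level) / 2 * (len(rhos) - 1))])

real_sharpes = [0.2, 0.5, 0.9, 1.1, 1.4, 0.1]     # hypothetical per-strategy values
synth_sharpes = [0.3, 0.4, 1.0, 0.8, 1.5, 0.0]
rho = spearman(real_sharpes, synth_sharpes)
lo, hi = bootstrap_ci(real_sharpes, synth_sharpes, n_bootstrap=1000)
print(f"rho={rho:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

A high ρ with a CI well above zero suggests that ranking strategies on synthetic forwards carries over to real out-of-sample data.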

Strategy families¶

For multiple strategy variants tested simultaneously, sf.evaluate_family(strategies_dict, real, n_paths=100) runs every strategy on every synthetic path and returns family-best DSR + CSCV PBO. See notebook 01 for the worked example. Standalone PBO is sf.probability_of_backtest_overfitting(strategies, real).

Interpreting the output¶

`RobustnessReport.verdict`¶

Bucket	`overfit_score`	Meaning
`robust`	`[0.00, 0.70)`	Real result is consistent with the synthetic distribution. No overfit signal.
`borderline`	`[0.70, 0.85)`	Real result exceeds 70% to 85% of synthetic alt-histories.
`overfit`	`[0.85, 0.95)`	Real result exceeds 85% to 95% of synthetic alt-histories.
`highly_overfit`	`[0.95, 1.00]`	Real result is in the top 5%.

For higher-is-better metrics, overfit_score = mean(synthetic < real). robust is orthogonal to profitable — read the Sharpe sign separately. RobustnessReport.summary() returns a plain-English sentence including any structural / memorization warnings.
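The score formula and the bucket boundaries from the table can be written out as two small functions. This is a sketch for reading the table, not the SDK's code: the function names are hypothetical, and the lower-is-better branch (counting synthetic results above the real one) is an assumption, since only the higher-is-better formula is stated above.

```python
def overfit_score(real_value, synthetic_values, higher_is_better=True):
    """Share of synthetic results that the real result beats."""
    if not synthetic_values:
        raise ValueError("synthetic_results is empty")
    if higher_is_better:
        beaten = sum(s < real_value for s in synthetic_values)
    else:
        beaten = sum(s > real_value for s in synthetic_values)   # assumed mirror rule
    return beaten / len(synthetic_values)

def verdict(score):
    """Map an overfit_score in [0, 1] to the bucket from the table."""
    if score < 0.70:
        return "robust"
    if score < 0.85:
        return "borderline"
    if score < 0.95:
        return "overfit"
    return "highly_overfit"

synthetic = [0.1 * k for k in range(20)]       # hypothetical synthetic Sharpes 0.0 .. 1.9
score = overfit_score(1.65, synthetic)
print(score, verdict(score))
```

Each lower bound is inclusive, so a score of exactly 0.85 already falls in the `overfit` bucket.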

Deflated Sharpe Ratio¶

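# `verdict` here is a RobustnessReport (deflated_sharpe is a method on it), e.g. from sf.robustness(...).
dsr = verdict.deflated_sharpe(strategy_returns=daily_returns, n_trials=1)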
dsr.realistic, dsr.analytical          # DSR under realistic (regime-aware) + Bailey-LdP IID-Gaussian nulls
dsr.expected_max_sr_realistic, dsr.expected_max_sr_analytical
dsr.threshold_sr_realistic, dsr.threshold_sr_analytical

For a family of M strategies, pass n_trials=M (or use evaluate_family which handles it).
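To show why `n_trials` matters, here is a sketch of the analytical (IID-Gaussian) Deflated Sharpe Ratio using the standard Bailey and López de Prado formulas. It covers only the analytical null, works in per-period (non-annualized) units, and takes different inputs from `sf.deflated_sharpe`; the function names and the `sharpe_std` parameter (standard deviation of Sharpe across trials) are assumptions, and the SDK's numerical details may differ.

```python
import math
import random
from statistics import NormalDist, fmean

EULER_GAMMA = 0.5772156649015329
_N = NormalDist()

def expected_max_sharpe(n_trials, sharpe_std):
    """Expected maximum Sharpe across n_trials strategies with no skill."""
    if n_trials < 2:
        return 0.0                             # a single trial needs no deflation
    a = _N.inv_cdf(1 - 1 / n_trials)
    b = _N.inv_cdf(1 - 1 / (n_trials * math.e))
    return sharpe_std * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(returns, n_trials=1, sharpe_std=0.0):
    """Probability that the observed Sharpe exceeds the expected best-of-N Sharpe."""
    n = len(returns)
    mu = fmean(returns)
    centered = [r - mu for r in returns]
    m2 = fmean([c ** 2 for c in centered])
    m3 = fmean([c ** 3 for c in centered])
    m4 = fmean([c ** 4 for c in centered])
    sr = mu / math.sqrt(m2)                    # per-period Sharpe
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                        # non-excess kurtosis (3 for a Gaussian)
    sr0 = expected_max_sharpe(n_trials, sharpe_std)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _N.cdf((sr - sr0) * math.sqrt(n - 1) / denom)

rng = random.Random(7)
returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]   # hypothetical daily returns
d1 = deflated_sharpe(returns, n_trials=1)
d10 = deflated_sharpe(returns, n_trials=10, sharpe_std=0.05)
print(f"DSR with 1 trial: {d1:.3f}; with 10 trials: {d10:.3f}")
```

The same returns earn a lower DSR when they are the best of ten trials, because the bar they must clear rises with the number of trials.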

PBO (Probability of Backtest Overfitting)¶

Computed on real history alone via Combinatorially Symmetric Cross-Validation (Bailey et al. 2015).

`pbo`	Interpretation
≤ 0.2	Grid search has signal.
~ 0.5	No signal — parameter selection is noise.
≥ 0.6	Systematic overfitting.
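For readers who want to see the mechanics behind the `pbo` number, here is a compact sketch of Combinatorially Symmetric Cross-Validation. It assumes you already have a per-period return series for each strategy variant (the SDK instead takes strategy callables and `real_data`), uses Sharpe as the metric, and counts only strictly negative logits; `pbo_cscv` is a hypothetical name and the SDK's implementation may differ in details.

```python
import math
import random
import statistics
from itertools import combinations

def sharpe(returns):
    """Per-period Sharpe ratio; 0.0 if returns are flat."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def pbo_cscv(returns_by_strategy, n_splits=16):
    """Return (pbo, n_partitions) for equal-length return series keyed by name."""
    names = list(returns_by_strategy)
    n_obs = len(returns_by_strategy[names[0]])
    block = n_obs // n_splits                  # trailing remainder rows are ignored
    blocks = [range(i * block, (i + 1) * block) for i in range(n_splits)]
    logits = []
    # Every way of choosing half the blocks as in-sample; the rest is out-of-sample.
    for in_sample in combinations(range(n_splits), n_splits // 2):
        is_rows = [t for b in in_sample for t in blocks[b]]
        oos_rows = [t for b in range(n_splits) if b not in in_sample for t in blocks[b]]
        is_perf = {n: sharpe([returns_by_strategy[n][t] for t in is_rows]) for n in names}
        oos_perf = {n: sharpe([returns_by_strategy[n][t] for t in oos_rows]) for n in names}
        best = max(names, key=is_perf.get)     # in-sample winner
        # Relative OOS rank of that winner, kept strictly inside (0, 1).
        rank = sum(oos_perf[n] <= oos_perf[best] for n in names)
        omega = rank / (len(names) + 1)
        logits.append(math.log(omega / (1 - omega)))
    # PBO: share of partitions where the in-sample winner lands below the OOS median.
    return sum(l < 0 for l in logits) / len(logits), len(logits)

rng = random.Random(0)
noise = {f"variant_{k}": [rng.gauss(0, 0.01) for _ in range(240)] for k in range(8)}
pbo, n_partitions = pbo_cscv(noise, n_splits=8)
print(f"PBO on pure-noise variants: {pbo:.2f} over {n_partitions} partitions")
```

With 16 splits the loop visits 12,870 partitions, so a pure-Python version like this is only practical for small families.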

Memorization risk¶

`memorization_nn_distance_ratio`	`memorization_risk`	Action
`> 0.80`	`low`	Synth distributed through the training manifold.
`[0.50, 0.80]`	`medium`	Cross-check against `coverage_*` metrics.
`< 0.50`	`high`	Model is regurgitating. Don't trust the overfit verdict on top.

Structural validation¶

ValidationReport.overall ∈ {'pass', 'warn', 'fail'} aggregates ~20 per-metric scores grouped into calibration, distribution, dependence, temporal, and extreme categories. The full per-metric breakdown is on ValidationReport.metrics; the underlying metric suite + thresholds are documented at github.com/sablier-ai/finval.

Demo datasets¶

sf.demo_data()                                       # default: us_equities_macro_2010_2023
sf.demo_data("us_equities_2010_2023")                # SPY/QQQ/IWM/TLT only, no macros
sf.demo_data("us_equities_macro_5min_3mo")           # 5-min intraday — 7 tickers, 3 months
sf.available_demo_datasets()                         # list all bundled names

Bundled parquets ship inside the wheel (pip install sablier-flow includes them). Zero network access required to load.

Adapters + model management¶

Engine adapters live under sablier_flow.adapters — as_dataframes, as_array, as_backtrader_feeds (extra: [adapters-backtrader]), as_vectorbt_panel (extra: [adapters-vectorbt]), write_lean_csv_universe. See the getting-started notebook for a worked example.

Fitted models persist server-side for ~30 days. Manage via sf.list_models(), sf.get_model(model_id), sf.delete_model(model_id). Full signatures below.

Full API reference¶

Every signature is verbatim from the source.

`Client`¶

sf.Client(
    api_key: str,
    *,
    endpoint: str | None = None,            # falls back to "https://flow.sablier.ai/v1"
    pinned_image_digest: str | None = None,
    attestation_mode: str = "production",   # "production" | "fake-for-dev"
    transport: Transport | None = None,     # for tests / in-process simulation
    timeout_s: float = 60.0,
    poll_interval_s: float = 2.0,
    poll_timeout_s: float = 30 * 60,
    verify: bool | str | None = None,       # None/True = system CA; False = skip; str = pin PEM
    cache_dir: str | os.PathLike | bool | None = None,
)

Methods:

Client.fit(
    real_data: pd.DataFrame,
    *,
    data_types: dict[str, str],               # REQUIRED — per-column annotation: 'price' | 'level' | 'return'
    features: Sequence[str] | None = None,    # default: every numeric column of real_data
    horizon: int | None = None,
    train_split: float | None = 0.8,          # set to None to skip the OOS split
    embargo_days: int = 21,
    seed: int | None = None,
    quiet: bool = False,                      # suppress the stderr cost-estimate / actual-cost lines
    idempotency_key: str | None = None,
) -> FitResult

Client.generate(
    model_id: str,
    *,
    n_paths: int = 1000,
    horizon: int | None = None,               # any length; defaults to training horizon
    anchor_data: pd.DataFrame | None = None,  # None → use server-stored training tail
    like: pd.DataFrame | None = None,         # convenience: derive horizon + index + anchor from this window
    data_types: dict[str, str] | None = None, # required when `like=` or `anchor_data=` is set (carries fresh data)
    seed: int | None = None,
    quiet: bool = False,
    idempotency_key: str | None = None,
) -> GenerationResult

Client.validate(
    model_id: str,
    *,
    holdout_data: pd.DataFrame | None = None, # None → use the OOS slice persisted at fit time
    data_types: dict[str, str] | None = None, # required when holdout_data is set
    n_paths: int = 500,
    seed: int | None = None,
    quiet: bool = False,
    idempotency_key: str | None = None,
) -> ValidationReport

Client.list_models(*, limit: int = 50) -> list[Model]
Client.get_model(model_id: str) -> Model
Client.delete_model(model_id: str) -> None

Module-level shortcuts¶

Core workflow:

sf.fit(real_data, *, api_key=None,
       features=None, data_types,                     # data_types REQUIRED
       horizon=None,
       train_split=0.8, embargo_days=21, seed=None,
       idempotency_key=None,
       # connection-shape kwargs (env-var fallback) ───────────────
       endpoint=None, pinned_image_digest=None,
       attestation_mode="production", verify=None,
       cache_dir=None, profile="default") -> FitResult

sf.generate(model_id, *, api_key=None,
            data_types,                                # data_types REQUIRED
            n_paths=1000, horizon=None,
            anchor_data=None, like=None, seed=None,
            idempotency_key=None,
            endpoint=None, pinned_image_digest=None,
            attestation_mode="production", verify=None,
            cache_dir=None, profile="default") -> GenerationResult

sf.validate(model_id, *, api_key=None,
            data_types=None,                           # REQUIRED only when holdout_data is supplied
            holdout_data=None, n_paths=500, seed=None,
            idempotency_key=None,
            endpoint=None, pinned_image_digest=None,
            attestation_mode="production", verify=None,
            cache_dir=None, profile="default") -> ValidationReport

data_types is a dict[str, str] mapping every column in features= to one of {'price', 'level', 'return'}. Missing the kwarg raises TypeError with the allowed-set message; an unknown value raises ValueError. Demo DataFrames attach the canonical map on df.attrs['data_types']. On sf.validate(model_id) without holdout_data the server reuses the data_types registered at fit time — passing the kwarg in that mode is a no-op and is silently ignored.

Async workflow:

sf.fit_async(real_data, ...)        -> JobHandle      # same kwargs as sf.fit
sf.generate_async(model_id, ...)    -> JobHandle      # same kwargs as sf.generate
sf.validate_async(model_id, ...)    -> JobHandle      # same kwargs as sf.validate
sf.fetch_result(handle)             -> FitResult | GenerationResult | ValidationReport
sf.list_jobs(*, status=None, limit=50, api_key=None, **kw) -> list[JobSummary]
sf.cancel_job(handle_or_id, *, api_key=None, **kw)         -> None

Predictive validity (post-hoc analytic; pure numpy + scipy, no path generation):

sf.predictive_rank_score(real_results, synth_results, *,
                         primary_metric=None,
                         n_bootstrap=10000,
                         seed=0) -> PredictiveRankReport

Auth + credentials:

sf.login(*, endpoint=None, profile="default",
         open_browser=True, poll_timeout_s=600.0,
         verify=None) -> LoginResult
sf.logout(*, profile="default") -> bool                 # True if a profile was dropped

Model management:

sf.list_models(*, limit=50, api_key=None, **kw) -> list[Model]
sf.get_model(model_id, *, api_key=None, **kw)   -> Model
sf.delete_model(model_id, *, api_key=None, **kw) -> None

Account / pre-flight:

sf.ping(*, api_key=None, **kw)         -> dict[str, Any]
sf.whoami(*, api_key=None, **kw)       -> dict[str, Any]
sf.credits(*, api_key=None, **kw)      -> CreditsBalance      # Pydantic — use attribute access (balance.available, .monthly_used, ...)
sf.usage(*, since=None, until=None, kind=None, limit=100, api_key=None, **kw) -> list[UsageEvent]
sf.usage_summary(*, period="month", api_key=None, **kw) -> UsageSummary   # Pydantic — summary.total_credits, .by_kind, ...
sf.estimate_cost(kind, *, real_data=None, features=None, horizon=None, n_paths=None, n_features=None, n_rows=None, api_key=None, **kw) -> dict[str, Any]
    # Returns {estimated_credits, low, high, notes}. Wall-clock duration is NOT returned.
    # `kind` must be one of 'fit' | 'generate' | 'validate' — 'train' is rejected.

Local helpers (no network):

sf.validate_data(real_data) -> None      # raise on schema violations BEFORE the network round-trip
sf.demo_data(name="us_equities_macro_2010_2023") -> pd.DataFrame
sf.available_demo_datasets() -> list[str]

Each shortcut constructs a one-shot Client. Connection-shape settings (endpoint, verify, pinned_image_digest, attestation_mode) fall back to env vars SABLIER_FLOW_ENDPOINT, SABLIER_FLOW_CERT, SABLIER_FLOW_PINNED_IMAGE_DIGEST, SABLIER_FLOW_ATTESTATION_MODE. api_key falls back to SABLIER_FLOW_API_KEY, then to ~/.sablier/credentials (written by sf.login()).

Unknown kwargs raise TypeError with the offending name (no **kwargs swallow), so IDE autocomplete and inspect.signature() see the real parameter list.

`JobHandle`¶

Returned by sf.fit_async / sf.generate_async / sf.validate_async. Persistable across processes via to_dict() / from_dict(d).

@dataclass(frozen=True)
class JobHandle:
    job_id: str
    kind: str                # 'fit' | 'generate' | 'validate'
    result_key_b64: str      # standard-base64 of the AES-256-GCM key — treat as a secret

    def to_dict(self) -> dict[str, str]: ...
    @classmethod
    def from_dict(cls, d: dict[str, str]) -> "JobHandle": ...

Pair with sf.fetch_result(handle) to block on completion and materialize the typed result.

`FitResult`¶

@dataclass(frozen=True)
class FitResult:
    model_id: str
    features: list[str]
    training_horizon: int
    training_end_date: str | None
    sdk_version: str
    expires_at: str | None = None
    training_loss: float | None = None
    loss_source: str | None = None             # 'validation' | 'training_proxy'
    training_start_date: str | None = None
    holdout_start_date: str | None = None
    holdout_end_date: str | None = None

`Model`¶

@dataclass(frozen=True)
class Model:
    model_id: str
    features: list[str]
    training_horizon: int
    n_assets: int
    status: str                                # 'ready' | 'failed' | 'expired'
    training_start_date: str | None = None
    training_end_date: str | None = None
    holdout_start_date: str | None = None
    holdout_end_date: str | None = None
    train_split: float | None = None
    embargo_days: int | None = None
    sdk_version: str | None = None
    training_loss: float | None = None
    created_at: str | None = None
    last_used_at: str | None = None
    expires_at: str | None = None

`GenerationResult`¶

@dataclass(frozen=True)
class GenerationResult:
    paths_returns: np.ndarray                  # (n_paths, horizon, n_features), z-scored
    paths_prices: np.ndarray                   # (n_paths, horizon, n_features), price-level
    feature_names: list[str]                   # original input columns only
    last_prices: np.ndarray
    horizon: int
    n_paths: int
    seed: int | None
    sdk_version: str
    memorization_risk: str | None              # 'low' | 'medium' | 'high'
    memorization_nn_distance_ratio: float | None
    paths_index: pd.DatetimeIndex | None       # set when generate was called with like=window

    def as_dataframes(self, index=None) -> list[pd.DataFrame]: ...

`ValidationReport`¶

@dataclass(frozen=True)
class ValidationReport:
    overall: str                               # 'pass' | 'warn' | 'fail'
    metrics: dict[str, Any]                    # per-metric breakdown (see "Structural validation")
    memorization_risk: str | None              # 'low' | 'medium' | 'high'
    memorization_nn_distance_ratio: float | None
    n_paths_used: int | None
    holdout: bool = False                      # True when validated against a held-out OOS slice

`robustness`¶

sf.robustness(
    real_result: float | dict[str, float],
    synthetic_results: Sequence[float | dict[str, float]],
    *,
    primary_metric: str | None = None,
    higher_is_better: bool = True,
) -> RobustnessReport

`RobustnessReport`¶

@dataclass(frozen=True)
class RobustnessReport:
    overfit_score: float
    verdict: Literal["robust", "borderline", "overfit", "highly_overfit"]
    primary_metric: str
    real_value: float
    synthetic_mean: float
    synthetic_median: float
    synthetic_std: float
    synthetic_min: float
    synthetic_max: float
    synthetic_p5: float
    synthetic_p25: float
    synthetic_p75: float
    synthetic_p95: float
    synthetic_ci_95: tuple[float, float]
    n_synthetic: int
    per_metric: dict[str, dict[str, float]]
    notes: list[str]
    synthetic_values: tuple[float, ...]        # raw per-path values
    higher_is_better: bool

    @property
    def acceptable(self) -> bool: ...          # True for 'robust' or 'borderline'
    def summary(self) -> str: ...
    def deflated_sharpe(self, *, strategy_returns=None, n_trials=1) -> DeflatedSharpeReport: ...
    def to_html(self, path=None, *, title="Robustness Report") -> str: ...

`deflated_sharpe`¶

sf.deflated_sharpe(
    *,
    observed_sr: float,
    synthetic_sharpes: Sequence[float] | np.ndarray,
    strategy_returns: Sequence[float] | np.ndarray | None = None,
    n_trials: int = 1,
    significance_level: float = 0.95,
) -> DeflatedSharpeReport

`DeflatedSharpeReport`¶

@dataclass(frozen=True)
class DeflatedSharpeReport:
    observed_sr: float
    n_trials: int
    realistic: float                           # DSR under Sablier synthetic-best-of-N null
    analytical: float                          # DSR under Bailey-LdP IID-Gaussian null
    expected_max_sr_realistic: float
    expected_max_sr_analytical: float
    threshold_sr_realistic: float              # SR needed for DSR=0.95 (realistic)
    threshold_sr_analytical: float             # ... (analytical)

    def to_dict(self) -> dict: ...

`evaluate_family`¶

sf.evaluate_family(
    strategies: Mapping[str, Callable[..., float | dict]],
    real_data: pd.DataFrame,
    *,
    n_paths: int = 100,
    primary_metric: str | None = None,
    higher_is_better: bool = True,
    pbo_cscv_splits: int = 16,                 # SDK floor; lower values under-detect overfit
    executor: Literal["serial", "thread"] = "serial",
    max_workers: int | None = None,
    progress: bool = False,
    raise_on_failure: bool = False,
    **fit_or_generate_kwargs,                  # features, horizon, seed, train_split, etc.
) -> FamilyReport

`FamilyReport`¶

@dataclass(frozen=True)
class FamilyReport:
    strategy_names: tuple[str, ...]
    primary_metric: str
    real_metrics: tuple[dict[str, float], ...]
    synthetic_metrics: tuple[tuple[dict[str, float], ...], ...]
    real_max_value: float
    real_argmax_strategy: str
    synthetic_max_values: np.ndarray           # (n_paths,) best-of-N per path
    deflated_sharpe: DeflatedSharpeReport
    pbo: float
    pbo_n_partitions: int
    pbo_cscv_splits: int
    n_paths: int
    per_strategy_real_metric: dict[str, float]
    per_strategy_overfit_score: dict[str, float]
    per_strategy_synthetic_median: dict[str, float]
    failures: tuple[str, ...]
    notes: tuple[str, ...]

    def summary(self) -> str: ...
    def most_overfit_variants(self, *, top: int = 5) -> list[tuple[str, float]]: ...
    def to_dict(self) -> dict: ...

`probability_of_backtest_overfitting`¶

sf.probability_of_backtest_overfitting(
    strategies: Mapping[str, Callable[..., float | dict]],
    real_data: pd.DataFrame,
    *,
    primary_metric: str = "sharpe",
    higher_is_better: bool = True,
    cscv_splits: int = 16,
    executor: Literal["serial", "thread"] = "serial",
    max_workers: int | None = None,
) -> tuple[float, int]                          # (pbo_value, n_partitions_used)

`PredictiveRankReport`¶

@dataclass(frozen=True)
class PredictiveRankReport:
    spearman_rho: float                       # rank correlation real vs synth-forward
    p_value: float                            # scipy.stats.spearmanr two-sided p
    ci_95: tuple[float, float]                # bootstrap percentile CI (10000 resamples)
    n_strategies: int                         # intersection of {real_results, synth_results}
    mean_abs_metric_gap: float                # magnitude bias (rank can be right while abs is biased)
    primary_metric: str                       # 'sharpe' if dicts; 'value' if scalars
    real_values: dict[str, float]
    synth_values: dict[str, float]
    n_bootstrap: int
    notes: list[str]

    @property
    def verdict(self) -> Literal[
        "well_calibrated", "weakly_calibrated", "uncalibrated", "inverted",
    ]: ...
    @property
    def acceptable(self) -> bool: ...         # True if verdict ∈ {well_calibrated, weakly_calibrated}
    def summary(self) -> str: ...
    def to_dict(self) -> dict: ...

Lower-level primitives (sf.AttestationVerifier, sf.envelope_encrypt, sf.envelope_decrypt, AttestationQuote, EnvelopeEncrypted) are exported for custom-transport implementers. Client invokes them internally on every request — the standard workflow never touches them.

Common errors¶

Error	Cause	Fix
`ValueError: api_key is required`	`SABLIER_FLOW_API_KEY` not set and not passed as kwarg	`export SABLIER_FLOW_API_KEY=sk_live_...`
`ValueError: real_data.index must be a pd.DatetimeIndex`	DataFrame index is not date-like	`df.index = pd.to_datetime(df.index)`
`ValueError: real_data has N rows; need at least 200`	Training data too short	Use a longer history
`ValueError: real_data has non-numeric columns: [...]`	Non-numeric column present	Drop / convert before passing
`ValueError: synthetic_results is empty`	`robustness()` called with no synth results	Generate paths first via `sf.generate(...)`
`ValueError: deflated_sharpe is only defined for higher-is-better metrics`	Called `.deflated_sharpe()` on a drawdown-style report	DSR is for return-style metrics. Use the raw `overfit_score` for lower-is-better.
`TransportError: retry budget exhausted`	Network blip during status polling exceeded retry window	The worker may still be training — call `sf.list_models()` to recover the model_id and resume with `sf.validate(model_id)`
`AttestationVerificationError`	Image digest mismatch between SDK and TEE	The SDK release pins a TEE image digest; if they disagree either upgrade the SDK or wait for the matching server rollout
`RemoteJobError`	Worker failed inside the TEE	Re-raise carries the safe error message from the worker; inspect and retry if transient

Versioning¶

The public API follows semantic versioning. Major releases (X.0.0) may introduce breaking changes; minor (X.Y.0) and patch (X.Y.Z) releases preserve backwards compatibility. The current version is exposed at sablier_flow.__version__.

See PyPI for the latest release, the CHANGELOG for the full history, and GitHub releases for per-release notes.

Pin behaviour you care about explicitly.
