Quantsentinel — Technical Deep-Dive

Published: 28 May 2026 · Reading time: 20 min · Audience: Technical evaluators, Engineering leads, CTOs

Why this document exists

The product blog answers “what does it do, who is it for, why does it matter.” This document answers “how is it built, why those specific choices, what would a CTO need to know to evaluate the platform technically.”

It is written for engineering-side evaluators on the receiving end of any acquisition or partnership conversation. The product blog is the conversation opener; this is the second meeting.


Service topology

Thirteen microservices, deployed via Docker Compose on a single Compute Engine VM (e2-standard-4) with the Postgres datastore on managed Cloud SQL (db-g1-small, Postgres 16 with TimescaleDB extension).

                 +-------------+
                 |    caddy    |  ← auto-HTTPS, basic_auth scoped to operator paths
                 +-------------+
                       |
       +---------------+---------------+
       |                               |
+-------------+                +----------------+
|  frontend   |                |    gateway     |  ← per-tenant routing, auth, CSRF
|  (Next 14)  |                |     (Go)       |
+-------------+                +----------------+
                                       |
            +--------------------+----+----+--------------------+
            |                    |         |                    |
    +----------------+   +-------------+   +----------------+   +-----+
    |     broker     |   |     ml      |   | intelligence   |   | ... |
    | (FastAPI)      |   | (FastAPI)   |   | (FastAPI)      |   |     |
    +----------------+   +-------------+   +----------------+   +-----+
            |                    |
            +---------+----------+
                      |
            +------------------+              +------------------+
            |   Cloud SQL      |              |   Redis (cache)  |
            |   Postgres 16 +  |              |                  |
            |   TimescaleDB    |              +------------------+
            +------------------+

Frontend (Next.js 14 App Router) — server-rendered React, per-tenant URL rewrites in middleware, signed qs_session cookie auth. Talks only to the gateway BFF, never directly to backend services.

Gateway (Go, custom) — per-tenant path rewriting (/t/<slug>/api/v1/... → /api/v1/...), slug-to-tenant-id resolution with a 5-minute cache, auth middleware that proxies to the broker’s auth router, and per-tenant rate limiting (600 req/h on /pipeline/decide).
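The gateway itself is written in Go; the following Python sketch only illustrates the rewrite-plus-cache idea. It assumes the rewrite strips the /t/<slug> prefix and forwards the remaining /api/v1/... path, and that slugs are lowercase alphanumerics with hyphens. SlugCache, rewrite and the lookup callable are hypothetical names, not the platform's API.

```python
import re
import time
from typing import Callable, Optional, Tuple

# Matches /t/<slug>/api/v1/<rest>; the slug character set is an assumption.
_TENANT_PATH = re.compile(r"^/t/(?P<slug>[a-z0-9-]+)(?P<rest>/api/v1/.*)$")


class SlugCache:
    """Slug -> tenant-id cache with a fixed TTL (5 minutes, per the text)."""

    def __init__(self, lookup: Callable[[str], Optional[int]],
                 ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # hypothetical upstream lookup (e.g. a DB query)
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict = {}       # slug -> (stored_at, tenant_id)

    def resolve(self, slug: str) -> Optional[int]:
        now = self._clock()
        hit = self._entries.get(slug)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]              # fresh cache hit
        tenant_id = self._lookup(slug)
        if tenant_id is not None:
            self._entries[slug] = (now, tenant_id)
        return tenant_id


def rewrite(path: str, cache: SlugCache) -> Optional[Tuple[int, str]]:
    """Return (tenant_id, backend_path), or None if the path or slug is unknown."""
    m = _TENANT_PATH.match(path)
    if m is None:
        return None
    tenant_id = cache.resolve(m.group("slug"))
    if tenant_id is None:
        return None
    return tenant_id, m.group("rest")
```

Unknown slugs are deliberately not cached in this sketch, so a newly created tenant becomes routable without waiting for a TTL to expire.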

Broker (FastAPI) — owns the auth router (login, invite, password reset), the upstream execution-API integrations (OAuth-based), the order placement + reconciliation, and the order-history/positions/greeks/margins endpoints. The dry-run lock lives here.

ml (FastAPI) — the 4-layer alpha engine + 7-wall risk castle + 17 ML systems + the live signal dashboard + the playground orchestrator hooks. Long-running APScheduler for the weekly retrain crons.

intelligence (FastAPI) — option chain polling, IV surface computation, OI flow aggregation, GEX estimation, news ingestion + classification. Writes to TimescaleDB hypertables; ml reads from them.

tenant_adaptation (FastAPI) — synthetic-persona orchestrator + per-tenant behavioural ML models. Rule 2 isolation: this container is started with a stripped-down env block excluding all broker credentials. Refuses to boot if real credentials are visible.
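A boot guard of this kind can be sketched in a few lines. The variable-name prefixes below are hypothetical, since the real credential names are not listed here; the point is only that the check runs before anything else and reports names, never values.

```python
import os
import sys
from typing import Iterable, Mapping

# Hypothetical prefixes; the real credential variable names are not given in the text.
FORBIDDEN_PREFIXES = ("BROKER_", "EXECUTION_API_")


def visible_credentials(env: Mapping[str, str],
                        prefixes: Iterable[str] = FORBIDDEN_PREFIXES) -> list:
    """Return the names of non-empty env vars that look like broker credentials."""
    prefixes = tuple(prefixes)
    return sorted(k for k, v in env.items() if v and k.startswith(prefixes))


def assert_isolated(env: Mapping[str, str] = os.environ) -> None:
    """Abort startup if any forbidden variable is visible to this process."""
    leaked = visible_credentials(env)
    if leaked:
        # Report variable names only; values must never reach the logs.
        sys.exit("refusing to boot: broker credentials visible: " + ", ".join(leaked))
```

Calling assert_isolated() as the first statement of the service entrypoint would make a misconfigured env block fail loudly at deploy time rather than silently at runtime.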

Plus support services: backtest (historical replay), copilot (LLM-driven narrations), redis, the autoheal sidecar, and the editorial blog itself.


MLOps stack

ModelRegistry — versioned, auditable, hot-swappable

Every ML model in the platform — seventeen systems, plus internal helpers — goes through one shared ModelRegistry API.

# Save
ModelRegistry.get_instance().save_model(
    model=trained_model,
    metadata=ModelMetadata(
        type_id="strike_selection_model",
        framework="sklearn",
        hyperparameters={"n_estimators": 400, "max_depth": 6},
        train_metrics={"n_samples": 5_000},
        validation_metrics={"directional_accuracy": 0.62},
        training_data_version="2026-05-26",
        training_data_end=date(2026, 5, 26),
        retrain_cadence="weekly",
    ),
    adapter=SklearnAdapter(),
    auto_promote=True,
)

# Load (boot-time prewarm)
model = ModelRegistry.get_instance().load_production_model(
    type_id="strike_selection_model",
    tenant_id=None,                    # global; per-tenant variants supported
    adapter=SklearnAdapter(),
)

The registry persists three things per model version: the artifact (pickled bytes, SHA-256 verified), the metadata row (Postgres), and the production-pointer row (one per (type_id, tenant_id)). promote_atomic uses a SERIALIZABLE transaction to flip the pointer.
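The "pickled bytes, SHA-256 verified" step can be illustrated as follows. This is a minimal sketch, not the registry's code: StoredArtifact, pack and unpack are hypothetical names, and storage is reduced to an in-memory object.

```python
import hashlib
import pickle
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StoredArtifact:
    payload: bytes      # pickled model bytes
    sha256: str         # hex digest recorded at save time


def pack(model: Any) -> StoredArtifact:
    """Pickle the model and record the digest of the exact bytes stored."""
    payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    return StoredArtifact(payload, hashlib.sha256(payload).hexdigest())


def unpack(artifact: StoredArtifact) -> Any:
    """Verify the digest first; unpickle only bytes that match it."""
    actual = hashlib.sha256(artifact.payload).hexdigest()
    if actual != artifact.sha256:
        raise ValueError("artifact digest mismatch; refusing to unpickle")
    return pickle.loads(artifact.payload)
```

A digest check of this shape catches corruption and truncation. It does not by itself protect against an attacker who can rewrite both the bytes and the stored digest, so it presumes the metadata row lives somewhere more trusted than the artifact store.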

Adapters

SklearnAdapter — pickles the underlying estimator. Works for scikit-learn, XGBoost, LightGBM uniformly. JSONAdapter — for state-dict-style learners (anomaly detector, feature drift monitor, slippage coefficients). A third TorchAdapter exists for future deep-hedging usage but isn’t deployed yet.

Singletons + hot-swap

Each module that needs a fitted model declares a singleton accessor (e.g., get_trend_classifier(), get_strike_selector()). Boot-time prewarm pulls the production version from the registry into the singleton. The weekly retrain cron calls refresh_<thing>() after save, which re-pulls the new version into the same singleton — without restarting the running uvicorn process.

The pattern is identical across all 7 + 5 = 12 models:

def get_<thing>():
    with _lock:
        s = _cache.get("<thing>")
        if s is not None: return s
        s = <ThingModel>()
        _try_load_into(s)         # registry → singleton
        _cache["<thing>"] = s
        return s

def refresh_<thing>():
    with _lock:
        s = _cache.get("<thing>")
        if s is None: return False
        return _try_load_into(s)

This is the difference between “we have a model” and “we have a model that improves every week.” The weekly retrains run on Sunday mornings (IST), between 03:00 and 06:30, one model per slot.

Validation contracts

promote_to_production runs a validation contract before flipping the pointer. The contract is per-type_id and rejects promotions that fail any of:

A new candidate version that fails the contract stays staged. The old production version keeps serving.
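One way to picture a per-type_id contract is as a list of named checks over the candidate's and the current production version's validation metrics. The sketch below is an assumption throughout: the check names and both thresholds are invented for illustration, and only the directional_accuracy metric name comes from the save example earlier.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Metrics = Mapping[str, float]


@dataclass(frozen=True)
class Check:
    name: str
    passes: Callable[[Metrics, Metrics], bool]   # (candidate, production) -> ok?


# Hypothetical contract for one type_id; real thresholds are not given in the text.
STRIKE_SELECTION_CONTRACT = [
    Check("min_directional_accuracy",
          lambda cand, prod: cand.get("directional_accuracy", 0.0) >= 0.55),
    Check("no_regression_vs_production",
          lambda cand, prod: cand.get("directional_accuracy", 0.0)
          >= prod.get("directional_accuracy", 0.0) - 0.02),
]


def failed_checks(contract: list, candidate: Metrics, production: Metrics) -> list:
    """Names of the checks the candidate fails; an empty list means promotable."""
    return [c.name for c in contract if not c.passes(candidate, production)]
```

Returning the names of failed checks, rather than a bare boolean, gives the retrain cron something specific to log when a candidate stays staged.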

Rollback

registry.rollback(type_id=..., tenant_id=...) reverts to the immediately-previous production version (or a specifically-named version). Auto-promote is bypassed for rollback so a known-old model can be served even if its metrics dropped below the current contract threshold.


The 4-layer alpha engine — implementation detail

Layer 1 — Alpha Discovery (alpha_discovery/)

A factor library + an LLM-driven discovery loop. The library starts with 30+ documented factors (volatility forecasting variants, IV-RV divergence, OI flow signals, cross-market signals, sentiment, GEX positioning). Each factor implements a contract:

class Factor(Protocol):
    name: str
    description: str
    def compute(self, snapshot: MarketSnapshot) -> float: ...
    def expected_decay(self) -> float: ...
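To make the contract concrete, here is a toy factor that satisfies it. The MarketSnapshot fields, the IvRvDivergence class and the decay value are all hypothetical; the real snapshot schema and factor definitions are not shown in this document.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MarketSnapshot:
    # Hypothetical fields; the real snapshot schema is not shown in the text.
    implied_vol: float     # annualised ATM implied volatility, e.g. 0.18
    realised_vol: float    # annualised trailing realised volatility


class Factor(Protocol):
    name: str
    description: str
    def compute(self, snapshot: MarketSnapshot) -> float: ...
    def expected_decay(self) -> float: ...


class IvRvDivergence:
    """Toy IV-RV divergence factor: relative gap between implied and realised vol."""
    name = "iv_rv_divergence"
    description = "(IV - RV) / RV; positive when options look rich versus realised"

    def compute(self, snapshot: MarketSnapshot) -> float:
        if snapshot.realised_vol <= 0:
            return 0.0                      # neutral when RV is unusable
        return (snapshot.implied_vol - snapshot.realised_vol) / snapshot.realised_vol

    def expected_decay(self) -> float:
        return 0.1                          # placeholder; units are an assumption


def evaluate(factors: list, snapshot: MarketSnapshot) -> dict:
    """Compute every factor on one snapshot, keyed by factor name."""
    return {f.name: f.compute(snapshot) for f in factors}
```

Because Factor is a Protocol, IvRvDivergence needs no base class; any object with the right attributes and methods type-checks as a factor.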

The discovery loop runs weekly: it asks Gemini 2.5 Pro to propose new candidate factors from market literature, validates each candidate against historical OOS data, and promotes survivors to the production pool. Factors that decay below threshold are de-promoted.

Layer 2 — Signal Generation (signal_generation/)

Eight specialised modules. Each writes a probability distribution over a forward-looking horizon, with conformal-prediction confidence intervals attached.

The combiner is non-trivial: it doesn’t just average. It uses regime-conditional weighting (a signal that’s accurate in RANGE_LOW_IV may be useless in PANIC), drift-aware down-weighting (a signal whose feature distribution has shifted gets penalised), and an internal-consistency gate that refuses to publish when the modules disagree.
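The three mechanisms in that paragraph can be sketched together. This is a deliberately simplified illustration: each module's distribution is collapsed to a single probability p_up, the drift penalty is linear, and the consistency gate is a plain max-minus-min spread test. The names, the penalty shape and the 0.30 spread limit are assumptions, not the production combiner.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ModuleSignal:
    module: str
    p_up: float          # scalar stand-in for the module's full distribution
    drift_score: float   # 0 = no feature drift, 1 = severe drift


def combine(signals: Sequence[ModuleSignal],
            regime_weights: Mapping[str, float],
            max_spread: float = 0.30) -> Optional[float]:
    """Weighted mean of p_up, or None when the gate refuses to publish."""
    weighted = []
    for s in signals:
        base = regime_weights.get(s.module, 0.0)        # regime-conditional weight
        drift = min(max(s.drift_score, 0.0), 1.0)
        w = base * (1.0 - drift)                        # drift-aware down-weighting
        if w > 0:
            weighted.append((w, s.p_up))
    if not weighted:
        return None
    # Consistency gate: refuse if the contributing modules disagree too much.
    values = [p for _, p in weighted]
    if max(values) - min(values) > max_spread:
        return None
    total = sum(w for w, _ in weighted)
    return sum(w * p for w, p in weighted) / total
```

The caller would pass a different regime_weights mapping per regime (for example one for RANGE_LOW_IV and another for PANIC), which is how a module can count fully in one regime and not at all in another.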

Layer 3 — Strategy Selection (strategy_selection/)

This is where the RL agents live. Population-based training (PBT) across 8 regime-specialised agents. Each agent’s policy is a contextual bandit over a fixed action space (the 8 structures from the spec). Once a week, the PBT loop cross-validates each agent on a held-out window drawn from the prior 30 days; the agent with the best out-of-sample reward gets to drive that regime in production.
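The selection step at the end of that loop reduces to "highest mean held-out reward wins, per regime". A minimal sketch, assuming rewards are available as per-day floats keyed by a hypothetical agent id; the training and cross-validation themselves are out of scope here.

```python
from statistics import mean
from typing import Mapping, Sequence


def pick_driver(oos_rewards: Mapping[str, Sequence[float]]) -> str:
    """Return the agent id with the highest mean out-of-sample reward.

    Ties break on agent id so the choice is deterministic across runs.
    """
    scored = {a: mean(r) for a, r in oos_rewards.items() if len(r) > 0}
    if not scored:
        raise ValueError("no agent has held-out rewards")
    # max() keeps the first maximum it sees, so sorting ids first fixes tie order.
    return max(sorted(scored), key=scored.__getitem__)


def assign_drivers(by_regime: Mapping[str, Mapping[str, Sequence[float]]]) -> dict:
    """Map each regime name to the agent that will drive it in production."""
    return {regime: pick_driver(rewards) for regime, rewards in by_regime.items()}
```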

The agents inherit from a strict abstract base that forbids inventing new structures — they pick among validated templates only.

Layer 4 — Execution & Hedging (execution/)

Deep-hedging neural networks (trained on synthetic option-pricing data, learn portfolio-level hedge ratios). Smart execution with TWAP/VWAP scheduling. Cost-aware order routing (per-strike fee model). The permanent tail-hedge layer is maintained independently — a separate scheduler cron rolls 5-delta puts ~5 days before expiry.


Multi-tenancy enforcement

Multi-tenancy is the architectural choice that costs the most up-front and pays the most as scale grows. Quantsentinel was multi-tenant from day one. Here is what that looks like at each layer.

Database layer

Postgres role separation:

Row-level security policies on every tenant-scoped table:

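CREATE POLICY tenant_isolation ON live_signals.signal_payloads
    FOR ALL TO qs_app
    -- missing_ok = true plus NULLIF: an unset or empty setting yields NULL, so no rows match
    USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::int);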

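The application sets app.tenant_id at the start of each request transaction, using SELECT set_config('app.tenant_id', $1, true) because SET LOCAL itself cannot take a bind parameter. RLS does the rest. A bug that forgets to set the tenant id fails closed: the query errors out or returns zero rows, and never leaks another tenant’s data.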
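On the application side, the per-request scoping can be wrapped in a small context manager. The sketch below assumes a DB-API style connection (psycopg, for example) with autocommit off; tenant_transaction is a hypothetical helper name. It uses set_config(name, value, true), the function form of SET LOCAL, because the SET statement does not accept bind parameters.

```python
from contextlib import contextmanager
from typing import Any, Iterator

# The third argument (is_local = true) limits the setting to the current transaction.
_SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


@contextmanager
def tenant_transaction(conn: Any, tenant_id: int) -> Iterator[Any]:
    """Yield a cursor whose transaction is scoped to one tenant.

    `conn` is assumed to be a DB-API connection with autocommit off, so the
    first execute opens the transaction that the setting is local to.
    """
    cur = conn.cursor()
    try:
        # int() rejects non-numeric input; set_config expects a text value.
        cur.execute(_SET_TENANT_SQL, (str(int(tenant_id)),))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because the setting is transaction-local, it disappears at commit or rollback, which matters when connections are pooled and reused across tenants.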

Triggers on the playground_synthetic.* tables raise exceptions on cross-tenant writes:

CREATE OR REPLACE FUNCTION enforce_tenant_id()
RETURNS trigger AS $$
BEGIN
  IF NEW.tenant_id IS NULL THEN
    RAISE EXCEPTION 'tenant_id required';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

WebSocket layer

Live signal updates use Socket.IO rooms. Each tenant has their own room. Subscribe authorisation is checked on every subscribe call — a user can subscribe only to their own tenant’s room. A bug in the frontend that requests a different room results in a denial, not a leakage.
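The subscribe check comes down to comparing the requested room with the one derived from the caller's session. A minimal, framework-free sketch; the tenant:<id> room naming scheme is an assumption, and wiring it into a Socket.IO subscribe handler is left out.

```python
def room_for(tenant_id: int) -> str:
    """Room name for a tenant; the naming scheme is hypothetical."""
    return f"tenant:{tenant_id}"


def may_subscribe(session_tenant_id: int, requested_room: str) -> bool:
    """Allow only the caller's own tenant room.

    The tenant id must come from the authenticated session, never from the
    client payload, so a forged room name cannot widen access.
    """
    return requested_room == room_for(session_tenant_id)
```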

Audit layer

Every state-changing action goes through audit_log(tenant_id, user_id, event_type, payload). The audit table is append-only (no UPDATE or DELETE granted to qs_app). For SEBI compliance and any future regulator inspection, the trail is complete.

Notification layer

Telegram bot integration: each tenant’s telegram_chat_id is verified to be a private 1:1 chat (the bot refuses to send to group chats or channels). The verification is re-done at the start of each delivery to catch chat-id swaps.
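The re-verification on each delivery might look like the sketch below. Telegram's Bot API reports a chat's type as "private", "group", "supergroup" or "channel"; everything else here (the deliver function, the injected get_chat and send callables) is a hypothetical stand-in for the real bot client.

```python
from typing import Callable, Mapping

GetChat = Callable[[int], Mapping[str, object]]   # stand-in for a Bot API getChat call
Send = Callable[[int, str], None]                 # stand-in for a sendMessage call


def is_private_chat(chat: Mapping[str, object]) -> bool:
    """True only for 1:1 chats; groups, supergroups and channels are rejected."""
    return chat.get("type") == "private"


def deliver(chat_id: int, text: str, get_chat: GetChat, send: Send) -> bool:
    """Re-verify the chat on every delivery; send only to private 1:1 chats."""
    if not is_private_chat(get_chat(chat_id)):
        return False
    send(chat_id, text)
    return True
```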


Cost engine — Indian market specifics

The cost engine is small but does heavy lifting. It models:

def total_costs(
    *,
    structure: str, lots: int, mid_price: float,
    side: str,  # "BUY" | "SELL"
    spread_pts: float, volume_pct: float,
) -> CostBreakdown:
    # notional and commission are derived earlier in the function (elided here)
    # STT: different rates for buy vs sell, options vs futures
    stt = (
        0.00125 * notional if (side == "SELL" and is_option(structure)) else
        0.00025 * notional if (side == "BUY"  and is_option(structure)) else
        ...
    )

    # Exchange transaction charges: NSE rates, stored in basis points
    etc = NSE_RATE_BPS[structure] / 10_000 * notional

    # GST at 18% on commission + ETC
    gst = 0.18 * (commission + etc)

    # Slippage from the calibrated model
    slippage = slippage_model.estimate_impact(
        lots=lots, mid=mid_price, spread=spread_pts,
        volume_pct=volume_pct,
    )

    # SEBI fees, stamp duty
    sebi = 0.0001 * notional
    stamp = STAMP_DUTY_RATES[side] * notional

    return CostBreakdown(stt, etc, gst, commission, slippage, sebi, stamp)

The cost engine refuses any trade where net_ev = expected_pnl - total_costs ≤ 0. This single check eliminates the majority of retail failure modes.
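The gate itself is a one-line comparison once the breakdown is summed. A minimal sketch, assuming CostBreakdown is a plain record whose fields follow the order of the constructor call above and are all expressed in the same currency unit; the total property and passes_cost_gate are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CostBreakdown:
    # Field order follows the constructor call in the text.
    stt: float
    etc: float
    gst: float
    commission: float
    slippage: float
    sebi: float
    stamp: float

    @property
    def total(self) -> float:
        """Sum of every cost component."""
        return sum(getattr(self, f.name) for f in fields(self))


def passes_cost_gate(expected_pnl: float, costs: CostBreakdown) -> bool:
    """True only when expected P&L strictly exceeds total costs (net_ev > 0)."""
    return expected_pnl - costs.total > 0
```

The comparison is strict, so a trade that merely breaks even after costs is refused, matching the "≤ 0" rule.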


Synthetic persona orchestrator

The 20-persona fleet is the platform’s solution to the ML cold-start problem. Each persona has a YAML declaration:

# definitions/P05_aggressor.yaml
persona_id: P05
name: "The Aggressor"
initial_state:
  knob: 75
  capital: 1_500_000

knob_behavior:
  base_rule: "stay_at_current"
  triggers:
    - condition: "consecutive_wins >= 3"
      probability: 0.40
      action: "knob += 5"
      reason_code: "earned_confidence"
      is_biased: true

stochastic_noise:
  decision_randomness: 0.25

futures_behavior:
  enabled: true
  baseline_appetite: 0.80
  hedge_preference: "never"
  hedge_trigger_band: "MODERATE"
  max_overnight_lots: 6
  size_haircut_naked: 1.0
  size_haircut_hedged: 0.0

The orchestrator ticks all 20 personas through simulated trading days. Each persona’s behaviour reflects its declared archetype + stochastic noise + psychological-state transitions (rational ↔ tilted ↔ fatigued based on recent P&L). The resulting trade rows feed the ML training crons.
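How a single trigger from the YAML might be applied on one tick is sketched below. It assumes the condition and action strings have already been parsed into typed fields (the parsing step is omitted) and that the knob is clamped to 0..100; both are assumptions, as are the Trigger and apply_trigger names.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Trigger:
    # Parsed form of one YAML trigger; the parsing step is out of scope here.
    min_consecutive_wins: int   # from condition: "consecutive_wins >= 3"
    probability: float          # from probability: 0.40
    knob_delta: int             # from action: "knob += 5"
    reason_code: str


def apply_trigger(knob: int, consecutive_wins: int, trigger: Trigger,
                  rng: random.Random) -> Tuple[int, Optional[str]]:
    """Return (new_knob, reason_code), with reason_code None when nothing fired."""
    if consecutive_wins < trigger.min_consecutive_wins:
        return knob, None                   # base_rule: stay_at_current
    if rng.random() >= trigger.probability:
        return knob, None                   # condition met, but the draw did not fire
    new_knob = max(0, min(100, knob + trigger.knob_delta))   # assumed knob range
    return new_knob, trigger.reason_code
```

Passing the random generator in explicitly lets a simulated trading day be replayed from a seed, which is useful when a persona's trade rows need to be regenerated.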

Rule 1 — the inviolable rule — is that no ML pipeline ever sees persona_id / is_synthetic / bias_type / persona_archetype. The training fetchers strip these fields before generating feature vectors. The synthetic-trade rows are tagged context_source = 'synthetic'; live-execution rows are context_source = 'real'. Training pipelines weight real higher than synthetic; as real volume grows, synthetic naturally decays in influence.
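A training fetcher that honours Rule 1 could be sketched as follows. The four forbidden field names come from the rule above; the sample weights are invented for illustration, and dropping context_source from the feature dict (using it only to pick the weight) is an assumption made so the label cannot leak through as a feature.

```python
from typing import Iterable, Mapping

FORBIDDEN_FIELDS = frozenset(
    {"persona_id", "is_synthetic", "bias_type", "persona_archetype"}
)
# Hypothetical weights; the text says only that real rows outweigh synthetic ones.
SOURCE_WEIGHT = {"real": 1.0, "synthetic": 0.25}


def to_training_rows(rows: Iterable[Mapping[str, object]]) -> list:
    """Strip persona-identifying fields and attach a per-row sample weight."""
    out = []
    for row in rows:
        # An unknown context_source raises KeyError rather than guessing a weight.
        weight = SOURCE_WEIGHT[str(row["context_source"])]
        features = {k: v for k, v in row.items()
                    if k not in FORBIDDEN_FIELDS and k != "context_source"}
        out.append((features, weight))
    return out
```

The resulting weights can be handed to any estimator that accepts per-sample weights at fit time.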


What we measure at runtime

Every component emits structured logs (JSON, parsed by the autoheal sidecar + a future ELK stack). The metrics we care about:

A Grafana dashboard surfaces these to the operator. Tenants see their own subset via the live signal dashboard.


Deployment — concretely

docker compose up -d brings the whole stack up. Container build context lives in the repo root; the deploy directory has a docker-compose.yml and a Caddyfile.

A single commit on the live-signal-dashboard branch ships in approximately 5 minutes from git push to live. The pipeline:

  1. Developer pushes
  2. CI (currently lightweight — pytest + basic smoke) gates the commit
  3. Operator pulls the tarball to the VM, extracts, runs docker compose build + up -d --force-recreate <service>
  4. Autoheal verifies the new container reaches healthy within 60s, otherwise rolls back
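Step 4 can be sketched as a bounded polling loop with a rollback hook. This is an illustration only: the health probe, the start and rollback callables and the 2-second polling interval are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def wait_healthy(is_healthy: Callable[[], bool],
                 timeout_s: float = 60.0,
                 interval_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until healthy or until timeout_s elapses; True means healthy in time."""
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy(start_new: Callable[[], None],
           rollback: Callable[[], None],
           is_healthy: Callable[[], bool],
           **wait_kwargs) -> bool:
    """Start the new container; roll back if it is not healthy within the window."""
    start_new()
    if wait_healthy(is_healthy, **wait_kwargs):
        return True
    rollback()          # hypothetical hook: restore the previous container
    return False
```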

Caddy auto-issues TLS certs on first request. The Cookie header carrying qs_session is the primary auth; HTTP basic auth is a fallback for direct-access operator paths.


Closing — why these decisions

Every architectural choice in Quantsentinel was made to optimise for one of three things:

  1. Capital preservation — the risk castle, the kill switches, the cost engine, the tail hedges. Every other concern is downstream of this.

  2. Reproducibility — model versioning, validation contracts, audit logs. A trade that fires has a complete provenance.

  3. Multi-tenant correctness — RLS, tenant-scoped audit, WS room isolation. Scaling to thousands of users is a database-throughput problem, not an architecture rewrite.

These are not the optimisations a hobby project makes. They are the ones an institutional system requires. The platform was built that way from the start.

For deeper conversations about any specific subsystem — the ML pipelines, the cost engine, the multi-tenant guarantees, the deployment topology — direct contact is the right next step. The product overview is at /platform; the executive brief is at /brief.