Quantsentinel — Technical Deep-Dive
Why this document exists
The product blog answers “what does it do, who is it for, why does it matter.” This document answers “how is it built, why those specific choices, what would a CTO need to know to evaluate the platform technically.”
It is written for engineering-side evaluators on the receiving end of any acquisition or partnership conversation. The product blog is the conversation opener; this is the second meeting.
Service topology
Thirteen microservices, deployed via Docker Compose on a single Compute Engine VM (e2-standard-4) with the Postgres datastore on managed Cloud SQL (db-g1-small, Postgres 16 with TimescaleDB extension).
             +-------------+
             |    caddy    |  ← auto-HTTPS, basic_auth scoped to operator paths
             +-------------+
                    |
       +------------+-------------+
       |                          |
+-------------+           +----------------+
|  frontend   |           |    gateway     |  ← per-tenant routing, auth, CSRF
|  (Next 14)  |           |      (Go)      |
+-------------+           +----------------+
                                  |
        +-----------------+-------+--------+-------------+
        |                 |                |             |
+----------------+ +-------------+ +----------------+ +-----+
|     broker     | |     ml      | |  intelligence  | | ... |
|   (FastAPI)    | |  (FastAPI)  | |   (FastAPI)    | |     |
+----------------+ +-------------+ +----------------+ +-----+
                          |                |
                          +-------+--------+
                                  |
                        +------------------+   +------------------+
                        |    Cloud SQL     |   |  Redis (cache)   |
                        |  Postgres 16 +   |   |                  |
                        |   TimescaleDB    |   +------------------+
                        +------------------+
Frontend (Next.js 14 App Router) — server-rendered React, per-tenant URL rewrites in middleware, signed qs_session cookie auth. Talks only to the gateway BFF, never directly to backend services.
Gateway (Go, custom) — per-tenant path rewriting (/t/<slug>/api/v1/... → /api/v1/...), slug-to-tenant-id resolution with 5-min cache, auth middleware proxying to broker’s auth router, rate limiting per tenant (600 req/h on /pipeline/decide).
Broker (FastAPI) — owns the auth router (login, invite, password reset), the upstream execution-API integrations (OAuth-based), the order placement + reconciliation, and the order-history/positions/greeks/margins endpoints. The dry-run lock lives here.
ml (FastAPI) — the 4-layer alpha engine + 7-wall risk castle + 17 ML systems + the live signal dashboard + the playground orchestrator hooks. Long-running APScheduler for the weekly retrain crons.
intelligence (FastAPI) — option chain polling, IV surface computation, OI flow aggregation, GEX estimation, news ingestion + classification. Writes to TimescaleDB hypertables; ml reads from them.
tenant_adaptation (FastAPI) — synthetic-persona orchestrator + per-tenant behavioural ML models. Rule 2 isolation: this container is started with a stripped-down env block excluding all broker credentials. Refuses to boot if real credentials are visible.
Plus support services: backtest (historical replay), copilot (LLM-driven narrations), redis, the autoheal sidecar, and the editorial blog itself.
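The gateway's path rewriting and cached slug resolution can be illustrated with a minimal sketch. The real gateway is written in Go; this is Python for illustration only. The `SlugCache` class, the slug character set, and the choice to cache unknown slugs are all assumptions, not the gateway's actual behaviour. Only the `/t/<slug>/api/v1/...` shape and the 5-minute TTL come from the description above.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# Matches /t/<slug>/api/v1/... and captures the slug and the upstream path.
# The allowed slug characters are an assumption.
_TENANT_PATH = re.compile(r"^/t/(?P<slug>[a-z0-9-]+)(?P<rest>/api/v1/.*)$")


class SlugCache:
    """Hypothetical TTL cache for slug -> tenant id lookups."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[int]],
        ttl_s: float = 300.0,  # 5 minutes, as stated for the gateway
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def resolve(self, slug: str) -> Optional[int]:
        now = self._clock()
        hit = self._entries.get(slug)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        tenant_id = self._lookup(slug)
        # Unknown slugs are cached too, so repeated misses stay cheap.
        self._entries[slug] = (now, tenant_id)
        return tenant_id


def rewrite(path: str, cache: SlugCache) -> Optional[Tuple[int, str]]:
    """Return (tenant_id, upstream_path), or None for an unknown path or slug."""
    m = _TENANT_PATH.match(path)
    if m is None:
        return None
    tenant_id = cache.resolve(m.group("slug"))
    if tenant_id is None:
        return None
    return tenant_id, m.group("rest")
```

With a lookup that maps `acme` to tenant 7, `rewrite("/t/acme/api/v1/pipeline/decide", cache)` yields `(7, "/api/v1/pipeline/decide")`, and a second request for the same slug within the TTL does not hit the lookup again.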
MLOps stack
ModelRegistry — versioned, auditable, hot-swappable
Every ML model in the platform — seventeen systems, plus internal helpers — goes through one shared ModelRegistry API.
from datetime import date

# Save
ModelRegistry.get_instance().save_model(
    model=trained_model,
    metadata=ModelMetadata(
        type_id="strike_selection_model",
        framework="sklearn",
        hyperparameters={"n_estimators": 400, "max_depth": 6},
        train_metrics={"n_samples": 5_000},
        validation_metrics={"directional_accuracy": 0.62},
        training_data_version="2026-05-26",
        training_data_end=date(2026, 5, 26),
        retrain_cadence="weekly",
    ),
    adapter=SklearnAdapter(),
    auto_promote=True,
)

# Load (boot-time prewarm)
model = ModelRegistry.get_instance().load_production_model(
    type_id="strike_selection_model",
    tenant_id=None,  # global; per-tenant variants supported
    adapter=SklearnAdapter(),
)
The registry persists three things per model version: the artifact (pickled bytes, SHA-256 verified), the metadata row (Postgres), and the production-pointer row (one per (type_id, tenant_id)). promote_atomic uses a SERIALIZABLE transaction to flip the pointer.
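The "pickled bytes, SHA-256 verified" part can be sketched in a few lines. This is an illustration under assumptions, not the registry's code: `StoredArtifact`, `pack`, and `unpack` are hypothetical names, and the digest is assumed to be recorded at save time and re-checked before unpickling.

```python
import hashlib
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredArtifact:
    """Hypothetical shape of one persisted artifact."""
    payload: bytes  # pickled model bytes
    sha256: str     # hex digest recorded at save time


def pack(model: object) -> StoredArtifact:
    """Pickle the model and record the digest of the exact bytes stored."""
    payload = pickle.dumps(model)
    return StoredArtifact(payload, hashlib.sha256(payload).hexdigest())


def unpack(artifact: StoredArtifact) -> object:
    """Verify the digest first; only unpickle bytes that match it."""
    digest = hashlib.sha256(artifact.payload).hexdigest()
    if digest != artifact.sha256:
        raise ValueError("artifact digest mismatch; refusing to unpickle")
    return pickle.loads(artifact.payload)
```

A check of this kind catches truncated or corrupted artifacts. It does not by itself make unpickling untrusted bytes safe, so the digest needs to come from a trusted metadata row.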
Adapters
SklearnAdapter — pickles the underlying estimator. Works for scikit-learn, XGBoost, LightGBM uniformly.
JSONAdapter — for state-dict-style learners (anomaly detector, feature drift monitor, slippage coefficients).
A third TorchAdapter exists for future deep-hedging usage but isn’t deployed yet.
Singletons + hot-swap
Each module that needs a fitted model declares a singleton accessor (e.g., get_trend_classifier(), get_strike_selector()). Boot-time prewarm pulls the production version from the registry into the singleton. The weekly retrain cron calls refresh_<thing>() after save, which re-pulls the new version into the same singleton — without restarting the running uvicorn process.
The pattern is identical across all 7 + 5 = 12 models:
def get_<thing>():
    with _lock:
        s = _cache.get("<thing>")
        if s is not None: return s
        s = <ThingModel>()
        _try_load_into(s)  # registry → singleton
        _cache["<thing>"] = s
        return s

def refresh_<thing>():
    with _lock:
        s = _cache.get("<thing>")
        if s is None: return False
        return _try_load_into(s)
This is the difference between “we have a model” and “we have a model that improves every week.” The weekly retrains happen during Sunday IST mornings, between 03:00 and 06:30, one model per slot.
Validation contracts
promote_to_production runs a validation contract before flipping the pointer. The contract is per-type_id and rejects a promotion if any of these checks fails:
- Holdout metric at or above threshold (e.g., trend classifier requires ≥0.50 holdout accuracy)
- Schema match with the running production version
- Artifact size within the expected range (catches “I accidentally trained on the wrong data” bugs)
- SHA-256 verification of the artifact
A new candidate version that fails the contract stays staged. The old production version keeps serving.
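The four checks can be expressed as a small pure function. This is a sketch under assumptions: `Candidate`, `Contract`, and the violation codes are hypothetical, and only the four check types and the 0.50 example threshold come from the list above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    holdout_metric: float
    feature_schema: Tuple[str, ...]
    artifact_bytes: int
    sha256_ok: bool


@dataclass(frozen=True)
class Contract:
    min_holdout_metric: float
    production_schema: Tuple[str, ...]
    min_artifact_bytes: int
    max_artifact_bytes: int


def violations(candidate: Candidate, contract: Contract) -> List[str]:
    """Return every failed check; an empty list means the candidate may be promoted."""
    found: List[str] = []
    if candidate.holdout_metric < contract.min_holdout_metric:
        found.append("holdout_below_threshold")
    if candidate.feature_schema != contract.production_schema:
        found.append("schema_mismatch")
    if not contract.min_artifact_bytes <= candidate.artifact_bytes <= contract.max_artifact_bytes:
        found.append("artifact_size_out_of_range")
    if not candidate.sha256_ok:
        found.append("sha256_mismatch")
    return found
```

Returning all violations rather than stopping at the first makes the rejection reason easier to log alongside the staged version.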
Rollback
registry.rollback(type_id=..., tenant_id=...) reverts to the immediately-previous production version (or a specifically-named version). Auto-promote is bypassed for rollback so a known-old model can be served even if its metrics dropped below the current contract threshold.
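One way to picture the rollback semantics is a pointer history per (type_id, tenant_id). The class below is a hypothetical in-memory model, not the registry's storage layout. It assumes a rollback is recorded as a new pointer move and deliberately skips any metric check.

```python
from typing import List, Optional


class PointerHistory:
    """Hypothetical history of production-pointer moves for one model."""

    def __init__(self) -> None:
        self._moves: List[str] = []

    @property
    def current(self) -> Optional[str]:
        return self._moves[-1] if self._moves else None

    def promote(self, version: str) -> None:
        self._moves.append(version)

    def rollback(self, to_version: Optional[str] = None) -> str:
        """Point production at the previous version, or at a named earlier one."""
        if to_version is None:
            if len(self._moves) < 2:
                raise LookupError("no previous production version")
            target = self._moves[-2]
        else:
            if to_version not in self._moves:
                raise LookupError(f"{to_version} was never in production")
            target = to_version
        # No validation contract here: a known-old model may be served as-is.
        self._moves.append(target)
        return target
```

Under this model, promoting `v1`, `v2`, `v3` and then calling `rollback()` serves `v2`, while `rollback("v1")` jumps straight to the named version.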
The 4-layer alpha engine — implementation detail
Layer 1 — Alpha Discovery (alpha_discovery/)
A factor library + an LLM-driven discovery loop. The library starts with 30+ documented factors (volatility forecasting variants, IV-RV divergence, OI flow signals, cross-market signals, sentiment, GEX positioning). Each factor implements a contract:
from typing import Protocol

class Factor(Protocol):
    name: str
    description: str

    def compute(self, snapshot: MarketSnapshot) -> float: ...
    def expected_decay(self) -> float: ...
The discovery loop runs weekly: it asks Gemini 2.5 Pro to propose new candidate factors from market literature, validates each candidate against historical OOS data, and promotes survivors to the production pool. Factors that decay below threshold are de-promoted.
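As an illustration of the contract, here is what one factor from the IV-RV divergence family could look like. The `MarketSnapshot` fields, the normalisation, and the decay value are placeholders chosen for the sketch; the platform's actual snapshot type and factor definitions are not shown in this document.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MarketSnapshot:
    """Hypothetical snapshot with just the two fields this factor needs."""
    implied_vol: float   # annualised, e.g. 0.18
    realized_vol: float  # annualised, e.g. 0.15


class Factor(Protocol):
    name: str
    description: str

    def compute(self, snapshot: MarketSnapshot) -> float: ...
    def expected_decay(self) -> float: ...


class IVRVDivergence:
    name = "iv_rv_divergence"
    description = "Implied minus realised volatility, as a fraction of realised."

    def compute(self, snapshot: MarketSnapshot) -> float:
        if snapshot.realized_vol <= 0:
            return 0.0  # no meaningful ratio without realised vol
        return (snapshot.implied_vol - snapshot.realized_vol) / snapshot.realized_vol

    def expected_decay(self) -> float:
        return 0.05  # placeholder value, not a calibrated figure


def score(factor: Factor, snapshot: MarketSnapshot) -> float:
    """Any object with the right attributes satisfies the Protocol structurally."""
    return factor.compute(snapshot)
```

Because `Factor` is a `Protocol`, `IVRVDivergence` does not need to inherit from it; a type checker accepts it wherever a `Factor` is expected.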
Layer 2 — Signal Generation (signal_generation/)
Eight specialised modules. Each writes a probability distribution over a forward-looking horizon, with conformal-prediction confidence intervals attached.
The combiner is non-trivial: it doesn’t just average. It uses regime-conditional weighting (a signal that’s accurate in RANGE_LOW_IV may be useless in PANIC), drift-aware down-weighting (a signal whose feature distribution has shifted gets penalised), and an internal-consistency gate that refuses to publish when the modules disagree.
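The three combiner behaviours can be sketched together in one function. Everything specific here is an assumption: the `ModuleSignal` fields, the linear drift penalty, and the 0.30 disagreement threshold are illustrative choices, and each module's distribution is collapsed to a single up-probability for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ModuleSignal:
    name: str
    p_up: float                       # module's probability of an up move
    regime_weights: Dict[str, float]  # trust in this module per regime
    drift: float                      # 0.0 = no feature shift, 1.0 = fully shifted


def combine(
    signals: List[ModuleSignal],
    regime: str,
    max_spread: float = 0.30,
) -> Optional[float]:
    """Weighted average of module outputs, or None when nothing should be published."""
    weighted: List[Tuple[float, float]] = []
    for s in signals:
        # Regime-conditional weight, scaled down as drift increases.
        w = s.regime_weights.get(regime, 0.0) * max(0.0, 1.0 - s.drift)
        if w > 0:
            weighted.append((w, s.p_up))
    if not weighted:
        return None
    probs = [p for _, p in weighted]
    # Internal-consistency gate: refuse to publish if active modules disagree too much.
    if max(probs) - min(probs) > max_spread:
        return None
    total = sum(w for w, _ in weighted)
    return sum(w * p for w, p in weighted) / total
```

A module with zero weight in the current regime is ignored entirely, so it neither contributes to the average nor trips the consistency gate.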
Layer 3 — Strategy Selection (strategy_selection/)
This is where the RL agents live. Population-based training (PBT) across 8 regime-specialised agents. Each agent’s policy is a contextual bandit over a fixed action space (the 8 structures from the spec). The PBT loop weekly cross-validates each agent on a held-out window of the prior 30 days; the agent with the best out-of-sample reward gets to drive that regime in production.
The agents inherit from a strict abstract base that forbids inventing new structures — they pick among validated templates only.
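The weekly "best out-of-sample reward drives the regime" step reduces to an argmax per regime. The sketch below assumes rewards are available as plain lists per agent and uses the mean as the score; the agent names, the data shape, and the alphabetical tie-break are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pick_drivers(oos_rewards: Dict[str, Dict[str, List[float]]]) -> Dict[str, str]:
    """Map each regime to the agent with the highest mean held-out reward."""
    drivers: Dict[str, str] = {}
    for regime, by_agent in oos_rewards.items():
        # Agents with no held-out trades in the window are not eligible.
        scored = {agent: mean(r) for agent, r in by_agent.items() if r}
        if scored:
            # Iterating in sorted order makes ties resolve to the first name.
            drivers[regime] = max(sorted(scored), key=scored.__getitem__)
    return drivers
```

A regime with no eligible agent is simply absent from the result, which leaves the decision about a fallback to the caller.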
Layer 4 — Execution & Hedging (execution/)
Deep-hedging neural networks (trained on synthetic option-pricing data, learn portfolio-level hedge ratios). Smart execution with TWAP/VWAP scheduling. Cost-aware order routing (per-strike fee model). The permanent tail-hedge layer is maintained independently — a separate scheduler cron rolls 5-delta puts ~5 days before expiry.
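For the TWAP part of smart execution, the core arithmetic is splitting a parent order into near-equal child orders across time slices. The function below is a generic sketch of that split, not the platform's scheduler; `twap_slices` is a hypothetical name.

```python
from typing import List


def twap_slices(total_lots: int, n_slices: int) -> List[int]:
    """Split total_lots into n_slices child orders that differ by at most one lot."""
    if total_lots < 0 or n_slices <= 0:
        raise ValueError("total_lots must be >= 0 and n_slices must be > 0")
    base, extra = divmod(total_lots, n_slices)
    # The first `extra` slices carry one additional lot each.
    return [base + (1 if i < extra else 0) for i in range(n_slices)]
```

For example, `twap_slices(10, 4)` returns `[3, 3, 2, 2]`. A VWAP schedule would replace the equal split with weights taken from an expected volume profile.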
Multi-tenancy enforcement
Multi-tenancy is the architectural choice that costs the most up-front and pays the most as scale grows. Quantsentinel was multi-tenant from day one. Here is what that looks like at each layer.
Database layer
Postgres role separation:
- qs_admin: DDL, migrations only. Used by the schema-apply boot path.
- qs_app: DML on a tenant-prefixed view of every table. Cannot SELECT or UPDATE rows belonging to a different tenant_id.
Row-level security policies on every tenant-scoped table:
CREATE POLICY tenant_isolation ON live_signals.signal_payloads
    FOR ALL TO qs_app
    USING (tenant_id = current_setting('app.tenant_id')::int);
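The application calls SET LOCAL app.tenant_id = $1 at the start of each request transaction. RLS does the rest. A request that forgets to set the tenant id fails closed: with the policy as written, current_setting raises an error when the parameter is unset (unless a default has been configured), and with the missing_ok form current_setting('app.tenant_id', true) the comparison is against NULL and zero rows are returned. Either way, the result is not cross-tenant leakage.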
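On the application side, the per-request scoping can be wrapped in a context manager. This sketch makes several assumptions: it takes any DB-API 2.0 connection with autocommit off, it uses the `%s` placeholder style of psycopg, and it sets the value through Postgres's `set_config(name, value, is_local)` function, which is the parameterisable equivalent of `SET LOCAL`. The name `tenant_transaction` is hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Iterator

# Third argument true = transaction-local, the same scope as SET LOCAL.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


@contextmanager
def tenant_transaction(conn: Any, tenant_id: int) -> Iterator[Any]:
    """Yield a cursor whose transaction is scoped to one tenant."""
    cur = conn.cursor()
    try:
        # set_config takes text, so the validated integer is passed as a string.
        cur.execute(SET_TENANT_SQL, (str(int(tenant_id)),))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_transaction(conn, 42) as cur: cur.execute("SELECT ... FROM live_signals.signal_payloads")`. Because the setting is transaction-local, it is discarded at commit or rollback and cannot leak into the next request on a pooled connection.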
Triggers on the playground_synthetic.* tables raise exceptions on cross-tenant writes:
CREATE OR REPLACE FUNCTION enforce_tenant_id()
RETURNS trigger AS $$
BEGIN
    IF NEW.tenant_idx IS NULL THEN
        RAISE EXCEPTION 'tenant_idx required';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
WebSocket layer
Live signal updates use Socket.IO rooms. Each tenant has their own room. Subscribe authorisation is checked on every subscribe call — a user can subscribe only to their own tenant’s room. A bug in the frontend that requests a different room results in a denial, not a leakage.
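The subscribe check itself is a one-line comparison once the room name is derived from the authenticated session rather than from the client's request. The room naming scheme below is an assumption, and the Socket.IO wiring is omitted so the sketch stays framework-independent.

```python
def tenant_room(tenant_id: int) -> str:
    """Hypothetical room naming scheme: one room per tenant."""
    return f"tenant:{tenant_id}"


def may_subscribe(session_tenant_id: int, requested_room: str) -> bool:
    """Allow a subscribe only when the requested room is the session's own tenant room."""
    return requested_room == tenant_room(session_tenant_id)
```

A handler would call `may_subscribe` with the tenant id taken from the verified session and refuse the join when it returns `False`.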
Audit layer
Every state-changing action goes through audit_log(tenant_id, user_id, event_type, payload). The audit table is append-only (no UPDATE or DELETE granted to qs_app). For SEBI compliance and any future regulator inspection, the trail is complete.
Notification layer
Telegram bot integration: each tenant’s telegram_chat_id is verified to be a private 1:1 chat (the bot refuses to send to group chats or channels). The verification is re-done at the start of each delivery to catch chat-id swaps.
Cost engine — Indian market specifics
The cost engine is small but does heavy lifting. It models:
def total_costs(
    *,
    structure: str, lots: int, mid_price: float,
    side: str,  # "BUY" | "SELL"
    spread_pts: float, volume_pct: float,
) -> CostBreakdown:
    # STT: different rates for buy vs sell, options vs futures
    stt = (
        0.00125 * notional if (side == "SELL" and is_option(structure)) else
        0.00025 * notional if (side == "BUY" and is_option(structure)) else
        ...
    )
    # Exchange transaction charges: NSE rates
    etc = NSE_RATE_BPS[structure] * notional
    # GST at 18% on commission + ETC
    gst = 0.18 * (commission + etc)
    # Slippage from the calibrated model
    slippage = slippage_model.estimate_impact(
        lots=lots, mid=mid_price, spread=spread_pts,
        volume_pct=volume_pct,
    )
    # SEBI fees, stamp duty
    sebi = 0.0001 * notional
    stamp = STAMP_DUTY_RATES[side] * notional
    return CostBreakdown(stt, etc, gst, commission, slippage, sebi, stamp)
The cost engine refuses any trade where net_ev = expected_pnl - total_costs ≤ 0. This single check eliminates the majority of retail failure modes.
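The net-EV gate is simple enough to show end to end. In this sketch the `CostBreakdown` field order follows the positional call in the excerpt above, but the dataclass definition, the `total` property, and `passes_ev_gate` are assumptions; the rupee figures in the example are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CostBreakdown:
    stt: float
    etc: float
    gst: float
    commission: float
    slippage: float
    sebi: float
    stamp: float

    @property
    def total(self) -> float:
        """Sum of every cost component."""
        return sum(getattr(self, f.name) for f in fields(self))


def passes_ev_gate(expected_pnl: float, costs: CostBreakdown) -> bool:
    """Reject any trade whose expected P&L does not strictly exceed total costs."""
    return expected_pnl - costs.total > 0
```

With components of 12, 10, 9, 40, 25, 1 and 3 the total is 100, so a trade with an expected P&L of exactly 100 is refused (net EV is 0) and one with 150 is allowed.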
Synthetic persona orchestrator
The 20-persona fleet is the platform’s solution to the ML cold-start problem. Each persona has a YAML declaration:
# definitions/P05_aggressor.yaml
persona_id: P05
name: "The Aggressor"
initial_state:
  knob: 75
  capital: 1_500_000
knob_behavior:
  base_rule: "stay_at_current"
  triggers:
    - condition: "consecutive_wins >= 3"
      probability: 0.40
      action: "knob += 5"
      reason_code: "earned_confidence"
      is_biased: true
stochastic_noise:
  decision_randomness: 0.25
futures_behavior:
  enabled: true
  baseline_appetite: 0.80
  hedge_preference: "never"
  hedge_trigger_band: "MODERATE"
  max_overnight_lots: 6
  size_haircut_naked: 1.0
  size_haircut_hedged: 0.0
The orchestrator ticks all 20 personas through simulated trading days. Each persona’s behaviour reflects its declared archetype + stochastic noise + psychological-state transitions (rational ↔ tilted ↔ fatigued based on recent P&L). The resulting trade rows feed the ML training crons.
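To make the trigger semantics concrete, here is how the P05 win-streak trigger could be evaluated on one tick. The rule is hard-coded instead of parsed from the YAML strings, and `PersonaState`, the function name, and the knob ceiling of 100 are assumptions for the sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class PersonaState:
    knob: int
    consecutive_wins: int


def apply_win_streak_trigger(
    state: PersonaState,
    rng: random.Random,
    probability: float = 0.40,  # from the trigger's `probability`
    step: int = 5,              # from `action: "knob += 5"`
    min_wins: int = 3,          # from `condition: "consecutive_wins >= 3"`
    knob_max: int = 100,        # assumed upper bound for the knob
) -> bool:
    """Raise the knob with the configured probability once the streak condition holds."""
    if state.consecutive_wins < min_wins:
        return False
    if rng.random() >= probability:
        return False
    state.knob = min(knob_max, state.knob + step)
    return True
```

Passing the random source in as a parameter keeps a simulated trading day reproducible: seeding `random.Random(seed)` per persona gives the same sequence of trigger firings on every replay.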
Rule 1 — the inviolable rule — is that no ML pipeline ever sees persona_id / is_synthetic / bias_type / persona_archetype. The training fetchers strip these fields before generating feature vectors. The synthetic-trade rows are tagged context_source = 'synthetic'; live-execution rows are context_source = 'real'. Training pipelines weight real higher than synthetic; as real volume grows, synthetic naturally decays in influence.
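A training fetcher that honours Rule 1 could look like the sketch below. The four forbidden field names and the two `context_source` values come from the paragraph above; the function name, the 0.25 synthetic weight, and the decision to use `context_source` only for sample weighting (never as a feature) are assumptions.

```python
from typing import Any, Dict, Tuple

FORBIDDEN_FIELDS = frozenset(
    {"persona_id", "is_synthetic", "bias_type", "persona_archetype"}
)


def to_training_row(
    row: Dict[str, Any],
    synthetic_weight: float = 0.25,  # hypothetical down-weight for synthetic rows
) -> Tuple[Dict[str, Any], float]:
    """Return (features, sample_weight) with all persona metadata removed."""
    source = row.get("context_source")
    if source == "real":
        weight = 1.0
    elif source == "synthetic":
        weight = synthetic_weight
    else:
        raise ValueError(f"unknown context_source: {source!r}")
    # context_source drives the weight only; it is not passed on as a feature.
    features = {
        k: v
        for k, v in row.items()
        if k not in FORBIDDEN_FIELDS and k != "context_source"
    }
    return features, weight
```

With a fixed per-row weight, the share of total weight held by synthetic rows shrinks automatically as the count of real rows grows, which matches the decay behaviour described above.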
What we measure at runtime
Every component emits structured logs (JSON, parsed by the autoheal sidecar + a future ELK stack). The metrics we care about:
- Decision latency: p50, p95, p99 on /ml/pipeline/decide (target p99 < 200ms)
- Cost-engine rejections: count, percentage by reason code
- Gate firings: daily count per gate, alerts when a single gate fires disproportionately
- ML model accuracy drift: weekly comparison of new candidate vs production
- Tenant-level P&L: paper-test cumulative, drawdown depth, win rate, average winner/loser
- Audit-log volume: should grow steadily; sudden gaps indicate a bug
A Grafana dashboard surfaces these to the operator. Tenants see their own subset via the live signal dashboard.
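For the decision-latency metric, the percentile calculation and the p99 target check can be sketched as follows. This uses the nearest-rank definition of a percentile, which is one of several conventions; the function names are hypothetical and the 200 ms target is the one stated in the list above.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float], target_p99_ms: float = 200.0) -> bool:
    """True when the observed p99 is strictly below the target."""
    return percentile(samples_ms, 99) < target_p99_ms
```

In practice a metrics backend would compute these from histograms; the sketch only pins down what "p99 < 200ms" means on a raw sample window.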
Deployment — concretely
docker compose up -d brings the whole stack up. Container build context lives in the repo root; the deploy directory has a docker-compose.yml and a Caddyfile.
A single commit on the live-signal-dashboard branch ships in approximately 5 minutes from git push to live. The pipeline:
- Developer pushes
- CI (currently lightweight: pytest + basic smoke) gates the commit
- Operator pulls the tarball to the VM, extracts, runs docker compose build, then docker compose up -d --force-recreate <service>
- Autoheal verifies the new container reaches healthy within 60s, otherwise rolls back
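The last step, wait for healthy within 60s or roll back, can be sketched as a small polling loop. The callables `start_new`, `probe`, and `rollback` are hypothetical hooks standing in for whatever the autoheal sidecar actually does, and the 2-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy_with_rollback(
    start_new: Callable[[], None],
    probe: Callable[[], bool],
    rollback: Callable[[], None],
    **wait_kwargs,
) -> bool:
    """Start the new container; roll back if it never becomes healthy."""
    start_new()
    if wait_until_healthy(probe, **wait_kwargs):
        return True
    rollback()
    return False
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting, and using a monotonic clock avoids surprises if the VM's wall clock is adjusted mid-deploy.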
Caddy auto-issues TLS certs on first request. The Cookie header carrying qs_session is the primary auth; HTTP basic auth is a fallback for direct-access operator paths.
Closing — why these decisions
Every architectural choice in Quantsentinel was made to optimise for one of three things:
- Capital preservation: the risk castle, the kill switches, the cost engine, the tail hedges. Every other concern is downstream of this.
- Reproducibility: model versioning, validation contracts, audit logs. A trade that fires has a complete provenance.
- Multi-tenant correctness: RLS, tenant-scoped audit, WS room isolation. Scaling to thousands of users is a database-throughput problem, not an architecture rewrite.
These are not the optimisations a hobby project makes. They are the ones an institutional system requires. The platform was built that way from the start.
For deeper conversations about any specific subsystem — the ML pipelines, the cost engine, the multi-tenant guarantees, the deployment topology — direct contact is the right next step. The product overview is at /platform; the executive brief is at /brief.