vol. 02 · tier 03 // ch. 08 of 10 · advanced course
Algorithmic Trading Systems
When discretionary trading hits its scaling limit, you build a system. This chapter is about the engineering of trading software — architecture, reliability, monitoring, and the hard-won lessons that prevent silent disasters.
Why automate?
- Consistency — no emotional deviation from the plan.
- Scale — 50 strategies across 200 stocks, monitored 24/7.
- Speed — react to alerts in milliseconds, not minutes.
- Backtest-live alignment — same code logic in research and production.
What NOT to automate
- A strategy you don’t fully understand.
- A strategy that hasn’t been forward-tested live (manually).
- Anything, until you have sat through a live drawdown yourself and know how you react to it.
- “Black box” ML models you can’t explain.
The biggest danger of automation: scaling a broken strategy to lose money faster. Automate winners, not hopes.
System architecture (high-level)
  ┌─────────────────┐
  │   Market Data   │
  │  (broker API,   │
  │   websocket,    │
  │   tick stream)  │
  └────────┬────────┘
           │
┌──────────▼──────────┐
│   Data normalizer   │
│   + storage (DB)    │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│   Strategy engine   │
│   (signal gen)      │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│   Risk gateway      │
│   (sanity checks)   │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│   Order manager     │
│   (broker API,      │
│   state, retries)   │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│   Monitoring +      │
│   alerts (Telegram, │
│   PagerDuty)        │
└─────────────────────┘
This is roughly the architecture of StalkMarket. Each layer has a single responsibility.
Core components
1. Market data
- Polling (REST) — simple, slow (1–5s). Fine for swing.
- Websocket / streaming — sub-second updates, more complex.
- Storage — store everything (ticks, candles) for replay/backtesting. SQLite/DuckDB for small scale, TimescaleDB/ClickHouse for big.
Best practice: separate “market data ingestion” from strategy logic. Strategies read from the DB; ingestion writes to it. Clean separation, easy to test.
2. Strategy engine
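A pure function: no side effects, no IO. That makes it easy to test, and easy to backtest with the same code that runs live.
# Signal and size_for_risk are defined elsewhere in the strategy module.
def evaluate(state, ltp, params) -> Signal:
    # Flat and price breaks above the swing high: enter long, sized by risk.
    if state.position == 0 and ltp > state.swing_high:
        return Signal(action="ENTER_LONG", qty=size_for_risk(state, ltp))
    # Long and price has fallen to the stop: exit the whole position.
    elif state.position > 0 and ltp <= state.stop:
        return Signal(action="EXIT", qty=state.position)
    # Otherwise do nothing.
    else:
        return Signal(action="HOLD")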
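The `evaluate` function above leans on a `Signal` type and a `size_for_risk` helper that the chapter does not define. The sketch below fills them in so the function can be run and unit-tested in isolation. The `State` and `Params` fields, the 1% risk fraction, and the sizing rule (risk budget divided by the distance from entry to stop) are illustrative assumptions, not the chapter's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    position: int       # shares currently held (0 = flat)
    swing_high: float   # breakout level
    stop: float         # stop-loss price
    capital: float      # account equity used for sizing (assumed field)


@dataclass(frozen=True)
class Params:
    risk_fraction: float = 0.01   # assumed: risk 1% of capital per trade


@dataclass(frozen=True)
class Signal:
    action: str
    qty: int = 0


def size_for_risk(state: State, ltp: float, params: Params) -> int:
    """Shares such that hitting the stop loses about risk_fraction of capital."""
    per_share_risk = ltp - state.stop
    if per_share_risk <= 0:
        return 0  # stop is at or above entry: no sensible size
    return int(state.capital * params.risk_fraction / per_share_risk)


def evaluate(state: State, ltp: float, params: Params) -> Signal:
    """Pure decision function: same inputs always give the same Signal."""
    if state.position == 0 and ltp > state.swing_high:
        return Signal("ENTER_LONG", size_for_risk(state, ltp, params))
    if state.position > 0 and ltp <= state.stop:
        return Signal("EXIT", state.position)
    return Signal("HOLD")
```

Because the inputs are frozen dataclasses and nothing touches the network or a database, a test is just a function call compared against an expected `Signal`.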
3. Risk gateway
The most important component. Sits between strategy signals and the broker:
- Max order size cap.
- Max # orders per minute.
- Max total exposure.
- Per-stock exposure cap.
- Kill switch on unusual behavior.
- Daily loss limit → halt new orders.
Every strategy signal must pass risk checks. No exceptions. This single layer is what stops most “fat finger” and runaway-bot disasters before they reach the broker.
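As a rough illustration of the checks listed above, the sketch below gates entry orders through a single `check` call. The class names, limit values, and rejection strings are all hypothetical, and exits (which reduce exposure) would need their own path.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RiskLimits:
    # All numbers are placeholders; set them from your own risk plan.
    max_order_qty: int = 500
    max_orders_per_minute: int = 10
    max_total_exposure: float = 1_000_000.0
    max_symbol_exposure: float = 200_000.0
    daily_loss_limit: float = 15_000.0


@dataclass
class RiskGateway:
    limits: RiskLimits
    exposure: dict = field(default_factory=dict)   # symbol -> open notional
    daily_pnl: float = 0.0
    killed: bool = False                            # manual or automatic kill switch
    _order_times: deque = field(default_factory=deque)

    def check(self, symbol: str, qty: int, price: float, now: float):
        """Return (approved, reason) for an entry order. `now` is in seconds."""
        if self.killed:
            return False, "kill switch active"
        if self.daily_pnl <= -self.limits.daily_loss_limit:
            return False, "daily loss limit hit"
        if qty > self.limits.max_order_qty:
            return False, "order size cap"
        # Drop approvals older than 60 seconds, then apply the rate cap.
        while self._order_times and now - self._order_times[0] >= 60:
            self._order_times.popleft()
        if len(self._order_times) >= self.limits.max_orders_per_minute:
            return False, "order rate cap"
        notional = qty * price
        if self.exposure.get(symbol, 0.0) + notional > self.limits.max_symbol_exposure:
            return False, "per-symbol exposure cap"
        if sum(self.exposure.values()) + notional > self.limits.max_total_exposure:
            return False, "total exposure cap"
        # Approved: record it so later checks see the new state.
        self._order_times.append(now)
        self.exposure[symbol] = self.exposure.get(symbol, 0.0) + notional
        return True, "ok"
```

Returning a reason string alongside the verdict means every rejection can be logged and alerted on without extra plumbing.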
4. Order manager
- Translates signals into broker API calls.
- Tracks order state through its lifecycle (placed, modified, then filled or rejected).
- Handles retries on transient failures.
- Reconciles broker state with internal state at startup.
- Persists every order action for audit.
Idempotency is critical: if your bot crashes mid-order, you must not double-place when it restarts.
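One common way to get this property is to derive a deterministic client order ID from the signal and record it durably before calling the broker. The sketch below shows that idea with SQLite. The table layout, the `sig-` prefix, and the `broker_send` callable are assumptions for illustration; whether your broker accepts a client-side ID or tag is something to confirm in its API documentation.

```python
import sqlite3


def open_order_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        " client_order_id TEXT PRIMARY KEY,"
        " symbol TEXT, qty INTEGER, status TEXT)"
    )
    return db


def place_once(db, broker_send, signal_id, symbol: str, qty: int) -> bool:
    """Send an order at most once per signal_id, even across restarts.

    Returns True if the order was sent now, False if it was already recorded.
    """
    client_order_id = f"sig-{signal_id}"
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO orders VALUES (?, ?, ?, 'PENDING')",
                (client_order_id, symbol, qty),
            )
    except sqlite3.IntegrityError:
        return False  # primary key already exists: this signal was handled
    broker_send(client_order_id, symbol, qty)
    with db:
        db.execute(
            "UPDATE orders SET status = 'SENT' WHERE client_order_id = ?",
            (client_order_id,),
        )
    return True
```

A crash between the insert and the broker call leaves a `PENDING` row. Startup reconciliation should look those up at the broker by client order ID rather than resend them blindly.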
5. Monitoring
Without monitoring, running a bot is a game of Russian roulette. Required:
- Heartbeat — confirms the system is alive, and alerts you if it stops.
- Per-trade alerts — entries, exits, stops, errors.
- Position summary — daily snapshot.
- Error logs — every exception, retry, anomaly.
- Drift detection — actual P&L vs expected from signals.
Channels: Telegram bot (free, reliable), Discord webhook, PagerDuty for paid setups.
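Two of the items above, the heartbeat and drift detection, reduce to very small checks. The sketch below is one possible shape for them; the class name, the injectable clock, and the fixed tolerance are assumptions, and the alert delivery itself (Telegram, Discord, PagerDuty) is left out.

```python
import time


class Heartbeat:
    """Tracks the last time the trading loop reported in."""

    def __init__(self, max_silence_s: float, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock            # injectable so tests need not sleep
        self._last_beat = clock()

    def beat(self) -> None:
        """Call once per loop iteration."""
        self._last_beat = self._clock()

    def is_stale(self) -> bool:
        """True when the loop has been silent for longer than allowed."""
        return self._clock() - self._last_beat > self.max_silence_s


def pnl_drift(expected_pnl: float, actual_pnl: float, tolerance: float) -> bool:
    """True when live P&L strays from signal-implied P&L by more than tolerance."""
    return abs(actual_pnl - expected_pnl) > tolerance
```

The staleness check is most useful when it runs outside the bot, in a separate process or on another machine, since a dead process cannot report its own death.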
State management
State must be persistent and crash-safe. If your bot dies mid-trade, on restart it must:
- Read its last known state from DB.
- Reconcile with broker (positions, open orders).
- Resume from the correct point.
Use a real database (SQLite at minimum), not in-memory dicts. WAL mode for concurrency.
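The restart sequence above can be sketched with the standard-library `sqlite3` module. The `positions` table and the function names are hypothetical, and `broker` stands in for whatever your broker's positions endpoint returns once it is mapped to a `{symbol: qty}` dict.

```python
import sqlite3


def open_state_db(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active (file-backed DBs only).
    db.execute("PRAGMA journal_mode=WAL")
    db.execute(
        "CREATE TABLE IF NOT EXISTS positions ("
        " symbol TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
    )
    return db


def save_position(db: sqlite3.Connection, symbol: str, qty: int) -> None:
    with db:  # one transaction per update, committed before returning
        db.execute("INSERT OR REPLACE INTO positions VALUES (?, ?)", (symbol, qty))


def load_positions(db: sqlite3.Connection) -> dict:
    return dict(db.execute("SELECT symbol, qty FROM positions"))


def reconcile(local: dict, broker: dict) -> dict:
    """Return {symbol: (local_qty, broker_qty)} for every mismatch."""
    symbols = set(local) | set(broker)
    return {
        s: (local.get(s, 0), broker.get(s, 0))
        for s in symbols
        if local.get(s, 0) != broker.get(s, 0)
    }
```

An empty result from `reconcile` means it is safe to resume. Anything else is a reason to halt and alert rather than guess which side is right.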
Time & timezones
Everything UTC internally. Display in IST. Never mix.
NSE-specific timing edge cases:
- Pre-open vs continuous session.
- Holiday calendar (gets out of date — refresh annually).
- Daylight saving time: India has none, but foreign markets and global integrations do, and their clock changes can shift your scheduling.
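A small sketch of the "UTC internally, IST at the edges" rule, applied to session detection. The session boundaries used here (pre-open 09:00 to 09:15, continuous 09:15 to 15:30 IST) and the caller-supplied `holidays` set are assumptions to verify against current exchange circulars. Because India has no daylight saving time, a fixed +05:30 offset is enough for IST; foreign markets need real time zone data such as `zoneinfo`.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")  # fixed offset, no DST

# Assumed NSE equity session boundaries in IST; confirm before relying on them.
PRE_OPEN = (time(9, 0), time(9, 15))
CONTINUOUS = (time(9, 15), time(15, 30))


def session_at(ts_utc: datetime, holidays=frozenset()) -> str:
    """Classify a timezone-aware UTC timestamp as pre_open, continuous, or closed."""
    if ts_utc.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before calling")
    local = ts_utc.astimezone(IST)  # convert only at the boundary
    if local.weekday() >= 5 or local.date() in holidays:
        return "closed"
    t = local.time()
    if PRE_OPEN[0] <= t < PRE_OPEN[1]:
        return "pre_open"
    if CONTINUOUS[0] <= t < CONTINUOUS[1]:
        return "continuous"
    return "closed"
```

Rejecting naive datetimes outright is a deliberate choice: it turns "never mix" from a convention into an error you cannot miss.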
Testing strategy
Layered:
- Unit tests — each pure function. Fast, comprehensive.
- Integration tests — strategy + DB + mocked broker. Verify flows.
- Backtests — historical replay on N years of data.
- Paper / forward tests — live data, simulated execution.
- Canary deploys — run new strategy with 5% of intended capital first.
The same strategy code should run in backtest and live. Different code paths = bugs.
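One way to keep a single code path is to put the decision rule in a function that both the backtest driver and the live runner import, so only the surrounding loop differs. The sketch below shows the backtest side. The names `decide` and `replay`, the one-share position, and fills at the last traded price are simplifying assumptions.

```python
def decide(position: int, ltp: float, swing_high: float, stop: float) -> str:
    """Shared decision rule, imported unchanged by backtest and live runner."""
    if position == 0 and ltp > swing_high:
        return "ENTER_LONG"
    if position > 0 and ltp <= stop:
        return "EXIT"
    return "HOLD"


def replay(prices, swing_high: float, stop: float) -> list:
    """Backtest driver: feed historical prices to decide(), simulate fills at ltp.

    Returns the P&L per share of each closed trade.
    """
    position, entry, pnl = 0, 0.0, []
    for ltp in prices:
        action = decide(position, ltp, swing_high, stop)
        if action == "ENTER_LONG":
            position, entry = 1, ltp
        elif action == "EXIT":
            pnl.append(ltp - entry)
            position = 0
    return pnl
```

A live runner would call the same `decide` on each tick and route the result through the risk gateway and order manager instead of simulating the fill.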
Risk kill switches
Pre-define and wire them up in code:
| Trigger | Action |
|---|---|
| Daily loss > 3% | Block new entries; existing positions monitored |
| Daily loss > 5% | Liquidate all positions, halt for the day |
| Unrealized loss on single position > 5% | Force exit |
| > 50 orders in 5 min | Pause; require manual override |
| Broker disconnected > 30s | Halt; alert |
| Latency > X ms on critical path | Alert; halt new entries |
| Unexpected exception count > N/min | Halt; alert |
These are not optional. They are insurance against your own bugs.
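Wiring the table into code can be as simple as a function that maps a snapshot of current metrics to a list of actions. The sketch below covers the rows with concrete thresholds. The latency and exception-rate rows are left out because the table gives X and N as placeholders. The `Snapshot` fields and action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    daily_pnl_pct: float            # e.g. -3.5 means down 3.5% today
    worst_position_pnl_pct: float   # unrealized P&L of the worst single position
    orders_last_5min: int
    broker_disconnected_s: float    # seconds since the broker link dropped (0 if up)


def kill_switch_actions(s: Snapshot) -> list:
    """Map current metrics to the actions from the kill-switch table."""
    actions = []
    if s.daily_pnl_pct < -5.0:
        actions.append("LIQUIDATE_AND_HALT")
    elif s.daily_pnl_pct < -3.0:
        actions.append("BLOCK_NEW_ENTRIES")
    if s.worst_position_pnl_pct < -5.0:
        actions.append("FORCE_EXIT_POSITION")
    if s.orders_last_5min > 50:
        actions.append("PAUSE_FOR_MANUAL_OVERRIDE")
    if s.broker_disconnected_s > 30:
        actions.append("HALT_AND_ALERT")
    return actions
```

Keeping this function pure makes it cheap to test every row of the table, which matters more here than anywhere else in the system.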
Deployment & infrastructure
Options for retail algo deployment:
| Option | Pros | Cons |
|---|---|---|
| Local desktop / laptop | Free, easy | Power/internet failures kill bot |
| Raspberry Pi (always-on) | Cheap, low power, dedicated | Limited compute |
| VPS (DigitalOcean, AWS Lightsail) | Reliable, ~$5–20/mo | Setup, monitoring |
| Cloud (AWS/GCP) | Scalable, managed | Cost can balloon |
StalkMarket runs on a Raspberry Pi 5 with Docker — perfect balance for low-frequency strategies.
For low-latency: VPS in Mumbai (close to NSE).
Logging & observability
Structured logs (JSON) make analysis easy. Libraries such as Pino and Winston (Node.js) or structlog (Python) produce them out of the box.
Each log entry: timestamp, component, level, event, structured data.
{
  "ts": "2026-05-04T05:04:21Z",
  "component": "strategy",
  "level": "info",
  "event": "signal_generated",
  "symbol": "RELIANCE",
  "action": "ENTER_LONG",
  "qty": 50,
  "ltp": 2456.30,
  "stop": 2410.00
}
Logs go to file → ship to a log aggregator (Loki, ELK) for searching across days.
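If you would rather not add a dependency, entries in this shape can also be produced with Python's standard `logging` module and a small custom formatter. The sketch below is one way to do it; `JsonFormatter`, `make_logger`, `log_event`, and the use of the logger name as the component are illustrative choices, not a standard API.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, timezone.utc)
                  .strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "component": record.name,   # logger name doubles as component
            "event": record.getMessage(),
        }
        # Structured payload attached via `extra`, if any.
        entry.update(getattr(record, "data", {}))
        return json.dumps(entry)


def make_logger(component: str, stream=sys.stderr) -> logging.Logger:
    logger = logging.getLogger(component)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


def log_event(logger: logging.Logger, event: str, **data) -> None:
    """Log an event name plus arbitrary structured fields."""
    logger.info(event, extra={"data": data})
```

A call such as `log_event(make_logger("strategy"), "signal_generated", symbol="RELIANCE", qty=50)` then emits a single JSON line that a log aggregator can index by field.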
Failure modes to expect
- Broker API rate limits → throttle.
- Broker API outage → exponential backoff retry, alert.
- Auth token expiry → auto-refresh logic.
- Stale market data → detect and halt trades.
- Unexpected order rejections (margin, freeze qty) → log, alert, don’t retry blindly.
- Power cut → bot restarts cleanly from persistent state.
- Internet outage → broker may auto-square off MIS positions; algo should recover gracefully.
- Bug in strategy logic → kill switch activates before damage compounds.
Plan for failure. Then plan for the failure of your failure-handling code.
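The "exponential backoff retry, alert" item from the list above might look like the sketch below. `TransientError` is a hypothetical exception your broker wrapper would raise for timeouts and 5xx responses; anything else (a margin rejection, for example) propagates immediately, which matches the advice not to retry blindly. The delay values are placeholders.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, alert=print):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError as exc:
            if attempt == max_attempts:
                alert(f"giving up after {attempt} attempts: {exc}")
                raise
            # 0.5s, 1s, 2s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps several bots from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` and `alert` in as parameters keeps the function testable without real waiting or real notifications.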
Auditability
Keep records for:
- Tax filing.
- SEBI compliance (you don’t need a PMS license for personal trading, but rules apply).
- Self-review and debugging.
Store everything: every signal, every order, every fill, every config change. Disk is cheap.
Common engineering mistakes
- No backtest-live parity — strategy works in backtest, fails live due to subtle data differences.
- No reconciliation — internal state diverges from broker; hidden positions accumulate.
- Silent failures — exceptions caught and ignored; bot looks fine but doesn’t trade.
- Hardcoded values — broker creds in code, not config. (Use env vars / secrets manager.)
- No version control — “what changed before yesterday’s loss?” — git everything.
- Trading off untested branches — never deploy un-reviewed code.
- No graceful shutdown — SIGTERM should close orders cleanly, not abandon them.
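For the graceful-shutdown point in the list above, a minimal pattern is to have the signal handler set a flag and let the main loop finish its current step before cleaning up. In this sketch `loop_step` and `cancel_open_orders` are placeholders for your own functions, and what "clean" means (cancel, leave resting stops in place, flatten) is a policy decision the code does not make for you.

```python
import signal


class ShutdownFlag:
    """Set by SIGTERM/SIGINT; polled by the main loop."""

    def __init__(self):
        self.requested = False

    def install(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame) -> None:
        # Do as little as possible inside a signal handler.
        self.requested = True


def run(loop_step, cancel_open_orders, flag: ShutdownFlag) -> None:
    """Run loop_step() until shutdown is requested, then clean up once."""
    try:
        while not flag.requested:
            loop_step()
    finally:
        cancel_open_orders()  # runs on normal exit and on unexpected exceptions
```

Docker and systemd both send SIGTERM first and only kill the process after a grace period, so the cleanup needs to finish well inside that window.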
Reading list
- Building Winning Algorithmic Trading Systems — Kevin Davey.
- Designing Data-Intensive Applications — Martin Kleppmann (general but invaluable).
- Site Reliability Engineering — Google (free online).
- Trading Evolved — Andreas Clenow (modern systematic implementation in Python).