Platform Engineering · In Development

PolyEdge

Real-time prediction market arbitrage detection leveraging LLM-driven semantic parsing to identify risk-free profit opportunities across exchanges.


Rust Tokio computation & trading engine

Custom RAG LLM pipeline

Rust Axum SSE Server

Multi-market arbitrage detection

Real-time market monitoring

Fault-tolerant distributed systems

Overview

PolyEdge detects when multi-outcome prediction market probabilities sum to less than 100%, creating risk-free arbitrage opportunities. It monitors multiple prediction market venues in real time, computing edge across all active markets and alerting when profitable spreads appear in orderbooks; higher-tier subscriptions can additionally configure trades to execute automatically.

The hot path is entirely Rust — a Tokio-based data plane that ingests live market data via WebSockets, computes arbitrage edge in real time, and executes trades through bounded execution lanes. Python runs the semantic pipeline as a separate process, handling market discovery, embedding-based retrieval, and cross-venue pair verification. A Next.js dashboard serves the control plane, with PostgreSQL as the persistent source of truth.

Challenges

Non-blocking, low-latency arbitrage detection across hundreds, if not thousands, of concurrent market feeds, with atomic multi-leg trade execution for auto-trading subscribers: a failed side must trigger coordinated rollback without orphaning positions.

Floating-point drift across concurrent probability calculations can silently turn a profitable spread into a loss; the engine needs real-time throughput without sacrificing arithmetic correctness.

Markets across venues like Kalshi and Polymarket describe the same events with different naming, resolution criteria, date formats, and outcome structures — false positives carry direct financial risk.

The latency-critical trading engine and the computationally expensive ML pipeline must coexist without inference load, model reloads, or VRAM pressure ever propagating into any customer's dedicated engine instance.

Executing trades on behalf of customers requires their exchange API keys at submission time without persisting secrets in the application database.

Approach

The Tokio data plane ingests orderbook updates from persistent WebSocket connections, recomputes edge for affected pairs on every book change, and fans opportunities out to subscribed strategies — all non-blocking. For auto-execution, tokio::spawn races both legs in parallel via tokio::select; first failure aborts the survivor and issues a best-effort venue cancel through an idempotent order state machine. DB writes are fire-and-forget off the critical path, batched and deduplicated via ON CONFLICT constraints with a dead-letter queue after repeated failures.
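The leg-racing and rollback pattern can be sketched as follows — a simplified, std-thread stand-in for the engine's Tokio tasks (`PairOutcome`, `execute_pair`, and the simulated fills are illustrative names, not the engine's actual API):

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
enum PairOutcome {
    BothFilled,
    RolledBack, // one leg failed; the surviving leg gets a best-effort cancel
}

// Hypothetical two-leg executor. The real engine races Tokio tasks with
// tokio::select; here std threads and a channel model the same pattern.
fn execute_pair(leg_a_fills: bool, leg_b_fills: bool) -> PairOutcome {
    let (tx, rx) = mpsc::channel();
    for (leg, fills) in [("a", leg_a_fills), ("b", leg_b_fills)] {
        let tx = tx.clone();
        thread::spawn(move || {
            // Simulated venue submission; a real leg awaits an exchange ack.
            let _ = tx.send((leg, fills));
        });
    }
    drop(tx); // each worker holds its own sender; close the original

    for (_leg, filled) in rx {
        if !filled {
            // First failure: abort the surviving leg and issue a venue
            // cancel. An idempotent order state machine makes duplicate
            // cancels safe to retry.
            return PairOutcome::RolledBack;
        }
    }
    PairOutcome::BothFilled
}
```

Returning early drops the receiver, so the surviving worker's send simply fails — the thread-based analogue of aborting the racing task.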

A ScaledInt fixed-point representation replaces floating-point throughout the engine. Prices, sizes, and edge calculations use integer arithmetic with explicit scale tracking and overflow-checked operations, maintaining precision from detection through fill reconciliation.
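A minimal sketch of the idea — the scale, names, and edge helper here are illustrative, not ScaledInt's actual API:

```rust
const SCALE: i64 = 10_000; // 1.0 == 10_000, i.e. four decimal digits

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ScaledInt(i64);

impl ScaledInt {
    // Overflow-checked add: None on wraparound instead of a silent
    // corruption of a price or size.
    fn checked_add(self, rhs: ScaledInt) -> Option<ScaledInt> {
        self.0.checked_add(rhs.0).map(ScaledInt)
    }

    // Fixed-point multiply: (a * b) / SCALE, truncating toward zero.
    fn checked_mul(self, rhs: ScaledInt) -> Option<ScaledInt> {
        self.0.checked_mul(rhs.0).map(|x| ScaledInt(x / SCALE))
    }
}

// Arbitrage edge for a multi-outcome market: 1.0 minus the sum of
// outcome prices. A positive edge means the book sums below 100%.
fn edge(prices: &[ScaledInt]) -> Option<ScaledInt> {
    let mut sum = ScaledInt(0);
    for &p in prices {
        sum = sum.checked_add(p)?;
    }
    ScaledInt(SCALE).0.checked_sub(sum.0).map(ScaledInt)
}
```

For example, outcomes priced at 0.48 and 0.49 sum to 0.97, leaving 0.03 of edge — exactly representable, with no accumulation drift regardless of how many legs contribute.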

A two-stage verification gate: a cross-encoder NLI model scores bidirectional entailment between candidate pairs, then a weighted verifier combines cross-encoder confidence (50%), threshold alignment (20%), entity matching (15%), date coherence (10%), and data source agreement (5%) to produce a final verdict. Dynamic weight redistribution handles missing fields without capping scores, and domain-specific relaxations adjust thresholds where resolution conventions differ systematically. Verification verdicts and confidence scores feed a calibration monitor that tracks score-distribution drift and surfaces threshold adjustment recommendations — human-gated rather than auto-tuned, due to survivorship bias in accepted-pair outcomes making fully automated recalibration unsafe when false positives carry financial exposure.
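The weighted combination with redistribution can be sketched like this — a simplified Rust stand-in for a verifier that actually lives in the Python pipeline:

```rust
// Each component is (weight, optional score in [0.0, 1.0]). A missing
// field's weight is redistributed by renormalizing over the present
// fields, so absent data never caps the achievable score.
fn verdict_score(components: &[(f64, Option<f64>)]) -> Option<f64> {
    let present: Vec<(f64, f64)> = components
        .iter()
        .filter_map(|&(w, s)| s.map(|s| (w, s)))
        .collect();
    let total_weight: f64 = present.iter().map(|(w, _)| w).sum();
    if total_weight <= 0.0 {
        return None; // no scorable fields at all
    }
    let weighted: f64 = present.iter().map(|(w, s)| w * s).sum();
    Some(weighted / total_weight)
}
```

With the section's weights — cross-encoder 0.50, threshold alignment 0.20, entity matching 0.15, date coherence 0.10, source agreement 0.05 — a pair missing its date and source fields is scored over the remaining 0.85 of weight rather than being penalized for data that was never available.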

The Rust engine runs on pure Tokio with no HTTP framework and zero LLM calls in the hot path. The Python pipeline operates as a separate process, writing verified pairs to PostgreSQL where the engine reads them asynchronously. Dynamic VRAM allocation with auto-tuned batch sizing and GPU concurrency semaphores keeps the ML pipeline portable across heterogeneous CUDA hardware without manual tuning.
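The GPU gating can be illustrated with a counting semaphore — a std-only Rust sketch of the concept (the production pipeline is Python and uses its own semaphore primitives; `GpuGate` is a hypothetical name):

```rust
use std::sync::{Condvar, Mutex};

// Counting semaphore bounding concurrent GPU batches, so inference
// bursts queue up instead of oversubscribing VRAM.
struct GpuGate {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl GpuGate {
    fn new(max_concurrent: usize) -> Self {
        GpuGate { permits: Mutex::new(max_concurrent), cv: Condvar::new() }
    }

    // Block until a slot is free, then claim it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }

    // Return the slot and wake one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }

    fn available(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}
```

Auto-tuning then reduces to choosing `max_concurrent` (and batch size) from the measured VRAM of whatever CUDA device the pipeline lands on.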

Customer exchange credentials are stored as KMS/Vault references only; the credentials service fetches secrets just-in-time with an in-memory TTL cache, ensuring plaintext keys never touch the application database or persist beyond their usage window.
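The just-in-time flow can be sketched as a small TTL cache keyed by vault reference — an illustrative stand-in, with the `fetch` closure standing in for the real KMS/Vault client call:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// In-memory TTL cache for just-in-time secrets. The application database
// only ever stores the vault *reference*, never the plaintext key.
struct SecretCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>, // ref -> (plaintext, fetched_at)
}

impl SecretCache {
    fn new(ttl: Duration) -> Self {
        SecretCache { ttl, entries: HashMap::new() }
    }

    // Return a cached secret while fresh; otherwise fetch from the
    // secret store (hypothetical `fetch` callback) and cache it.
    fn get(&mut self, vault_ref: &str, fetch: impl Fn(&str) -> String) -> String {
        let now = Instant::now();
        if let Some((secret, fetched_at)) = self.entries.get(vault_ref) {
            if now.duration_since(*fetched_at) < self.ttl {
                return secret.clone();
            }
        }
        let secret = fetch(vault_ref);
        self.entries.insert(vault_ref.to_string(), (secret.clone(), now));
        secret
    }

    // Drop expired entries so plaintext does not outlive its window.
    fn evict_expired(&mut self) {
        let ttl = self.ttl;
        let now = Instant::now();
        self.entries.retain(|_, (_, at)| now.duration_since(*at) < ttl);
    }
}
```

A periodic `evict_expired` sweep (plus process memory being the only home of the plaintext) keeps the exposure window bounded to the TTL.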

Tech Stack

Rust · Tokio · Axum · Python · PyTorch · Qdrant · FastAPI · gRPC · Redis / Valkey · PostgreSQL · TypeScript · Next.js · Tailwind CSS · Stripe