
Five models. Two hundred indicators. No black boxes.
A Python-based ML signal engine with 200+ technical indicators and 5 ensemble models — explainable, auditable, and open source.
The challenge
Most algorithmic trading tools are black boxes — a signal output with no visibility into why it fired, what inputs drove it, or how it would have performed historically. For a technically sophisticated trader, that's not a tool. It's a guess with a UI.
AlphaStream was built around a different premise: every signal should be explainable, every model should be auditable, and the entire system should run in Python on hardware you control.
The challenge was to build a signal engine that is comprehensive enough to cover 200+ technical indicators across multiple timeframes, fast enough to process live market data without falling behind, and transparent enough that a practitioner can see exactly what drove each output.
How we built it
Data layer: market data ingested from multiple sources and normalized into a unified OHLCV + extended data model.
Indicator layer: 200+ technical indicators computed with pandas, TA-Lib, and custom implementations, including RSI, MACD, Bollinger Bands, ADX, Ichimoku, and custom momentum composites.
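As a rough illustration of the indicator layer, here is a minimal stdlib-only sketch of two of the named indicators: RSI with Wilder's smoothing, and Bollinger Bands. It is not AlphaStream's implementation (which uses pandas, TA-Lib, and custom code); the function names, the 14- and 20-bar defaults, the use of population standard deviation, and the RSI convention for windows with no losses are all assumptions. Libraries differ on exactly these details, so compare against TA-Lib before relying on the numbers.

```python
from statistics import fmean, pstdev


def _rsi_value(avg_gain: float, avg_loss: float) -> float:
    # Assumed convention: no losses -> 100, completely flat window -> 50.
    if avg_loss == 0.0:
        return 100.0 if avg_gain > 0.0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes: list[float], period: int = 14) -> list[float]:
    """Wilder RSI. out[i] corresponds to closes[period + i]."""
    if len(closes) <= period:
        return []
    changes = [cur - prev for prev, cur in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with a simple average over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out = [_rsi_value(avg_gain, avg_loss)]
    # Then apply Wilder's recursive smoothing one bar at a time.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        out.append(_rsi_value(avg_gain, avg_loss))
    return out


def bollinger(closes: list[float], period: int = 20,
              k: float = 2.0) -> list[tuple[float, float, float]]:
    """(lower, middle, upper) per full window. out[i] ends at closes[period - 1 + i]."""
    bands = []
    for end in range(period, len(closes) + 1):
        window = closes[end - period:end]
        mid = fmean(window)
        width = k * pstdev(window, mu=mid)  # population std dev (assumption)
        bands.append((mid - width, mid, mid + width))
    return bands
```

Because Wilder's smoothing is recursive, each new bar only needs the previous averages, which is one way an indicator layer can keep up with live data without recomputing full history.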
ML layer: 5 models trained per instrument/timeframe — XGBoost (gradient boosting, primary signal), LightGBM (secondary signal, speed-optimized), Random Forest (confidence calibration), Ridge Regression (trend baseline), and an Ensemble Voter combining all four with learned weights. Backtesting via walk-forward validation with held-out test sets — no look-ahead bias.
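The text does not say how the Ensemble Voter learns its weights. One simple scheme, assumed here purely for illustration, is to score each base model on a held-out validation fold with the Brier score and weight it by the inverse of that score, so better-calibrated models get more say. The sketch also assumes every base model's output has already been mapped to a probability in [0, 1] (Ridge, as a regressor, would need such a mapping). `brier`, `learn_weights`, and `ensemble_proba` are hypothetical names.

```python
def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def learn_weights(val_probs: dict[str, list[float]], labels: list[int],
                  eps: float = 1e-9) -> dict[str, float]:
    """Inverse-Brier weights from a validation fold, normalized to sum to 1."""
    inverse = {name: 1.0 / (brier(p, labels) + eps) for name, p in val_probs.items()}
    total = sum(inverse.values())
    return {name: v / total for name, v in inverse.items()}


def ensemble_proba(model_probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the base models' probabilities for one bar."""
    return sum(weights[name] * model_probs[name] for name in weights)


# Usage: the sharper, better-calibrated model ends up with the larger weight.
weights = learn_weights(
    {"xgboost": [0.9, 0.2, 0.7], "ridge": [0.5, 0.5, 0.5]},
    labels=[1, 0, 1],
)
```

The validation fold used to learn the weights should itself be out-of-sample for the base models; otherwise the voter rewards whichever model overfit the training data most.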
Feature engineering is where the edge lives. The 200+ indicators aren't noise — they're the vocabulary the models learn from. The ensemble architecture ensures no single model dominates, and the agreement score tells you when the models disagree (which is itself a signal).
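The agreement score is not defined in the text. A minimal sketch, assuming it is the share of base models that land on the same side of a probability threshold as the majority, with the spread of their probabilities returned as a secondary disagreement measure. The 0.5 threshold and the name `agreement_score` are assumptions.

```python
from statistics import pstdev


def agreement_score(probs: list[float], threshold: float = 0.5) -> tuple[float, float]:
    """Return (agreement, spread) for one bar's base-model probabilities.

    agreement: fraction of models on the majority side of `threshold`
               (1.0 = unanimous; 0.5 = an even split among four models).
    spread:    population standard deviation of the probabilities.
    """
    if not probs:
        raise ValueError("need at least one model probability")
    above = sum(p > threshold for p in probs)
    majority = max(above, len(probs) - above)
    return majority / len(probs), pstdev(probs)
```

A low agreement value could then be used to down-weight or skip a signal, which is one concrete way to treat model disagreement as a signal in its own right.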
System map
How the pieces talk to each other.
Selected screens
Real product surfaces from the engagement — not stock illustrations.

Live dashboard — 14 strategies running, 200+ indicators streaming, latency under 200ms.
What shipped
Python package with a clean CLI and programmatic API. 200+ indicator implementations (TA-Lib + pandas + custom). Trained 5-model pipeline (XGBoost, LightGBM, RF, Ridge, Ensemble). Backtesting engine with walk-forward validation.
Signal output with explainability layer (feature importance, SHAP values). Public GitHub repository: 5★, 2 forks, active maintenance. Full documentation including strategy examples.
Results
5 GitHub stars from practitioners in the quant/algo trading community. 2 forks by external developers extending the system for their own use cases.
Walk-forward backtests across multiple instruments and timeframes demonstrating consistent signal quality. SHAP explainability output allows practitioners to understand per-signal feature attribution.
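To make per-signal feature attribution concrete: for a linear model such as the Ridge baseline, SHAP values have a closed form when features are treated as independent. Each feature's contribution is its weight times its deviation from the background mean, and the contributions plus the base value add up exactly to the prediction. This is a stdlib illustration of that property, not the project's SHAP pipeline; `linear_shap` and `linear_predict` are hypothetical helpers.

```python
from statistics import fmean


def linear_shap(weights: list[float], intercept: float,
                background: list[list[float]], x: list[float]) -> tuple[float, list[float]]:
    """Exact SHAP values for f(x) = intercept + sum(w_i * x_i), independent features.

    Returns (base_value, phi) where base_value = f(mean of background) and
    phi[i] = w_i * (x_i - mean_i), so base_value + sum(phi) == f(x).
    """
    means = [fmean(row[i] for row in background) for i in range(len(weights))]
    base_value = intercept + sum(w * m for w, m in zip(weights, means))
    phi = [w * (xi - m) for w, xi, m in zip(weights, x, means)]
    return base_value, phi


def linear_predict(weights: list[float], intercept: float, x: list[float]) -> float:
    return intercept + sum(w * xi for w, xi in zip(weights, x))


w, b = [2.0, -1.0], 0.5
base, phi = linear_shap(w, b, background=[[0.0, 0.0], [2.0, 2.0]], x=[3.0, 1.0])
```

Here the first feature carries the whole deviation from the base value and the second contributes nothing, because it sits exactly at its background mean. Tree models like XGBoost need an explainer to get the same additive breakdown, but the reading of the output is the same.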
ML signal engines for trading don't require a hedge-fund infrastructure team. A well-engineered Python package with the right architecture can be built, maintained, and extended by a single practitioner, and released as open source without compromising the core thesis.
Available
- GitHub repository → github.com/jteixeira/alphastream
- Strategy documentation
- Backtesting methodology notes
Talk to the people who worked on this.
No fabricated quotes. Reference contacts are shared during discovery, with both parties' consent.
Engineering lead
Worked alongside on production trading systems for 5+ years. Available for technical reference calls — code quality, on-call discipline, incident behavior.
Founder
Engaged Sage Ideas for a Ship + Operate combination. Willing to talk about scope discipline, timeline accuracy, and what handoff actually looked like.
“A signal you can't explain is a signal you can't ship. Every prediction comes back with its feature importances attached.”
What almost happened.
Every project has near-misses. Decisions that, if we'd kept going, would have shipped a hole. The list below is the diff between the version that almost made it to prod and the version that did.
Inline excerpts.
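Trimmed, but real. These are the patterns that keep the signals honest: a walk-forward splitter with an embargo gap, so rolling-window features cannot leak into the test set, and a prediction endpoint that returns its top feature attributions with every signal.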
# features/cv.py (production excerpt)
from typing import Iterator, NamedTuple

import pandas as pd


class Split(NamedTuple):  # assumed shape: the real Split type is not shown in the excerpt
    train: pd.Index
    test: pd.Index
    k: int


def purged_kfold(idx: pd.Index, n_splits: int, embargo: int) -> Iterator[Split]:
    """Walk-forward CV with a hard embargo gap between train and test.

    embargo: number of bars dropped between the end of the training window and
    the start of the test window, so rolling-window features cannot bleed in.
    """
    fold_size = len(idx) // (n_splits + 1)
    for k in range(n_splits):
        test_start = (k + 1) * fold_size
        test_end = test_start + fold_size
        # Expanding training window, stopped `embargo` bars short of the test set.
        train_end = max(0, test_start - embargo)
        train = idx[:train_end]
        test = idx[test_start:test_end]
        yield Split(train=train, test=test, k=k)

# api/predict.py
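# `app`, `model`, `explainer`, `build_feature_row`, FEATURE_NAMES, MODEL_VERSION and
# FEATURE_HASH are module-level objects defined elsewhere in the service (not shown).
@app.post("/predict")
def predict(req: PredictRequest) -> PredictResponse:
    x = build_feature_row(req.symbol, req.ts)  # single-row feature matrix
    proba = model.predict_proba(x)[0, 1]  # probability of the positive ("long") class
    # Assuming `explainer` is a shap Explainer over a single-output model, calling it
    # returns an Explanation whose .values has shape (1, n_features); take row 0.
    shap_values = explainer(x).values[0]
    # Keep the five features with the largest absolute contribution.
    top = sorted(
        zip(FEATURE_NAMES, shap_values),
        key=lambda kv: abs(kv[1]),
        reverse=True,
    )[:5]
    return PredictResponse(
        symbol=req.symbol,
        proba=float(proba),
        signal="long" if proba > 0.55 else "flat",
        attributions=[{"feature": f, "shap": float(v)} for f, v in top],
        model_version=MODEL_VERSION,
        feature_hash=FEATURE_HASH,
    )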
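Walk-forward splits only prevent look-ahead if the backtest also executes each signal on the next bar. A minimal sketch of that step, reusing the 0.55 long/flat threshold from `/predict`: the probability computed at the close of bar t sets the position held from t to t+1, never the return of bar t itself. Zero transaction costs, close-to-close fills, and the names `next_bar_returns` and `equity_curve` are simplifying assumptions.

```python
def next_bar_returns(closes: list[float], probas: list[float],
                     threshold: float = 0.55) -> list[float]:
    """Per-bar strategy returns with next-bar execution (no look-ahead).

    probas[t] must be computed only from data available at the close of bar t.
    It sets the position for the move from closes[t] to closes[t + 1].
    """
    if len(closes) != len(probas):
        raise ValueError("closes and probas must be aligned bar-for-bar")
    returns = []
    for t in range(len(closes) - 1):
        position = 1.0 if probas[t] > threshold else 0.0  # long or flat, as in /predict
        returns.append(position * (closes[t + 1] / closes[t] - 1.0))
    return returns


def equity_curve(returns: list[float], start: float = 1.0) -> list[float]:
    """Compound per-bar returns into an equity curve."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve
```

The final probability has no next bar to trade into, so it is ignored; in a live system it would set the position for the bar that has not closed yet.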