Thirteen frameworks. One verdict: ship or don't.

DevTools

Quality Telemetry Platform

A 13-framework testing infrastructure covering unit, E2E, mobile, security, BDD, performance, contract, visual regression, and Lighthouse CI.

Frameworks: 13

CI Run Time: < 8 min

Prod Regressions Missed: 0

Lighthouse Score: 90+

Problem

The challenge

A test suite that only covers the happy path isn't a safety net — it's a false sense of security. Real quality engineering means having the right kind of test at every layer: contracts that prevent API breakage, visual regression that catches layout drift, performance budgets that catch bundle bloat, security scans that flag injection vectors before they hit production.

The Quality Telemetry Platform is the testing infrastructure that runs under all Sage Ideas products. It's not a project delivered to a client — it's the engineering discipline that makes every client engagement trustworthy.

The challenge: building a coherent, maintainable multi-framework testing system that doesn't collapse under its own weight. The risk with "13 frameworks" is that it becomes an unmaintained museum. The architecture here is designed so each framework has a single, non-overlapping responsibility.

Approach

How we built it

Framework responsibility map:

Jest (unit: pure functions, utilities)
Vitest (fast unit tests for Next.js components)
Playwright (E2E browser tests: user flows, auth)
Testing Library (component integration)
Supertest (API endpoint contract)
Pact (consumer/provider contract tests)
Cypress (supplemental E2E for visual-heavy flows)
k6 (performance/load: response time under traffic)
Lighthouse CI (performance budgets: CWV, accessibility, SEO)
OWASP ZAP (DAST security scan: injection, XSS, misconfiguration)
Axe (WCAG 2.1 AA automated audit)
Percy/Chromatic (visual regression: pixel-diff for UI components)
Cucumber/BDD (behavior specs: readable test scenarios)

The architecture principle: each framework owns a layer. Tests don't duplicate each other's coverage. If a bug can be caught by a unit test, it never reaches the E2E layer. This makes the suite fast, focused, and maintainable.
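One way to keep "each framework owns a layer" honest is to write the ownership down as data and check it. The sketch below is illustrative, not the repo's code: the layer names and the `findOverlaps` helper are assumptions, and only the framework-to-responsibility pairing comes from the map above.

```typescript
// Hypothetical layer labels; the pairing follows the responsibility map in the text.
type TestLayer =
  | "unit" | "component-unit" | "component-integration" | "api-endpoint"
  | "contract" | "e2e" | "e2e-visual-flows" | "load" | "perf-budget"
  | "dast" | "accessibility" | "visual-regression" | "bdd";

interface FrameworkOwnership {
  framework: string;
  layer: TestLayer;
}

const OWNERSHIP: FrameworkOwnership[] = [
  { framework: "Jest", layer: "unit" },
  { framework: "Vitest", layer: "component-unit" },
  { framework: "Testing Library", layer: "component-integration" },
  { framework: "Supertest", layer: "api-endpoint" },
  { framework: "Pact", layer: "contract" },
  { framework: "Playwright", layer: "e2e" },
  { framework: "Cypress", layer: "e2e-visual-flows" },
  { framework: "k6", layer: "load" },
  { framework: "Lighthouse CI", layer: "perf-budget" },
  { framework: "OWASP ZAP", layer: "dast" },
  { framework: "Axe", layer: "accessibility" },
  { framework: "Percy/Chromatic", layer: "visual-regression" },
  { framework: "Cucumber", layer: "bdd" },
];

// Returns every layer claimed by more than one framework, keyed by layer name.
function findOverlaps(entries: FrameworkOwnership[]): Record<string, string[]> {
  const byLayer: Record<string, string[]> = {};
  for (const entry of entries) {
    const owners = byLayer[entry.layer] ?? [];
    owners.push(entry.framework);
    byLayer[entry.layer] = owners;
  }
  const overlaps: Record<string, string[]> = {};
  for (const layer of Object.keys(byLayer)) {
    if (byLayer[layer].length > 1) {
      overlaps[layer] = byLayer[layer];
    }
  }
  return overlaps;
}
```

Run as a CI check, an empty result from `findOverlaps(OWNERSHIP)` means nobody has added a fourteenth tool that duplicates an existing layer.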

Architecture

System map

How the pieces talk to each other.

Built UI

Selected screens

Real product surfaces from the engagement — not stock illustrations.

SLO board — p95 latency 124ms, error rate at 0.04%, weekly burn-rate alerts wired.

Evidence

What it actually looks like

Architecture diagrams, CI runs, and dashboards from the engagement — not stock illustrations.

End-to-end coverage across critical journeys. The report is the deliverable — stakeholders see exactly what was tested, what passed, and what got skipped.

Aggregated test results across thirteen frameworks in one place. Trends over time, flake detection, and a single pane of glass for ship-or-don’t.

Build

What shipped

13 configured, actively maintained framework integrations. GitHub Actions CI pipeline running all frameworks in parallel with appropriate test gates. Lighthouse CI budget configuration (LCP < 2.5s, CLS < 0.1, TBT < 200ms).
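The three budgets above map onto Lighthouse CI's numeric assertions. The sketch below expresses them as a TypeScript object in the shape of the `ci.assert.assertions` section; the audit IDs and `maxNumericValue` are Lighthouse CI's own, while `findBudgetViolations` and the strict less-than comparison are our assumptions for illustration. A real setup would emit this object as `lighthouserc.js` or `lighthouserc.json`.

```typescript
type AssertionLevel = "off" | "warn" | "error";

interface NumericBudget {
  maxNumericValue: number;
}

type BudgetAssertions = Record<string, [AssertionLevel, NumericBudget]>;

// LCP and TBT are in milliseconds; CLS is unitless.
const budgetAssertions: BudgetAssertions = {
  "largest-contentful-paint": ["error", { maxNumericValue: 2500 }],
  "cumulative-layout-shift": ["error", { maxNumericValue: 0.1 }],
  "total-blocking-time": ["error", { maxNumericValue: 200 }],
};

// Same nesting a lighthouserc file uses for assertions.
const lighthouseRc = { ci: { assert: { assertions: budgetAssertions } } };

// Hypothetical local check: lists "error"-level budgets that a measurement
// does not stay strictly under. Audits with no measurement are skipped.
function findBudgetViolations(
  measured: Record<string, number>,
  budgets: BudgetAssertions
): string[] {
  const violations: string[] = [];
  for (const auditId of Object.keys(budgets)) {
    const [level, budget] = budgets[auditId];
    const value = measured[auditId];
    if (level !== "error" || value === undefined) {
      continue;
    }
    if (!(value < budget.maxNumericValue)) {
      violations.push(`${auditId}: ${value} is not under ${budget.maxNumericValue}`);
    }
  }
  return violations;
}
```

Keeping the numbers in one typed object means the budget documentation and the CI gate can be generated from the same source.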

Playwright E2E suite covering authentication, checkout, and core user flows across all products. OWASP ZAP automated security scan on every production deployment. Pact contract tests for all cross-service API boundaries. Percy visual regression baseline for all critical UI components.

Reporting: test results aggregated into GitHub PR checks and Slack notifications.
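Collapsing many framework results into one ship-or-don't answer can be sketched as below. This is a minimal illustration under assumptions: the `FrameworkRun` shape, the `gating` flag, and the summary wording are hypothetical, and skipped suites are treated as non-blocking because path routing skips suites on purpose.

```typescript
type RunStatus = "passed" | "failed" | "skipped";

interface FrameworkRun {
  framework: string;
  status: RunStatus;
  gating: boolean; // true if a failure should block the merge
}

interface ShipVerdict {
  ship: boolean;
  blockers: string[];
  summary: string;
}

// Reduces per-framework results to one verdict plus a one-line summary
// suitable for a PR check title or a chat notification.
function decideVerdict(runs: FrameworkRun[]): ShipVerdict {
  const blockers = runs
    .filter((run) => run.gating && run.status === "failed")
    .map((run) => run.framework);
  const passed = runs.filter((run) => run.status === "passed").length;
  const skipped = runs.filter((run) => run.status === "skipped").length;
  const ship = blockers.length === 0;
  const summary = ship
    ? `SHIP: ${passed} passed, ${skipped} skipped`
    : `DON'T SHIP: blocked by ${blockers.join(", ")}`;
  return { ship, blockers, summary };
}
```

A failed non-gating suite still shows up in the aggregated report; it just does not flip the verdict.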

Outcome

Results

Across 12 months of active use, post-deploy monitoring caught zero production regressions that CI had not caught first. Lighthouse CI budgets held: all Sage Ideas products score 90+ on Performance and Accessibility.

Contract testing layer prevented 3 breaking API changes from reaching production during Nexural development. Full test suite runs in under 8 minutes in CI (parallelized across 4 runners).
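The write-up does not say how work is split across the 4 runners, so treat the following as one plausible approach rather than the actual setup: a greedy longest-first assignment that hands each suite to the currently least-loaded runner. `SuiteTiming`, `RunnerPlan`, and `planShards` are hypothetical names, and the durations would come from previous CI runs.

```typescript
interface SuiteTiming {
  suite: string;
  seconds: number; // historical duration of the suite
}

interface RunnerPlan {
  suites: string[];
  totalSeconds: number;
}

// Greedy longest-processing-time scheduling: sort suites by duration,
// then give each one to whichever runner has the least work so far.
function planShards(timings: SuiteTiming[], runnerCount: number): RunnerPlan[] {
  if (runnerCount < 1) {
    throw new Error("runnerCount must be at least 1");
  }
  const runners: RunnerPlan[] = [];
  for (let i = 0; i < runnerCount; i++) {
    runners.push({ suites: [], totalSeconds: 0 });
  }
  const longestFirst = timings.slice().sort((a, b) => b.seconds - a.seconds);
  for (const timing of longestFirst) {
    let lightest = runners[0];
    for (const runner of runners) {
      if (runner.totalSeconds < lightest.totalSeconds) {
        lightest = runner;
      }
    }
    lightest.suites.push(timing.suite);
    lightest.totalSeconds += timing.seconds;
  }
  return runners;
}
```

Wall time is set by the heaviest runner, so the number to watch is the largest `totalSeconds` in the plan, not the sum.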

Testing infrastructure is a product decision, not a technical nicety. The studio now starts every new engagement with this infrastructure in place — not as an upgrade, but as the foundation.

Artifacts

Available

GitHub: Testing framework configuration templates
CI pipeline configuration
Lighthouse CI budget documentation
Test coverage policy documentation

References

Talk to people about this work.

No fabricated quotes. Reference contacts are shared during discovery, with both parties' consent.

Reference available

Engineering lead

Fintech · 5 years

Worked alongside on production trading systems for 5+ years. Available for technical reference calls — code quality, on-call discipline, incident behavior.

Reference call shared during discovery, both consenting.

Reference available

Founder

Studio engagement

Engaged Sage Ideas for a Ship + Operate combination. Willing to talk about scope discipline, timeline accuracy, and what handoff actually looked like.

Reference call shared during discovery, both consenting.

“If the dashboard can't tell you whether last night's deploy was safe, it's wallpaper.”

// build log · entry 04

Honesty

What almost happened.

Every project has near-misses. Decisions that, if we'd kept going, would have shipped a hole. The list below is the diff between the version that almost made it to prod and the version that did.

// near-miss · 01

diff

before: CI was going to run all 13 frameworks on every PR. Pipeline wall time approached 40 minutes, and devs started force-merging around it.

after: Selective execution by changed path. A docs change runs Lighthouse + lint, a backend change runs unit + contract + Pact, and a release branch runs all 13.

cost: A weekend writing the path-router. p50 PR time dropped from 38min to 7min.
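The routing rule in this near-miss can be sketched as a small pure function. This is an illustration under assumptions, not the path-router itself: the `docs/` and `release/` patterns, the suite labels, and `selectSuites` are hypothetical, while `apps/api/` matches the filter used in the CI excerpt.

```typescript
interface RouteRule {
  pattern: RegExp; // matched against each changed file path
  suites: string[];
}

// Hypothetical labels for the thirteen framework suites.
const ALL_SUITES: string[] = [
  "jest", "vitest", "playwright", "testing-library", "supertest", "pact",
  "cypress", "k6", "lighthouse", "zap", "axe", "percy", "cucumber",
];

const ROUTE_RULES: RouteRule[] = [
  { pattern: /^docs\//, suites: ["lighthouse", "lint"] },
  { pattern: /^apps\/api\//, suites: ["unit", "contract", "pact"] },
];

const RELEASE_BRANCH = /^release\//; // assumed branch naming convention

// Picks the suites to run: everything on a release branch, otherwise the
// union of suites from every rule that matches at least one changed path.
function selectSuites(
  changedPaths: string[],
  branch: string,
  rules: RouteRule[],
  allSuites: string[]
): string[] {
  if (RELEASE_BRANCH.test(branch)) {
    return allSuites.slice();
  }
  const selected: string[] = [];
  for (const rule of rules) {
    const touched = changedPaths.some((path) => rule.pattern.test(path));
    if (!touched) {
      continue;
    }
    for (const suite of rule.suites) {
      if (selected.indexOf(suite) === -1) {
        selected.push(suite);
      }
    }
  }
  return selected;
}
```

With these rules, `selectSuites(["apps/api/src/orders.ts"], "feature/x", ROUTE_RULES, ALL_SUITES)` returns `["unit", "contract", "pact"]`, and any branch starting with `release/` gets all thirteen.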

// near-miss · 02

diff

before: Visual regression was set to fail any pixel diff over 0.1%. Result: a font-rendering tweak in Chromium broke 200 snapshots overnight.

after: Diffing is structural, with a Pixelmatch threshold tuned per surface plus a ratchet that lets diffs land if a human reviewer ack'd them in the PR.

cost: Two days of tuning. Zero false positives in 90 days.
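The per-surface threshold plus reviewer ratchet can be sketched as a gate over diff results. Assumptions are loud here: the sketch takes a mismatched-pixel count (the kind of number Pixelmatch returns) as input rather than calling the library, and the surface names, ratios, and `gateSnapshot` are hypothetical.

```typescript
interface SurfacePolicy {
  maxDiffRatio: number; // allowed fraction of mismatched pixels, 0..1
}

interface SnapshotDiff {
  surface: string;
  mismatchedPixels: number;
  totalPixels: number;
  acknowledged: boolean; // a human reviewer ack'd this diff in the PR
}

type GateOutcome = "pass" | "pass-acknowledged" | "fail";

// Illustrative values only; real thresholds would be tuned per surface.
const SURFACE_POLICIES: Record<string, SurfacePolicy> = {
  "checkout-form": { maxDiffRatio: 0.001 },
  "marketing-hero": { maxDiffRatio: 0.02 },
};

const DEFAULT_POLICY: SurfacePolicy = { maxDiffRatio: 0.005 };

// Under the surface's threshold: pass. Over it: pass only if acknowledged.
function gateSnapshot(
  diff: SnapshotDiff,
  policies: Record<string, SurfacePolicy>,
  fallback: SurfacePolicy
): GateOutcome {
  const policy = policies[diff.surface] ?? fallback;
  const ratio = diff.totalPixels === 0 ? 0 : diff.mismatchedPixels / diff.totalPixels;
  if (ratio <= policy.maxDiffRatio) {
    return "pass";
  }
  return diff.acknowledged ? "pass-acknowledged" : "fail";
}
```

Reporting acknowledged diffs as their own outcome keeps the ratchet visible: intentional changes land, but they never look identical to a clean pass.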

From the repo

Inline excerpts.

Trimmed, but real. These are the patterns that made the system survive Stripe retries, multi-tenant queries, and a Discord bot that won't hallucinate positions.

Path-routed CI matrix

yaml

# .github/workflows/ci.yml — path filtering
# Trigger only when app, package, or workflow files change.
on:
  pull_request:
    paths:
      - 'apps/**'
      - 'packages/**'
      - '.github/workflows/**'

jobs:
  # Classify the diff once; downstream jobs read these outputs.
  detect:
    runs-on: ubuntu-latest
    outputs:
      ui: ${{ steps.f.outputs.ui }}
      api: ${{ steps.f.outputs.api }}
    steps:
      - uses: dorny/paths-filter@v3
        id: f
        with:
          filters: |
            ui: ['apps/web/**']
            api: ['apps/api/**']

  # Reusable workflow (must declare `on: workflow_call`); runs only for UI changes.
  e2e:
    needs: detect
    if: needs.detect.outputs.ui == 'true'
    uses: ./.github/workflows/playwright.yml

  # Contract tests run only when the API changed.
  contract:
    needs: detect
    if: needs.detect.outputs.api == 'true'
    uses: ./.github/workflows/pact.yml

// Only the suites that can possibly be affected by the diff actually run.