End-to-End Testing with Playwright in 2026: What's Worth Testing
What I actually test end-to-end vs unit, how to structure a Playwright suite for a SaaS app, and the patterns that prevent the slow flaky test trap most teams fall into.
E2E testing is back
For years, end-to-end testing was the punching bag of the testing pyramid. Slow, flaky, hard to maintain. Then Playwright arrived and quietly fixed most of the pain. Five years in, Playwright is now the default for serious E2E across React, Next.js, and Node-backed apps.
But the new tooling does not save you from old mistakes. The biggest mistake teams still make: testing everything end-to-end. The right question is not "can I write an E2E test for this?" — it is "should I?"
This post is what I actually test, how I structure the suite, and the patterns that keep tests fast and reliable.
What's worth E2E testing
The rule I use: E2E for critical happy paths only. Everything else goes to unit or integration tests.
Critical paths in a typical SaaS:
That is usually 8-12 E2E tests for an entire SaaS. Not 80. Not 800. The temptation to test more is real and you should fight it.
What is not worth E2E testing
If a feature breaks, your happy path test will catch it. You do not need a separate test for "what if the optional middle name field is empty."
Structure that scales
A pattern that has worked across multiple client projects — five spec files cover the entire app, each with 3-5 tests, each test one user journey end to end:
The auth fixture
Single biggest time-saver in Playwright: a custom fixture that gives you a pre-authenticated page. Without it, every test signs up and logs in, which adds 10-15 seconds per test.
With it, login happens once per worker, and each test starts already logged in.
I use Playwright's storageState API to save the auth cookies once, then load them per test. For 30 tests, this cut one project's CI run from 14 minutes to 4.
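A minimal sketch of the storageState wiring, under stated assumptions: it uses a Playwright setup project (login runs once per run) rather than the worker-scoped fixture described above, and the file path, project names, and auth.setup.ts file are hypothetical.

```typescript
// playwright.config.ts (sketch; path and project names are assumptions)
const AUTH_FILE = 'playwright/.auth/user.json';

export default {
  projects: [
    // Runs first. A hypothetical auth.setup.ts logs in through the UI, then
    // calls page.context().storageState({ path: AUTH_FILE }) to save the session.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'chromium',
      // Every test in this project starts with the saved session loaded.
      use: { storageState: AUTH_FILE },
      // Ensures the auth file exists before any test in this project runs.
      dependencies: ['setup'],
    },
  ],
};
```

Keeping the auth file out of version control is usually sensible, since it holds live session cookies.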
Selectors that survive refactors
The single biggest source of flaky tests is selectors. CSS selectors break when you change a class. XPath breaks when you add a wrapper div. Text selectors break when you change copy.
The fix: prefer getByRole and other user-facing locators wherever an element has a stable role or label, and add data-testid attributes for the elements that do not.
Order of preference:
The first three options align with how users actually interact with your app, so they break less often than CSS selectors.
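As a rough illustration, the locators for one form might be collected like this. The ordering shown (role, then label, then test id) is an assumption, as are the invite flow, the labels, and the test id. PageLike and LocatorLike are minimal stand-ins so the sketch compiles without Playwright installed; Playwright's Page and Locator are expected to satisfy them.

```typescript
// Minimal structural stand-ins for Playwright's Locator and Page.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  getByRole(role: 'button' | 'link' | 'textbox', options?: { name?: string }): LocatorLike;
  getByLabel(text: string): LocatorLike;
  getByTestId(testId: string): LocatorLike;
}

// Hypothetical invite form; every string below is a placeholder.
function inviteForm(page: PageLike) {
  return {
    // Role plus accessible name: unaffected by class or wrapper changes.
    openButton: page.getByRole('button', { name: 'Invite teammate' }),
    // Label: the way a user identifies a form field.
    emailField: page.getByLabel('Email'),
    // Test id: fallback for an element with no stable role or label.
    submit: page.getByTestId('invite-submit'),
  };
}
```

In a test this would read as `const form = inviteForm(page); await form.emailField.fill('new@example.com');`, which keeps every selector for the form in one place.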
Network mocking
Two patterns:
Real network — tests hit your actual dev or staging API. Highest fidelity, slowest, most flaky. Use for the absolute critical paths.
Mocked network — Playwright intercepts requests and returns canned responses. Fast, deterministic, less flaky. Use for everything else.
I default to mocked for most tests, real for the 2-3 most critical paths. The mock-vs-real ratio in my last project was 22-to-3.
For mocking, Playwright's page.route is excellent. You can match by URL pattern and return a custom response. For Stripe-heavy tests, I have a helper that mocks the whole Stripe Checkout flow.
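A small helper along these lines keeps mocked responses short. This is a sketch, not the Stripe helper mentioned above: mockJson is a hypothetical name, and RouteLike and RoutablePage are minimal stand-ins (Playwright's Route and Page are expected to satisfy them) so the code runs without Playwright installed.

```typescript
// Minimal structural stand-ins for Playwright's Route and Page.
interface RouteLike {
  fulfill(response: { status?: number; contentType?: string; body?: string }): Promise<void>;
}
interface RoutablePage {
  route(url: string | RegExp, handler: (route: RouteLike) => Promise<void>): Promise<unknown>;
}

// Intercepts requests matching `url` and answers with a canned JSON body.
function mockJson(
  page: RoutablePage,
  url: string | RegExp,
  data: unknown,
  status = 200,
): Promise<unknown> {
  return page.route(url, (route) =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(data),
    }),
  );
}
```

In a test, `await mockJson(page, '**/api/projects', [{ id: 1, name: 'Demo' }]);` before `page.goto(...)` would serve that list instead of hitting the API. The URL pattern and payload here are placeholders.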
Database state between tests
Tests need isolated state. Two options:
Transaction rollback: wrap each test in a DB transaction and roll back at the end. Fast, but it only works if the app itself never commits explicitly, and most apps do.
Database reset: truncate and re-seed before each test. Slower but bulletproof. I use this for SaaS tests where multi-tenant isolation matters.
For a SaaS suite of 30 tests, full reset adds about 2 seconds per test, 60 seconds total. Acceptable.
For larger suites (200+ tests), reach for transactional isolation or per-test schemas.
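The truncate-and-re-seed option above can be sketched as follows, assuming PostgreSQL. The RunSql shape, the function names, and the table names in the usage note are all hypothetical; with node-postgres, `run` could be something like `(sql) => pool.query(sql)`.

```typescript
// Anything that can execute a SQL string against the test database.
type RunSql = (sql: string) => Promise<unknown>;

// Builds one TRUNCATE statement covering every table (PostgreSQL syntax).
// Identifiers are double-quoted, with embedded quotes escaped.
function buildTruncateSql(tables: string[]): string {
  const quoted = tables.map((t) => `"${t.replace(/"/g, '""')}"`).join(', ');
  return `TRUNCATE TABLE ${quoted} RESTART IDENTITY CASCADE`;
}

// Empties the listed tables, then re-seeds. Intended for a beforeEach hook.
async function resetDatabase(
  run: RunSql,
  tables: string[],
  seed: () => Promise<void>,
): Promise<void> {
  if (tables.length > 0) {
    await run(buildTruncateSql(tables));
  }
  await seed();
}
```

A single TRUNCATE over all tables avoids foreign-key ordering problems, and RESTART IDENTITY keeps generated ids the same from one test to the next. A call might look like `await resetDatabase(run, ['users', 'projects'], seedFixtures)`.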
Visual regression — optional
Playwright has a built-in toHaveScreenshot assertion that captures a screenshot and diffs it against a stored baseline. Useful for catching unintended visual changes.
I use it sparingly — only for the homepage and the main dashboard. Snapshots are easy to over-use, then they constantly break for legitimate design changes and people start ignoring them.
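Screenshot comparisons in Playwright Test go through `expect(page).toHaveScreenshot()`, and the tolerance can be set once in config instead of per test. A minimal config fragment, with the threshold value chosen only as an illustration:

```typescript
// playwright.config.ts fragment (sketch; the 1% threshold is an assumption)
export default {
  expect: {
    toHaveScreenshot: {
      // Tolerate up to 1% differing pixels before the comparison fails.
      maxDiffPixelRatio: 0.01,
      // Freeze CSS animations and transitions so captures are repeatable.
      animations: 'disabled',
    },
  },
};
```

A test then only needs `await expect(page).toHaveScreenshot('dashboard.png');`, and baselines are refreshed after an intentional design change with `npx playwright test --update-snapshots`.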
Run in CI
Running Playwright on GitHub Actions is well documented. Key points:
Last point is critical. The moment you start ignoring flaky tests, the entire suite degrades. Fix or delete — never ignore.
Speed targets
For a healthy E2E suite:
If you are above any of these, do not add more tests until you fix the slow or flaky ones. The pain compounds.
When E2E fails you
E2E tests fail for two reasons: real bugs, and infrastructure issues. The second category is your enemy.
Common infra failures:
For each, the fix is structural. Add retries to network calls, isolate DB per worker, request more CI memory, switch to role-based selectors. Never add a generic waitForTimeout — that is the path to a slow flaky suite.
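For the "add retries to network calls" fix, a generic wrapper is often enough. This is a sketch with hypothetical names and default values; the delay here sits between attempts of a call that actually failed, which is different from a blind waitForTimeout.

```typescript
// Delay before each retry: baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelays(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Runs `call`, retrying on rejection. Rethrows the last error once the
// retries are used up.
async function withRetry<T>(
  call: () => Promise<T>,
  retries = 3,
  baseMs = 200,
): Promise<T> {
  const delays = backoffDelays(retries, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      if (attempt >= delays.length) throw error;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
}
```

A seeding step that calls a staging API could then be wrapped as `await withRetry(() => createTestUser())`, where createTestUser is a placeholder for whatever network call the suite makes.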
My E2E stack in 2026
TL;DR
If your team has an E2E suite that has grown slow, flaky, or hard to maintain, and you want a senior engineer to audit and restructure it, contact me.