End-to-End Testing with Playwright in 2026: What's Worth Testing

What I actually test end-to-end vs unit, how to structure a Playwright suite for a SaaS app, and the patterns that prevent the slow flaky test trap most teams fall into.

Hassan Javed
May 2026
10 min read

E2E testing is back

For years, end-to-end testing was the punching bag of the testing pyramid. Slow, flaky, hard to maintain. Then Playwright arrived and quietly fixed most of the pain. Five years in, Playwright is now the default for serious E2E across React, Next.js, and Node-backed apps.

But the new tooling does not save you from old mistakes. The biggest mistake teams still make: testing everything end-to-end. The right question is not "can I write an E2E test for this?" — it is "should I?"

This post is what I actually test, how I structure the suite, and the patterns that keep tests fast and reliable.

What's worth E2E testing

The rule I use: E2E for critical happy paths only. Everything else goes to unit or integration tests.

Critical paths in a typical SaaS:

New user signup and onboarding
Subscription upgrade flow
Core product workflow (the thing customers pay for)
Login and password reset
Multi-tenant boundary (cannot see another tenant's data)

That is usually 8-12 E2E tests for an entire SaaS. Not 80. Not 800. The temptation to test more is real and you should fight it.

What is not worth E2E testing

Form validation messages (unit test the validator)
Empty states, loading states (component test or visual snapshot)
Most error states (integration test the API, unit test the error component)
Anything that is "edge case" (unit test the function)
Anything that already passes through your existing happy path

If a feature breaks, your happy path test will catch it. You do not need a separate test for "what if the optional middle name field is empty."

Structure that scales

A pattern that has worked across multiple client projects is four spec files covering the entire app, each with 3-5 tests, and each test following one user journey end to end:

auth.spec.ts — login, signup, password reset
subscription.spec.ts — checkout, upgrade, cancel
workspace.spec.ts — create, switch, isolation
core-workflow.spec.ts — the main product feature
plus fixtures/ and utils/ folders for shared helpers

The auth fixture

The single biggest time-saver in Playwright is a custom fixture that gives you a pre-authenticated page. Without it, every test signs up and logs in, adding 10-15 seconds per test.

With it, login happens once per worker, and each test starts already logged in.

I use Playwright's storageState API to save the auth cookies once, then load them per test. For 30 tests, this cut one project's CI run from 14 minutes to 4.
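A minimal sketch of the storage-state idea, not the exact fixture from that project. The login route, field labels, button name, and file path are hypothetical, and PageLike is a hand-written structural subset of Playwright's Page so the helper can be read and exercised without a browser. In a real suite you would pass the real Page and have baseURL configured.

```typescript
// Structural subset of Playwright's Page used below (an assumption for this
// sketch; in a real suite pass the Page object from @playwright/test).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  waitForURL(url: string): Promise<void>;
  context(): { storageState(options: { path: string }): Promise<unknown> };
}

interface Credentials {
  email: string;
  password: string;
}

// One auth file per parallel worker, so workers never share a session.
function authFileForWorker(parallelIndex: number): string {
  return `playwright/.auth/worker-${parallelIndex}.json`;
}

// Log in through the UI once, then persist cookies and localStorage to disk.
async function saveAuthState(
  page: PageLike,
  user: Credentials,
  file: string,
): Promise<void> {
  await page.goto('/login'); // hypothetical route, relies on baseURL
  await page.getByLabel('Email').fill(user.email);
  await page.getByLabel('Password').fill(user.password);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('/dashboard'); // wait until the session is established
  await page.context().storageState({ path: file });
}
```

Tests then load that file through Playwright's storageState option (in test.use or the config's use block) instead of repeating the login.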

Selectors that survive refactors

The single biggest source of flaky tests is selectors. CSS selectors break when you change a class. XPath breaks when you add a wrapper div. Text selectors break when you change copy.

The fix: use getByRole and getByLabel wherever the element has a usable role or label, and add data-testid attributes for anything they cannot reach.

Order of preference:

1. page.getByRole: accessible, works for screen readers too
2. page.getByLabel: forms
3. page.getByTestId: when role or label is not enough
4. page.locator with CSS: last resort

The first two options align with how users actually interact with your app, and a test id is an explicit contract that styling changes do not touch, so all three break less often than CSS selectors.
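To make the order of preference concrete, here is a sketch of a page object for a hypothetical "invite teammate" dialog. The button names, label text, and test id are invented for illustration, and PageLike is again a structural stand-in for Playwright's Page.

```typescript
// Structural subset of Playwright's Page (an assumption for this sketch).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByLabel(text: string): LocatorLike;
  getByTestId(testId: string): LocatorLike;
  locator(selector: string): LocatorLike;
}

// Page object for a hypothetical "invite teammate" dialog.
class InviteDialog {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async invite(email: string, role: string): Promise<void> {
    // 1. Role plus accessible name: survives class and markup changes.
    await this.page.getByRole('button', { name: 'Invite teammate' }).click();
    // 2. Label: the natural handle for form fields.
    await this.page.getByLabel('Email address').fill(email);
    // 3. Test id: for a custom widget with no usable role or label.
    await this.page.getByTestId('role-picker').click();
    await this.page.getByRole('option', { name: role }).click();
    await this.page.getByRole('button', { name: 'Send invite' }).click();
    // 4. Last resort, shown only for contrast:
    //    this.page.locator('.invite-modal .btn-primary')
  }
}
```

Keeping the selectors inside one class per feature means a copy or markup change is fixed in one place rather than in every spec.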

Network mocking

Two patterns:

Real network — tests hit your actual dev or staging API. Highest fidelity, slowest, most flaky. Use for the absolute critical paths.

Mocked network — Playwright intercepts requests and returns canned responses. Fast, deterministic, less flaky. Use for everything else.

I default to mocked for most tests, real for the 2-3 most critical paths. The mock-vs-real ratio in my last project was 22-to-3.

For mocking, Playwright's page.route is excellent. You can match by URL pattern and return a custom response. For Stripe-heavy tests, I have a helper that mocks the whole Stripe Checkout flow.
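A small sketch of what such a helper can look like for a plain JSON endpoint (not the Stripe helper itself). The URL pattern and payload are hypothetical, and RouteLike and PageLike are structural stand-ins for the parts of Playwright's Route and Page that page.route and route.fulfill expose.

```typescript
// Structural subset of Playwright's Route and Page (an assumption for this sketch).
interface RouteLike {
  fulfill(response: {
    status?: number;
    contentType?: string;
    body?: string;
  }): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Register a canned JSON response for every request matching the URL pattern.
async function mockJson(
  page: PageLike,
  urlPattern: string,
  payload: unknown,
  status = 200,
): Promise<void> {
  await page.route(urlPattern, (route) =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(payload),
    }),
  );
}

// Example: pretend a hypothetical billing endpoint reports an active Pro plan.
async function mockActiveSubscription(page: PageLike): Promise<void> {
  await mockJson(page, '**/api/subscription', { plan: 'pro', status: 'active' });
}
```

Register the mock before the navigation that triggers the request, otherwise the first call can slip through to the real network.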

Database state between tests

Tests need isolated state. Two options:

Transaction rollback: wrap each test in a DB transaction and roll back at the end. Fast, but it only works if the app runs on the same connection as the test and never commits explicitly (most apps do commit).

Database reset: truncate and re-seed before each test. Slower but bulletproof. I use this for SaaS tests where multi-tenant isolation matters.

For a SaaS suite of 30 tests, full reset adds about 2 seconds per test, 60 seconds total. Acceptable.

For larger suites (200 plus tests), reach for transactional isolation or per-test schemas.
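The reset option can be sketched roughly as follows, assuming PostgreSQL and a client that exposes a query(sql) method (node-postgres has that shape). The table names, the seed callback, and the per-worker schema naming are hypothetical.

```typescript
// Anything with a query(sql) method, e.g. a node-postgres client (assumption).
interface Db {
  query(sql: string): Promise<unknown>;
}

// Quote identifiers so a table name cannot break out of the statement.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// One TRUNCATE for all tables. CASCADE follows foreign keys and
// RESTART IDENTITY resets sequences so seeded ids stay stable.
function truncateSql(tables: string[]): string {
  const list = tables.map(quoteIdent).join(', ');
  return `TRUNCATE TABLE ${list} RESTART IDENTITY CASCADE;`;
}

// A separate schema per parallel worker keeps workers from sharing rows.
function schemaForWorker(parallelIndex: number): string {
  return `e2e_worker_${parallelIndex}`;
}

// Truncate, then re-seed. Intended to be called from a beforeEach hook.
async function resetDatabase(
  db: Db,
  tables: string[],
  seed: (db: Db) => Promise<void>,
): Promise<void> {
  await db.query(truncateSql(tables));
  await seed(db);
}
```

At roughly 2 seconds per reset, 30 tests cost 30 × 2 = 60 seconds in total when run serially; with parallel workers each needs its own database or schema, which is what schemaForWorker hints at.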

Visual regression — optional

Playwright has a built-in toHaveScreenshot assertion that captures a screenshot and diffs it against a stored baseline (the older toMatchSnapshot compares a value you pass in). Useful for catching unintended visual changes.

I use it sparingly — only for the homepage and the main dashboard. Snapshots are easy to over-use, then they constantly break for legitimate design changes and people start ignoring them.

Run in CI

GitHub Actions plus Playwright is well-documented. Key points:

Run tests in parallel (4-8 workers, depending on your CI machine)
Cache the node_modules and Playwright browsers between runs
Upload screenshots and videos for failed tests as CI artifacts
Fail the build on any test failure — no "this one is flaky, ignore it" exceptions

The last point is critical. The moment you start ignoring flaky tests, the entire suite degrades. Fix or delete, never ignore.
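On the Playwright side, those key points map onto a handful of config options. The sketch below is one way to express them. The testDir value is hypothetical, and setting retries to 0 is my reading of "fail the build on any test failure", since a test that passes on retry would otherwise let the run go green.

```typescript
// Builds the options object; in playwright.config.ts the result would be
// passed to defineConfig() from @playwright/test.
function buildConfig(isCI: boolean) {
  return {
    testDir: './e2e', // hypothetical folder name
    fullyParallel: true, // also parallelize tests inside a single file
    workers: isCI ? 4 : undefined, // undefined lets Playwright pick locally
    forbidOnly: isCI, // a stray test.only fails the CI run
    retries: 0, // no silent retries, so a flaky test fails the build
    reporter: isCI ? 'github' : 'list',
    use: {
      screenshot: 'only-on-failure',
      video: 'retain-on-failure',
      trace: 'retain-on-failure',
    },
  } as const;
}
```

In playwright.config.ts this could be wired up as export default defineConfig(buildConfig(!!process.env.CI)). Uploading Playwright's test-results folder as a CI artifact then exposes the screenshots, videos, and traces for failed tests.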

Speed targets

For a healthy E2E suite:

Full suite under 5 minutes in CI
Single test under 30 seconds locally
Flaky test rate under 1 percent

If you are above any of these, do not add more tests until you fix the slow or flaky ones. The pain compounds.
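These targets are easy to turn into an automated check. The sketch below assumes "flaky rate" means executions that failed and then passed on rerun without a code change, divided by all executions in a sampling window. That definition and the field names are my assumptions, not a standard.

```typescript
interface SuiteStats {
  ciDurationSeconds: number; // wall-clock time of the full suite in CI
  slowestTestSeconds: number; // slowest single test, measured locally
  testRuns: number; // test executions over the sampling window
  flakyRuns: number; // executions that failed, then passed unchanged
}

// Returns one message per missed target; an empty array means healthy.
function checkSuiteHealth(stats: SuiteStats): string[] {
  const problems: string[] = [];
  if (stats.ciDurationSeconds >= 5 * 60) {
    problems.push('full suite takes 5 minutes or more in CI');
  }
  if (stats.slowestTestSeconds >= 30) {
    problems.push('slowest test takes 30 seconds or more');
  }
  const flakyRate = stats.testRuns === 0 ? 0 : stats.flakyRuns / stats.testRuns;
  if (flakyRate >= 0.01) {
    problems.push(`flaky rate is ${(flakyRate * 100).toFixed(1)} percent`);
  }
  return problems;
}
```

For example, 30 tests over 100 CI runs is 3,000 executions, so the 1 percent budget allows fewer than 30 flaky executions in that window.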

When E2E fails you

E2E tests fail for two reasons: real bugs, and infrastructure issues. The second category is your enemy.

Common infra failures:

Network timeout to a third party (Stripe test mode is slow occasionally)
DB seed race condition (parallel workers stepping on each other)
Browser launch failure on CI (memory limits)
Flaky selector (CSS class name changed)

For each, the fix is structural: add retries to network calls, isolate the DB per worker, request more CI memory, switch to role-based selectors. Never add a generic waitForTimeout; a fixed sleep is the path to a slow, flaky suite.
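"Add retries to network calls" can be as small as the helper below: a bounded retry with exponential backoff around one specific call, which is different from sleeping blindly. The function name and default values are hypothetical, and it assumes attempts is at least 1.

```typescript
// Retry an async call a bounded number of times with exponential backoff.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Back off before the next try, but not after the final failure.
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Wrap only the call that is known to be slow, such as a test-setup request to a third-party sandbox, so a genuine bug elsewhere still fails on the first attempt.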

My E2E stack in 2026

Playwright (latest)
TypeScript
Custom auth fixture with storage state
Page object model for shared selectors (one per feature)
Mocked network for most tests, real network for 2-3 critical paths
DB reset between tests (truncate plus re-seed)
GitHub Actions, 4 workers, full suite under 4 minutes

TL;DR

E2E for critical happy paths only (8-12 tests for a typical SaaS)
Everything else: unit or integration tests
Auth fixture with storage state — single biggest speed win
Role-based selectors, data-testid as fallback
Mock network for most, real for the critical paths only
Reset DB between tests, run parallel in CI
Never ignore a flaky test — fix or delete

If your team has an E2E suite that has grown slow, flaky, or hard to maintain — and you want a senior to audit and restructure it — contact me.
