Testing, Frontend, Node.js

End-to-End Testing with Playwright in 2026: What's Worth Testing

What I actually test end-to-end vs unit, how to structure a Playwright suite for a SaaS app, and the patterns that prevent the slow, flaky test trap most teams fall into.

Hassan Javed

May 2026

10 min read

E2E testing is back

For years, end-to-end testing was the punching bag of the testing pyramid: slow, flaky, hard to maintain. Then Playwright arrived and quietly fixed most of the pain. Six years after its 2020 release, Playwright is the default for serious E2E across React, Next.js, and Node-backed apps.

But the new tooling does not save you from old mistakes. The biggest mistake teams still make: testing everything end-to-end. The right question is not "can I write an E2E test for this?" — it is "should I?"

This post covers what I actually test, how I structure the suite, and the patterns that keep tests fast and reliable.

What's worth E2E testing

The rule I use: E2E for critical happy paths only. Everything else goes to unit or integration tests.

Critical paths in a typical SaaS:

▸New user signup and onboarding

▸Subscription upgrade flow

▸Core product workflow (the thing customers pay for)

▸Login and password reset

▸Multi-tenant boundary (cannot see another tenant's data)

That is usually 8-12 E2E tests for an entire SaaS. Not 80. Not 800. The temptation to test more is real and you should fight it.

What is not worth E2E testing

▸Form validation messages (unit test the validator)

▸Empty states, loading states (component test or visual snapshot)

▸Most error states (integration test the API, unit test the error component)

▸Anything that is "edge case" (unit test the function)

▸Anything that already passes through your existing happy path

If a step on that path breaks, the happy path test will catch it. You do not need a separate test for "what if the optional middle name field is empty."

Structure that scales

A pattern that has worked across multiple client projects is four spec files covering the entire app. Each file holds 3-5 tests, and each test follows one user journey end to end:

▸auth.spec.ts — login, signup, password reset

▸subscription.spec.ts — checkout, upgrade, cancel

▸workspace.spec.ts — create, switch, isolation

▸core-workflow.spec.ts — the main product feature

▸plus fixtures/ and utils/ folders for shared helpers

The auth fixture

Single biggest time-saver in Playwright: a custom fixture that gives you a pre-authenticated page. Without it, every test signs up and logs in through the UI, adding 10-15 seconds per test.

With it, login happens once per worker, and each test starts already logged in.

I use Playwright's storageState API to save the auth cookies once, then load them per test. For 30 tests, this cut one project's CI run from 14 minutes to 4.
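
A minimal sketch of the storage-state idea, not a drop-in implementation. It assumes a login form with "Email" and "Password" labels, a "Log in" button, and a redirect to /dashboard after login. Those names, the helper name saveAuthState, and the file path are all hypothetical. The helper logs in through the UI once and writes the cookies and local storage to disk:

```typescript
import type { Browser } from '@playwright/test';

// Hypothetical location for the saved session; keep it out of version control.
export const AUTH_FILE = 'playwright/.auth/user.json';

export interface Credentials {
  email: string;
  password: string;
}

// Log in through the UI once and persist cookies + localStorage to `path`.
// The /login URL, field labels, and button name are assumptions about the app.
export async function saveAuthState(
  browser: Browser,
  baseURL: string,
  creds: Credentials,
  path: string = AUTH_FILE,
): Promise<string> {
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    await page.goto(`${baseURL}/login`);
    await page.getByLabel('Email').fill(creds.email);
    await page.getByLabel('Password').fill(creds.password);
    await page.getByRole('button', { name: 'Log in' }).click();
    // Wait for the redirect so the session cookie exists before saving.
    await page.waitForURL('**/dashboard');
    await context.storageState({ path });
  } finally {
    await context.close();
  }
  return path;
}
```

Called from a global setup file or a setup project, this runs once per run. Called from a worker-scoped fixture with a per-worker path, it runs once per worker. Either way, tests pick the saved state up through the storageState option, for example test.use({ storageState: AUTH_FILE }).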

Selectors that survive refactors

The single biggest source of flaky tests is selectors. CSS selectors break when you change a class. XPath breaks when you add a wrapper div. Text selectors break when you change copy.

The fix: select by role or label wherever you can, and add a data-testid attribute for anything a role or label cannot pin down.

Order of preference:

1. page.getByRole (accessible, works for screen readers too)

2. page.getByLabel (forms)

3. page.getByTestId (when role or label is not enough)

4. page.locator with CSS (last resort)

The first two options match how users and assistive technology find elements, and a test id is an explicit contract between the markup and the test suite, so all three break less often than CSS selectors.
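
To make the preference order concrete, here is a sketch of a small helper for a hypothetical invite form. The "Email address" label, the "Send invite" button name, and the invite-row test id are assumptions for illustration, not names from a real app:

```typescript
import type { Locator, Page } from '@playwright/test';

// Invite a teammate and return a locator for the newly added row.
export async function inviteMember(page: Page, email: string): Promise<Locator> {
  // Label: form fields are found the way a user reads them.
  await page.getByLabel('Email address').fill(email);

  // Role + accessible name: survives class renames and wrapper divs.
  await page.getByRole('button', { name: 'Send invite' }).click();

  // Test id: the row has no unique role or label, so it carries a data-testid.
  // A CSS fallback such as page.locator('.invite-list > li') would break as
  // soon as the class name or the nesting changes.
  return page.getByTestId('invite-row').filter({ hasText: email });
}
```

Keeping helpers like this in one place per feature means a copy change touches one file instead of every spec.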

Network mocking

Two patterns:

Real network — tests hit your actual dev or staging API. Highest fidelity, slowest, most flaky. Use for the absolute critical paths.

Mocked network — Playwright intercepts requests and returns canned responses. Fast, deterministic, less flaky. Use for everything else.

I default to mocked for most tests, real for the 2-3 most critical paths. The mock-vs-real ratio in my last project was 22-to-3.

For mocking, Playwright's page.route is excellent. You can match by URL pattern and return a custom response. For Stripe-heavy tests, I have a helper that mocks the whole Stripe Checkout flow.
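
A sketch of the page.route pattern, wrapped in a hypothetical mockJson helper. The helper name and the endpoint in the usage note are assumptions. It matches requests by URL pattern and answers with canned JSON:

```typescript
import type { Page, Route } from '@playwright/test';

// Intercept requests matching `urlPattern` and answer with canned JSON,
// so the test never reaches the real API.
export async function mockJson(
  page: Page,
  urlPattern: string | RegExp,
  body: unknown,
  status = 200,
): Promise<void> {
  await page.route(urlPattern, async (route: Route) => {
    await route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify(body),
    });
  });
}
```

In a test, register the mock before navigating, for example await mockJson(page, '**/api/subscription', { plan: 'pro', status: 'active' }) followed by page.goto. Routes registered after the request has fired will not intercept it.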

Database state between tests

Tests need isolated state. Two options:

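Transaction rollback: wrap each test in a DB transaction and roll back at the end. Fast, but it only works when the app's queries run inside that same transaction and the app never commits on its own. Most apps do commit, and in an E2E run the server usually holds its own database connections.

Database reset: truncate and re-seed before each test. Slower but reliable, provided parallel workers do not share one database. I use this for SaaS tests where multi-tenant isolation matters.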

For a SaaS suite of 30 tests, full reset adds about 2 seconds per test, 60 seconds total. Acceptable.

For larger suites (200 or more tests), reach for transactional isolation or per-test schemas.
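
A minimal sketch of the truncate-and-re-seed option, assuming PostgreSQL and a client with a query(sql) method (node-postgres fits that shape). The function names and the table list are hypothetical:

```typescript
// Minimal shape of a SQL client; a node-postgres Pool or Client fits it.
export interface SqlClient {
  query(sql: string): Promise<unknown>;
}

// Quote a table name as a PostgreSQL identifier, rejecting anything unusual.
function quoteIdent(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe table name: ${name}`);
  }
  return `"${name}"`;
}

// One TRUNCATE for all tables: CASCADE follows foreign keys, and
// RESTART IDENTITY resets sequences so seeded ids are predictable.
export function buildTruncateSql(tables: string[]): string {
  if (tables.length === 0) throw new Error('No tables to truncate');
  return `TRUNCATE TABLE ${tables.map(quoteIdent).join(', ')} RESTART IDENTITY CASCADE`;
}

// Wipe the listed tables, then run the caller's seed function.
export async function resetDatabase(
  db: SqlClient,
  tables: string[],
  seed: (db: SqlClient) => Promise<void>,
): Promise<void> {
  await db.query(buildTruncateSql(tables));
  await seed(db);
}
```

Calling resetDatabase from a beforeEach hook gives every test the same starting state. With parallel workers, each worker needs its own database or schema, otherwise one worker's reset wipes data another worker is still using.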

Visual regression — optional

Playwright has a built-in toHaveScreenshot assertion that captures a screenshot and diffs it against a stored baseline. Useful for catching unintended visual changes.

I use it sparingly, only for the homepage and the main dashboard. Snapshots are easy to overuse: they then break on every legitimate design change, and people start ignoring them.

Run in CI

GitHub Actions plus Playwright is well-documented. Key points:

▸Run tests in parallel (4-8 workers, depending on your CI machine)

▸Cache the node_modules and Playwright browsers between runs

▸Upload screenshots and videos for failed tests as CI artifacts

▸Fail the build on any test failure — no "this one is flaky, ignore it" exceptions

The last point is critical. The moment you start ignoring flaky tests, the entire suite degrades. Fix or delete, never ignore.
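
On the Playwright side, most of these points map to a few config options. The sketch below is one possible playwright.config.ts, assuming the tests live in a hypothetical ./e2e folder and that the CI provider sets the CI environment variable (GitHub Actions does):

```typescript
import type { PlaywrightTestConfig } from '@playwright/test';

const isCI = Boolean(process.env.CI);

const config: PlaywrightTestConfig = {
  testDir: './e2e', // hypothetical folder name
  fullyParallel: true,
  // Fixed worker count on CI; undefined keeps Playwright's default locally.
  workers: isCI ? 4 : undefined,
  // A stray test.only fails the CI run instead of silently skipping the rest.
  forbidOnly: isCI,
  // No retries: a test that only passes on the second try still fails the build.
  retries: 0,
  reporter: isCI ? [['github'], ['html', { open: 'never' }]] : 'list',
  use: {
    // Keep debugging evidence only for failed tests.
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
};

export default config;
```

The workflow then uploads the playwright-report and test-results folders as artifacts when a run fails. Setting retries to 0 is a deliberate choice here: with retries enabled, Playwright reports a test that passes on retry as flaky but does not fail the run.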

Speed targets

For a healthy E2E suite:

▸Full suite under 5 minutes in CI

▸Single test under 30 seconds locally

▸Flaky test rate under 1 percent

If you are above any of these, do not add more tests until you fix the slow or flaky ones. The pain compounds.

When E2E fails you

E2E tests fail for two reasons: real bugs, and infrastructure issues. The second category is your enemy.

Common infra failures:

▸Network timeout to a third party (Stripe test mode is slow occasionally)

▸DB seed race condition (parallel workers stepping on each other)

▸Browser launch failure on CI (memory limits)

▸Flaky selector (CSS class name changed)

For each, the fix is structural: add retries to network calls, isolate the DB per worker, request more CI memory, switch to role-based selectors. Never add a generic waitForTimeout. Fixed sleeps are the path to a slow, flaky suite.
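
For the "add retries to network calls" part, a small helper is usually enough. This is a sketch with hypothetical names (withRetry, RetryOptions), meant for calls a test helper makes to a slow third party, not for UI waits:

```typescript
export interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second try; doubles after each failure
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry a flaky async call with exponential backoff.
// Rethrows the last error once all attempts are used up.
export async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  { attempts, baseDelayMs }: RetryOptions,
): Promise<T> {
  if (attempts < 1) throw new RangeError('attempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Back off only if another attempt is coming.
      if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

For waits on the page itself, Playwright's web-first assertions such as await expect(locator).toBeVisible() already poll until a timeout, which is why a fixed waitForTimeout is rarely the right tool.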

My E2E stack in 2026

▸Playwright (latest)

▸TypeScript

▸Custom auth fixture with storage state

▸Page object model for shared selectors (one per feature)

▸Mocked network for most tests, real network for 2-3 critical paths

▸DB reset between tests (truncate plus re-seed)

▸GitHub Actions, 4 workers, full suite under 4 minutes

TL;DR

▸E2E for critical happy paths only (8-12 tests for a typical SaaS)

▸Everything else: unit or integration tests

▸Auth fixture with storage state — single biggest speed win

▸Role-based selectors, data-testid as fallback

▸Mock network for most, real for the critical paths only

▸Reset DB between tests, run parallel in CI

▸Never ignore a flaky test — fix or delete

If your team has an E2E suite that has grown slow, flaky, or hard to maintain, and you want a senior engineer to audit and restructure it, contact me.
