Testing Methodology
Streaming Content Safety Framework v1.0
How We Test
We use browser automation (Chrome and Playwright) to simulate how real children interact with streaming platforms. Rather than cataloging what parental controls exist, we test whether those controls actually work -- from the perspective of a determined child who wants to watch restricted content.
Every test scenario models a realistic action a child might attempt: searching for a forbidden show, switching to an unrestricted profile, clicking a shared link, or trying to bypass a PIN. Tests are automated, timestamped, and evidence-backed with screenshots at each step.
Our primary adversary model is the “determined child” -- a tech-literate child aged 10-14 who has unsupervised access to a shared device where the streaming platform is already logged in. They know how to search, browse, and switch profiles, but do not use developer tools or exploit technical vulnerabilities.
Test Categories
Each platform is tested across 9 categories, each weighted by severity. Higher weights indicate more critical attack vectors.
Profile Escape
Weight: 5
Can a child switch to an unrestricted profile?
Search & Discovery
Weight: 5
Can mature content be found via search?
Direct URL / Deep Link
Weight: 3
Can mature content be accessed via direct URL?
Kids Mode Escape
Weight: 3
Can children escape the restricted browsing experience?
Recommendation Leakage
Weight: 4
Does mature content appear in recommendations?
Cross-Profile Bleed
Weight: 3
Does watch history bleed across profiles?
Content Rating Gaps
Weight: 2
Are content ratings displayed accurately?
PIN/Lock Bypass
Weight: 4
Can PIN/password protections be bypassed?
Maturity Filter Effectiveness
Weight: 4
Overall maturity filter effectiveness
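The category weights listed above can be captured as a small lookup table. This is a minimal sketch in Python; the names `CATEGORY_WEIGHTS`, `TOTAL_WEIGHT`, and `weight_share` are illustrative and are not taken from the framework's own code.

```python
# Category weights exactly as listed in the Test Categories section.
# The dictionary and function names are hypothetical, chosen for illustration.
CATEGORY_WEIGHTS = {
    "Profile Escape": 5,
    "Search & Discovery": 5,
    "Direct URL / Deep Link": 3,
    "Kids Mode Escape": 3,
    "Recommendation Leakage": 4,
    "Cross-Profile Bleed": 3,
    "Content Rating Gaps": 2,
    "PIN/Lock Bypass": 4,
    "Maturity Filter Effectiveness": 4,
}

# Sum of all category weights (33 for the values above).
TOTAL_WEIGHT = sum(CATEGORY_WEIGHTS.values())


def weight_share(category: str) -> float:
    """Return the fraction of the total weight carried by one category."""
    return CATEGORY_WEIGHTS[category] / TOTAL_WEIGHT
```

If each category contributed a single score, the two weight-5 categories (Profile Escape and Search & Discovery) would together carry 10 of the 33 weight points.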
Scoring Rubric
Each test scenario yields a score from 0 (best) to 4 (worst). Lower scores indicate stronger parental controls.
| Score | Label | Meaning |
|---|---|---|
| 0 | Full Block | The platform completely blocks this attack vector. No restricted content is accessible. |
| 1 | Partial Block | The platform partially blocks access but leaks some information (e.g., content existence revealed). |
| 2 | Soft Barrier | A soft barrier exists (e.g., a warning dialog) but can be bypassed with a single click. |
| 3 | Unprotected | No protection. The child can access restricted content with no barriers. |
| 4 | Facilitated | The platform actively facilitates access to restricted content (e.g., recommending mature content on a kids profile). |
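One way to keep the rubric consistent across test scripts is to encode it as an integer enumeration, so scores compare and average naturally. The sketch below is an assumption about how this could be represented; the class name `Score` and the helper `label` are hypothetical.

```python
from enum import IntEnum


class Score(IntEnum):
    """Rubric levels from the table above; lower is better."""
    FULL_BLOCK = 0
    PARTIAL_BLOCK = 1
    SOFT_BARRIER = 2
    UNPROTECTED = 3
    FACILITATED = 4


def label(score: Score) -> str:
    """Turn an enum member into the human-readable label used in the table."""
    return score.name.replace("_", " ").title()
```

Because `IntEnum` members behave like integers, `max()` over a list of scores returns the worst outcome observed.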
Test Profiles
Each platform is tested from the perspective of three child profiles with different maturity restriction levels.
Child (7)
Kids profile
Simulates a 7-year-old on the most restricted Kids profile. Should only see TV-Y and G-rated content. This is the primary test target -- if a child can escape from here, the platform has a critical failure.
Child (12)
Kids / age-restricted profile
Simulates a 12-year-old with TV-PG/PG restrictions. Tests whether pre-teen restrictions hold against searches for TV-MA content and profile switching attempts.
Teen (16)
Standard profile with restrictions
Simulates a 16-year-old with TV-14/PG-13 restrictions. Should block TV-MA, R, and NC-17 content. Tests whether teen profiles leak adult content through recommendations or direct URL access.
Grade Calculation
Each profile receives a letter grade (A+ through F) based on a three-step process:
- Weighted average: Each test score (0-4) is multiplied by its category weight. The weighted sum is divided by total weight to produce a weighted average (0.0 = perfect, 4.0 = worst).
- Exponential penalty: The weighted average is normalized to 0-1 and raised to the power of 1.5, then mapped to a 0-100 scale. This penalizes platforms with several moderate failures more heavily than a single bad score.
- Letter grade assignment: The penalized score maps to a letter grade via standard thresholds.
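The first two steps can be sketched as follows. The weighted average follows the description directly. The penalty step is less fully specified: the text does not say how a weighted average where 0.0 is best becomes a 0-100 score where 100 is best. This sketch assumes the exponent is applied to the "safety" fraction `1 - avg / 4`, so that a weighted average of 0.0 maps to 100 and 4.0 maps to 0. That choice, and the function names, are assumptions rather than the framework's confirmed formula.

```python
def weighted_average(results):
    """Compute the weighted average of (score, weight) pairs.

    Each score is 0-4 and each weight is the category weight of that test.
    Returns a value from 0.0 (perfect) to 4.0 (worst).
    """
    results = list(results)
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        raise ValueError("at least one result with non-zero weight is required")
    return sum(score * weight for score, weight in results) / total_weight


def penalized_score(avg, exponent=1.5):
    """Map a weighted average (0.0-4.0) to a 0-100 score.

    ASSUMPTION: the exponent is applied to the normalized safety fraction
    (1 - avg / 4), not to the failure fraction. The source does not specify.
    """
    if not 0.0 <= avg <= 4.0:
        raise ValueError("weighted average must be between 0.0 and 4.0")
    safety = 1.0 - avg / 4.0
    return 100.0 * safety ** exponent
```

Under this assumption, a weighted average of 2.0 (halfway between perfect and worst) yields about 35.4 rather than 50, which falls in the D band.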
| Grade | Score Range |
|---|---|
| A+ | 95-100 |
| A | 85-94 |
| A- | 80-84 |
| B+ | 75-79 |
| B | 70-74 |
| B- | 65-69 |
| C+ | 60-64 |
| C | 55-59 |
| C- | 50-54 |
| D+ | 40-49 |
| D | 30-39 |
| F | 0-29 |
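The threshold table translates into a simple lookup. The sketch below treats the lower bound of each range as a floor, so a fractional score such as 94.5 is assigned the band beneath it (A). The table lists integer ranges only, so that handling of fractional scores is an assumption, as are the names `GRADE_FLOORS` and `letter_grade`.

```python
# (minimum score, grade) pairs taken from the table above, highest first.
GRADE_FLOORS = [
    (95, "A+"), (85, "A"), (80, "A-"),
    (75, "B+"), (70, "B"), (65, "B-"),
    (60, "C+"), (55, "C"), (50, "C-"),
    (40, "D+"), (30, "D"), (0, "F"),
]


def letter_grade(score: float) -> str:
    """Return the letter grade for a penalized score on the 0-100 scale.

    ASSUMPTION: scores between two listed integer ranges (e.g. 94.5)
    receive the lower band, because each range's minimum acts as a floor.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # unreachable for valid input; kept as a safe default
```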
The platform's overall grade equals the worst profile grade. A platform is only as safe as its weakest profile -- if the Kids profile gets a D but the Teen profile gets a B, the platform grade is D.
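The worst-profile rule amounts to ordering the letter grades and taking the maximum by position. A minimal sketch, with hypothetical names `GRADE_ORDER` and `platform_grade`, and assuming grades are passed without the CFO asterisk:

```python
# Letter grades from best to worst, matching the grade table.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "F"]


def platform_grade(profile_grades: dict) -> str:
    """Return the worst grade among a platform's profiles.

    profile_grades maps a profile name to its letter grade.
    A later position in GRADE_ORDER means a worse grade.
    """
    return max(profile_grades.values(), key=GRADE_ORDER.index)
```

For the example in the text, `platform_grade({"Kids": "D", "Teen": "B"})` returns `"D"`.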
Critical Failure Overrides
Certain test failures are so severe that they cap a profile's grade regardless of other scores. These are called Critical Failure Overrides (CFOs).
CFO-2: Zero-Auth Profile Escape
If a child on a Kids or restricted profile can switch to an unrestricted profile with zero authentication (no PIN, no password, no confirmation dialog), the profile's grade is capped at D regardless of how well other categories score. This override exists because profile escape negates all other parental controls -- it does not matter if search is filtered and recommendations are clean if the child can simply switch to an adult profile.
Grades affected by a CFO are marked with an asterisk (e.g., D*) in platform reports.
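The cap can be expressed as a comparison against the grade ordering. The sketch below assumes the asterisk is added only when the cap actually lowers the grade; the text says grades "affected by" a CFO are marked, so whether a profile already at D or F also gets an asterisk is left open. The names `GRADE_ORDER` and `apply_cfo_cap` are hypothetical.

```python
# Letter grades from best to worst, matching the grade table.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "F"]


def apply_cfo_cap(grade: str, zero_auth_escape: bool, cap: str = "D") -> str:
    """Cap a profile grade when a zero-auth profile escape was observed.

    ASSUMPTION: the asterisk marks only grades that the cap changed;
    a grade already at or below the cap is returned unmodified.
    """
    if not zero_auth_escape:
        return grade
    if GRADE_ORDER.index(grade) < GRADE_ORDER.index(cap):
        # The computed grade is better than the cap, so the cap applies.
        return cap + "*"
    return grade
```

For example, a profile that would otherwise earn a B+ is reported as `D*` when the escape succeeds with no authentication.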
Platforms Tested
Phase 1 of our streaming content safety research covers 3 major streaming platforms:
Additional platforms (Disney+, Max, Hulu, Paramount+, Apple TV+, YouTube) are planned for future phases.