Research & Methodology

🎯

SUPER SELECTOR Algorithm

Statistical Unified Player Evaluation and Ranking SELECTOR

Our predicted XI algorithm combines multiple data sources to generate optimal team compositions:

Scoring Components

Base Classification (0-30 pts) - Role-based scoring for position fit
Performance Tags - Phase-specific tags (PP_ELITE, DEATH_SPECIALIST, etc.)
Derived Metrics - boundary%, consistency_index, death_dot_pct
Price Tier Bonus (5-15%) - Investment level consideration
Variety Optimization - LHB/RHB balance, spin/pace mix

Hard Constraints

C1: Captain cannot be Impact Player
C2: Maximum 4 overseas players
C3: Minimum 20 overs bowling coverage
C4: At least 1 wicketkeeper
C5: At least 1 spinner
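
The five hard constraints can be checked mechanically. The following is a minimal sketch, not the production selector: the `Player` fields, the `overs` estimate, and the choice to count overseas players across the whole selection passed in are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    overseas: bool = False
    keeper: bool = False
    spinner: bool = False
    overs: int = 0  # overs this player is expected to bowl (hypothetical field)


def violated_constraints(squad, captain, impact_player):
    """Return the labels (C1-C5) of every hard constraint the selection breaks."""
    broken = []
    if captain == impact_player:
        broken.append("C1")  # captain cannot be the Impact Player
    if sum(p.overseas for p in squad) > 4:
        broken.append("C2")  # maximum 4 overseas players
    if sum(p.overs for p in squad) < 20:
        broken.append("C3")  # minimum 20 overs of bowling coverage
    if not any(p.keeper for p in squad):
        broken.append("C4")  # at least 1 wicketkeeper
    if not any(p.spinner for p in squad):
        broken.append("C5")  # at least 1 spinner
    return broken


# Hypothetical selection: five bowlers with 4 overs each, five batters, one keeper.
bowlers = [Player(f"Bowler{i}", spinner=(i == 0), overs=4) for i in range(5)]
batters = [Player(f"Batter{i}", overseas=(i < 3)) for i in range(5)]
squad = bowlers + batters + [Player("Keeper", keeper=True)]
print(violated_constraints(squad, captain="Batter0", impact_player="Bowler4"))
```

Returning the list of broken labels, rather than a single boolean, makes it easier to explain why a candidate XI was rejected.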

🏈

PFF-Inspired Grading

Process-over-outcome evaluation, adapted from Pro Football Focus

Pro Football Focus revolutionized NFL analytics by grading every play. We adapt their methodology:

Key Concepts

Ball-by-Ball Grading - Evaluate each delivery on a -2 to +2 scale
Process Over Outcome - Good decisions that fail still get positive grades
Context Adjustments - Situation, opposition, and conditions matter
WAR (Wins Above Replacement) - Aggregate value metric
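
As an illustration of how per-delivery grades on the -2 to +2 scale might roll up into a player summary, here is a small sketch. The input shape (pairs of player name and grade) and the choice of a simple per-ball average are assumptions; the source does not specify how grades are aggregated or how WAR is derived from them.

```python
from collections import defaultdict


def summarize_grades(graded_balls):
    """Aggregate (player, grade) pairs into net and per-ball grades.

    Each grade must lie on the -2 to +2 scale described above.
    """
    totals = defaultdict(lambda: [0.0, 0])  # player -> [sum of grades, balls]
    for player, grade in graded_balls:
        if not -2 <= grade <= 2:
            raise ValueError(f"grade {grade} is outside the -2 to +2 scale")
        totals[player][0] += grade
        totals[player][1] += 1
    return {
        player: {"balls": n, "net": s, "per_ball": s / n}
        for player, (s, n) in totals.items()
    }


# Hypothetical grades: a well-judged shot that was caught can still score +1.
summary = summarize_grades([("Batter A", 2), ("Batter A", -1), ("Batter B", 1)])
print(summary["Batter A"])
```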

Cricket Applications

P0 Batter grading vs bowling type and phase
P0 Bowler grading by delivery outcome quality
P1 Fielding impact assessment
P2 Captain decision grading

📄 PFF Research Document (Internal)

🏀

KenPom Efficiency Metrics

Tempo-free statistics from college basketball analytics

Ken Pomeroy's basketball ratings remove pace from the equation. We apply similar concepts:

Key Concepts

Adjusted Efficiency - Opponent-normalized performance
Tempo-Free Stats - Per-possession (or per-ball) metrics
Four Factors - Decompose performance into components
Strength of Schedule - Quality of opposition faced

Cricket Applications

P0 Four Factors: Boundary%, Dot Ball%, Extras, Bowling Changes
P0 Venue Park Factors: Adjust for pitch and ground size
P1 Opposition Strength Index: Weight by opponent quality
P1 Adjusted Strike Rate: SR normalized by context
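
The per-ball idea behind tempo-free stats can be sketched as follows. This assumes the input is a list of runs scored off the bat on each legal delivery faced, and that a boundary is any ball scoring exactly 4 or 6; neither convention is confirmed by the source, and extras are ignored here.

```python
def per_ball_rates(runs_per_ball):
    """Compute pace-independent batting rates from runs scored on each ball."""
    balls = len(runs_per_ball)
    if balls == 0:
        raise ValueError("need at least one ball")
    boundaries = sum(1 for r in runs_per_ball if r in (4, 6))
    dots = sum(1 for r in runs_per_ball if r == 0)
    return {
        "strike_rate": 100 * sum(runs_per_ball) / balls,  # runs per 100 balls
        "boundary_pct": 100 * boundaries / balls,
        "dot_pct": 100 * dots / balls,
    }


# Hypothetical eight-ball sequence.
print(per_ball_rates([0, 1, 4, 6, 0, 2, 1, 0]))
```

Because every figure is divided by balls faced, two batters can be compared even when one faced far more deliveries than the other.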

📄 KenPom Research Document (Internal)

🎨

Player Clustering (K-Means V2)

Archetype-based player classification

We use K-means clustering to identify natural player archetypes:

Batter Archetypes

EXPLOSIVE_OPENER - High SR, PP aggression, boundary-heavy
PLAYMAKER - Consistent scoring, adaptable approach
ANCHOR - Low dot%, innings builder, lower SR
MIDDLE_ORDER - Middle overs specialist, rotation focus
FINISHER - Death overs specialist, high SR at end

Bowler Archetypes

WORKHORSE - Consistent economy, regular overs
NEW_BALL_SPECIALIST - PP wickets, swing/seam
DEATH_SPECIALIST - Low death economy, yorker execution
WICKET_TAKER - High wickets, aggressive approach
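
For readers unfamiliar with K-means, the core loop is short. The sketch below is a plain-Python version of Lloyd's algorithm, not the project's V2 pipeline; the two features (strike rate and dot-ball percentage), the four sample batters, and the starting centroids are hypothetical. In practice the features would normally be standardized first so that no single metric dominates the distance.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then move
    each centroid to the mean of its members. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, labels


# Hypothetical batters described by (strike rate, dot-ball %).
features = [(165.0, 30.0), (160.0, 32.0), (120.0, 28.0), (118.0, 26.0)]
centres, labels = kmeans(features, centroids=[features[0], features[2]])
print(labels)
```

The cluster numbers carry no meaning on their own; archetype names such as ANCHOR or FINISHER would be attached afterwards by inspecting each centroid.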

📄 Creative Archetype Descriptions (Internal)

📊

CricPom: Novel Composite Metrics

KenPom-for-Cricket adjusted rating system

CricPom adapts college basketball's KenPom methodology to T20 cricket, producing opponent-adjusted, venue-neutral player ratings that account for tournament quality and conditions similarity.

Core Adjusted Metrics

AdjBRR (Adjusted Batting Run Rate) — Batting run rate adjusted for bowling quality faced, venue park factor, and match context. Formula: raw_RR × (league_avg_bowling / opponent_bowling_quality) × venue_factor
AdjBE (Adjusted Bowling Economy) — Economy rate adjusted for batting quality faced and venue. Lower is better. raw_econ × (league_avg_batting / opponent_batting_quality) × venue_factor
CEM (Composite Efficiency Metric) — All-rounder evaluation combining AdjBRR and AdjBE into a single efficiency score. CEM = w₁·AdjBRR_percentile + w₂·(1 - AdjBE_percentile)
OSI (Opponent Strength Index) — Weighted average quality of opponents faced, used as the adjustment denominator in all CricPom ratings
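
The three formulas above translate directly into code. This is a sketch under stated assumptions: percentiles are on a 0-1 scale, the equal weights `w1 = w2 = 0.5` are placeholders (the source does not give values), and the units of the opponent-quality inputs are whatever the OSI produces.

```python
def adj_brr(raw_rr, league_avg_bowling, opponent_bowling_quality, venue_factor=1.0):
    """Adjusted Batting Run Rate, following the formula given above."""
    return raw_rr * (league_avg_bowling / opponent_bowling_quality) * venue_factor


def adj_be(raw_econ, league_avg_batting, opponent_batting_quality, venue_factor=1.0):
    """Adjusted Bowling Economy (lower is better), following the formula above."""
    return raw_econ * (league_avg_batting / opponent_batting_quality) * venue_factor


def cem(adj_brr_percentile, adj_be_percentile, w1=0.5, w2=0.5):
    """Composite Efficiency Metric from two percentiles on a 0-1 scale.

    The weights are hypothetical placeholders.
    """
    return w1 * adj_brr_percentile + w2 * (1 - adj_be_percentile)


# When the opponent matches the league average, the raw figure is unchanged.
print(adj_brr(9.0, league_avg_bowling=8.0, opponent_bowling_quality=8.0))
print(cem(adj_brr_percentile=0.8, adj_be_percentile=0.3))
```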

5-Factor Tournament Quality Engine

CricPom consumes the Tournament Quality Weighting system (see below) to weight data from 426 T20 tournaments. Each tournament's data receives a composite weight computed as:

W = (∏ᵢ fᵢ^wᵢ)^(1/Σᵢ wᵢ), the weighted geometric mean of the 5 factors

PQI (25%) — Player Quality Index: average career quality of participants
Competitiveness (20%) — Match balance and outcome distributions
Recency (20%) — Exponential decay favoring recent data
Conditions Similarity (15%) — How closely conditions match IPL 2023-2025
Sample Confidence (20%) — Statistical reliability from match volume
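
A weighted geometric mean with the five weights listed above can be sketched as follows. The factor keys and the assumption that each factor is scaled to the range 0-1 are illustrative; the source does not say how individual factors are normalized.

```python
import math

# Weights as listed above; the dictionary keys are hypothetical names.
FACTOR_WEIGHTS = {
    "pqi": 0.25,
    "competitiveness": 0.20,
    "recency": 0.20,
    "conditions": 0.15,
    "sample": 0.20,
}


def composite_weight(factors, weights=FACTOR_WEIGHTS):
    """Weighted geometric mean of the factors, computed in log space."""
    values = [factors[name] for name in weights]
    if any(v < 0 for v in values):
        raise ValueError("factors must be non-negative")
    if any(v == 0 for v in values):
        return 0.0  # a zero factor drives a geometric mean to zero
    log_sum = sum(w * math.log(factors[name]) for name, w in weights.items())
    return math.exp(log_sum / sum(weights.values()))


# Hypothetical tournament: strong on most factors, weak on conditions similarity.
example = {"pqi": 0.9, "competitiveness": 0.8, "recency": 0.9,
           "conditions": 0.3, "sample": 0.8}
print(round(composite_weight(example), 3))
```

Unlike an arithmetic mean, the geometric mean penalizes a single weak factor more heavily, so a tournament cannot compensate for very dissimilar conditions purely through recency or volume.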

How It Differs from Raw Stats

Aspect	Raw Stats	CricPom Adjusted
Opposition quality	Ignored	OSI-adjusted
Venue effects	Ignored	Park factor adjusted
Tournament relevance	All equal	5-factor weighted
Recency	All equal	Exponential decay
All-rounder evaluation	Separate batting/bowling	Unified CEM score

Status: Tournament weights computed (TKT-187). CricPom foundation metrics (AdjBRR, AdjBE, CEM) implemented (TKT-190). Groundwork research complete.

📄 KenPom Research Foundation (Internal)

⚖️

Tournament Quality Weighting

Jose Mourinho's 5-Factor Composite Weight System

Not all T20 data is equal. Our weighting system quantifies tournament quality across 426 tournaments and 9,357 matches:

5-Factor Composite Weight

Player Quality Index (PQI) - Average career quality of tournament participants
Competitiveness Index (CI) - Match outcome balance, margin distributions
Recency Decay - Exponential decay weighting recent tournaments higher
Conditions Similarity - How closely tournament conditions match IPL 2023-2025 (Founder Decision #6)
Sample Size Confidence - Statistical reliability based on matches played
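
Exponential recency decay is commonly expressed through a half-life. The sketch below assumes a two-year half-life purely for illustration; the actual decay rate used by the weighting system is not stated in this document.

```python
def recency_weight(years_ago, half_life_years=2.0):
    """Exponential decay: the weight halves every `half_life_years`.

    The default half-life is a hypothetical placeholder.
    """
    if years_ago < 0:
        raise ValueError("years_ago must be non-negative")
    return 0.5 ** (years_ago / half_life_years)


for age in (0, 2, 4):
    print(age, recency_weight(age))
```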

IPL 2023+ Baseline (Founder-Locked)

All conditions comparisons use IPL 2023-2025 as the baseline, not all-time IPL averages. The data shows that the transition from 2021-22 to 2023-25 produced the largest single jump in IPL history: Run Rate +1.00, Boundary% +3.2, Six% +1.65. Using the all-time average (RR 7.86) would mean comparing tournaments against an era fundamentally different from the modern IPL (RR 8.98), diluting the similarity signal.

Tournament Tiers (Provisional)

Tier 1A IPL - Baseline (1.0x weight)
Tier 1B PSL, SA20, The Hundred, MLC, BBL, CPL - Major franchise leagues (0.70-0.85)
Tier 1C ILT20, LPL, Super Smash, Vitality Blast - Established leagues, lower overlap (0.50-0.70)
Tier 2 T20 World Cup, Asia Cup - High-quality international (0.60-0.80)
Tier 3 SMAT - Domestic Indian T20 (0.40-0.50)

Status: Plan approved by Founder. IPL 2023+ baseline locked. Implementation via TKT-183 (8-12 days).

📄 Tournament Weighting Plan (Internal)

🔬

Dual-Scope Analytics Framework

All-Time vs Since-2023 view architecture (TKT-181)

Every analytical view now exists in two scopes to balance historical context with current-form accuracy:

Architecture

_alltime views - Full IPL history (2008-2025, ~1,169 matches). Used for career records and historical comparisons
_since2023 views - Current analytical window (2023-2025, ~219 matches). Used for all predictive outputs, tags, and archetypes
80 dual-scope views - 40 pairs covering batting, bowling, phase, venue, matchup, and Film Room tactical analysis
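
The naming convention and season windows above can be captured in a few lines. This is only a sketch of the idea: the base view name `batting_phase` is hypothetical, and the real views are presumably defined in the database rather than in application code.

```python
# Season windows for the two scopes, as described above.
SCOPES = {"alltime": (2008, 2025), "since2023": (2023, 2025)}


def view_name(base, scope):
    """Build the scoped view name, e.g. a base name plus '_since2023'."""
    if scope not in SCOPES:
        raise KeyError(f"unknown scope: {scope}")
    return f"{base}_{scope}"


def in_scope(season, scope):
    """True if a match from `season` belongs to the given scope."""
    first, last = SCOPES[scope]
    return first <= season <= last


print(view_name("batting_phase", "since2023"))
print(in_scope(2019, "alltime"), in_scope(2019, "since2023"))
```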

Why 2023+? The Data Evidence (Founder-Approved)

DuckDB analysis of 1,169 IPL matches reveals a structural break at 2023 — the largest single-era shift in IPL history:

Era	Matches	Run Rate	Boundary%	Six%	Dot%
2008-2012	322	7.60	15.2	4.04	36.1
2013-2017	314	7.90	16.1	4.63	34.8
2018-2020	180	8.17	16.9	5.55	33.8
2021-2022	134	7.98	16.5	5.41	35.2
2023-2025	219	8.98	19.7	7.06	31.6

2023+ vs 2008-2022 deltas: Run Rate +14.2%, Boundary% +23.1%, Six-hitting +49.6%, Dot Ball% -10.0%
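
The deltas can be approximately reproduced from the era table by weighting each pre-2023 era by its match count. This is an approximation: the published figures were presumably computed from ball-level data, so small differences (a few tenths of a percentage point, most visibly for six-hitting) are expected.

```python
# (era, matches, run rate, boundary %, six %, dot %) copied from the table above.
ERAS = [
    ("2008-2012", 322, 7.60, 15.2, 4.04, 36.1),
    ("2013-2017", 314, 7.90, 16.1, 4.63, 34.8),
    ("2018-2020", 180, 8.17, 16.9, 5.55, 33.8),
    ("2021-2022", 134, 7.98, 16.5, 5.41, 35.2),
    ("2023-2025", 219, 8.98, 19.7, 7.06, 31.6),
]
COLUMNS = {"run_rate": 2, "boundary_pct": 3, "six_pct": 4, "dot_pct": 5}


def match_weighted_mean(rows, col):
    """Average a column across eras, weighting each era by matches played."""
    total_matches = sum(row[1] for row in rows)
    return sum(row[1] * row[col] for row in rows) / total_matches


def pct_delta(metric):
    """Percentage change of the 2023-2025 era versus the pooled 2008-2022 eras."""
    col = COLUMNS[metric]
    before = match_weighted_mean(ERAS[:-1], col)
    after = ERAS[-1][col]
    return 100 * (after - before) / before


for metric in COLUMNS:
    print(metric, round(pct_delta(metric), 1))
```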

What Caused the Break

Impact Player rule (2023) — 12 effective players per side, inflating batting depth and scoring rates
2022 mega auction reset — team compositions fundamentally reshuffled, pre-2023 team context obsolete
Evolved batting intent — six-hitting up 49.6%, batters attacking from ball one in the modern IPL
Sample size: 219 matches provide a sufficient sample for statistical reliability across all analytical views

📄 TKT-181 Review Document (Internal)

🎯

Insight Confidence Framework

Editorial confidence scoring for analytical insights (TKT-094)

Every insight published in the magazine needs a confidence assessment. This framework scores each analytical claim on a 0-100 scale with letter grades:

Scoring Breakdown

Sample Size (40%) — Capped at 300 balls/innings. HIGH (≥300), MEDIUM (≥100), LOW (<100)
Consistency (25%) — Metric stability across sub-samples (first half vs second half of career)
Recency (20%) — How recent the underlying data is (1.0 = all 2025, 0.5 = mix)
Cross-Validation (15%) — Does the insight hold across conditions (home/away, bat/field)?

Grade Boundaries

A (≥85) Publish with confidence
B (≥70) Publish with minor caveats
C (≥55) Add sample size caveat and limitations
D (<55) Do not publish as standalone insight
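
Putting the weights and grade boundaries together, a score could be computed along these lines. The sketch assumes each component is expressed on a 0-1 scale and that the sample-size component grows linearly up to the 300 cap; the source states the cap and the weights but not the exact scaling.

```python
COMPONENT_WEIGHTS = {
    "sample": 0.40,
    "consistency": 0.25,
    "recency": 0.20,
    "cross_validation": 0.15,
}


def sample_band(n):
    """HIGH / MEDIUM / LOW label from the thresholds listed above."""
    if n >= 300:
        return "HIGH"
    return "MEDIUM" if n >= 100 else "LOW"


def confidence_score(n, consistency, recency, cross_validation, cap=300):
    """Weighted 0-100 score; linear sample scaling up to the cap is an assumption."""
    parts = {
        "sample": min(n, cap) / cap,
        "consistency": consistency,
        "recency": recency,
        "cross_validation": cross_validation,
    }
    return 100 * sum(COMPONENT_WEIGHTS[k] * parts[k] for k in COMPONENT_WEIGHTS)


def grade(score):
    """Map a 0-100 score to the letter grades listed above."""
    if score >= 85:
        return "A"
    if score >= 70:
        return "B"
    return "C" if score >= 55 else "D"


# Hypothetical insight: 150 balls, fairly consistent, mixed recency.
score = confidence_score(150, consistency=0.8, recency=0.5, cross_validation=0.6)
print(round(score, 1), grade(score), sample_band(150))
```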

Status: Framework implemented. Bridges with existing Confidence Intervals (TKT-145). Used editorially for all stat pack claims.

📊

Data Foundation

Ball-by-ball analysis from Cricsheet

Data Sources

Cricsheet Ball-by-Ball - 219 IPL matches (2023-2025)
IPL 2026 Auction Data - Squad compositions and prices
Historical Records - Team vs team, venue performance

Derived Metrics (115+ Views)

80 dual-scope views (_alltime + _since2023 pairs)
Batter consistency index, boundary percentage, dot ball percentage
Partnership synergy scores, pressure sequences
Phase-specific performance (PP, middle, death)
Bowling type matchups, handedness analysis
13 Film Room tactical views (entry points, wicket clusters, bowling changes)

Validation

Every insight reviewed by cricket domain expert (Andy Flower)
8-step quality assurance process
Founder sign-off on all key outputs

