The IPL Player Detection Dataset Is Now on HuggingFace

Published Jun 2, 2026 · Updated May 30, 2026 · 3 min read

Dataset: huggingface.co/datasets/goyaljai/IPL-Player-Detection-IITB-PML — 1,005 IPL broadcast images, team grid annotations, player counts. Also on Kaggle.

I originally posted the IPL broadcast annotation dataset on Kaggle for the computer vision community. Kaggle is great for notebooks and competitions. But if you just want to pull a dataset into a Python script and start experimenting, the HuggingFace datasets library is the standard workflow — so I published it there too.

What’s in the dataset

1,005 images from IPL broadcast footage — actual match frames, not press photos. 800×600px JPGs. Each image has an 8×8 grid annotation (64 cells labeled 0 = empty, or 1–10 = IPL team ID), a player count (0–20), and a train/test flag.

Team IDs: CSK (1), DC (2), GT (3), KKR (4), LSG (5), MI (6), PBKS (7), RR (8), RCB (9), SRH (10). The annotation methodology, and why I chose a grid over bounding boxes, are covered in my earlier post on the Kaggle release.
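To make the label scheme concrete, here is a minimal sketch that maps the team IDs above to their abbreviations and counts how many grid cells each team occupies. The names TEAM_NAMES and cells_per_team are hypothetical helpers for illustration, not part of the dataset, and the label list is synthetic rather than taken from a real frame.

```python
from collections import Counter

# Team IDs as listed in the post; 0 marks an empty cell.
TEAM_NAMES = {
    1: "CSK", 2: "DC", 3: "GT", 4: "KKR", 5: "LSG",
    6: "MI", 7: "PBKS", 8: "RR", 9: "RCB", 10: "SRH",
}


def cells_per_team(labels):
    """Count how many grid cells each team occupies, ignoring empty cells."""
    if len(labels) != 64:
        raise ValueError("expected 64 cell labels for an 8x8 grid")
    counts = Counter(label for label in labels if label != 0)
    return {TEAM_NAMES[team_id]: n for team_id, n in sorted(counts.items())}


# Synthetic labels (not real data): two CSK cells and one MI cell.
labels = [0] * 64
labels[0] = 1
labels[1] = 1
labels[63] = 6
print(cells_per_team(labels))  # {'CSK': 2, 'MI': 1}
```

Note that a cell count is not the same thing as the player count field: several players can fall inside one cell.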

Loading it

from datasets import load_dataset

ds    = load_dataset("goyaljai/IPL-Player-Detection-IITB-PML")
train = ds["train"]    # 793 examples
test  = ds["test"]     # 212 examples

# Each example: image (PIL), c01-c64 (int), count (int)
example = train[0]
print(example["count"])          # player count in this frame
print(example["c01"])            # top-left grid cell — team ID or 0

The image field is a PIL Image — pass it directly to your transform pipeline without intermediate file I/O. Grid labels are clean integers. Arrow format means fast random access across the full 1,005 examples.
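The 64 separate columns are easier to work with as an 8×8 structure. The sketch below collects c01 through c64 into a list of rows. It assumes the cells are numbered row by row starting from the top-left (the post only confirms that c01 is the top-left cell), and example_to_grid is a hypothetical helper name. It runs on a synthetic dictionary so it works without downloading anything.

```python
def example_to_grid(example, size=8):
    """Collect c01..c64 from one example into a size x size list of rows.

    Assumption: cells are numbered in row-major order from the top-left.
    """
    flat = [example[f"c{i:02d}"] for i in range(1, size * size + 1)]
    return [flat[r * size:(r + 1) * size] for r in range(size)]


# Synthetic stand-in for one dataset row (not real data).
fake_example = {f"c{i:02d}": 0 for i in range(1, 65)}
fake_example["c01"] = 1  # top-left cell
fake_example["c09"] = 6  # first cell of the second row, under the row-major assumption
fake_example["count"] = 2

grid = example_to_grid(fake_example)
print(grid[0][0], grid[1][0])  # 1 6
```

The same function should accept a real row such as train[0], since indexing a split returns a dictionary keyed by column name.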

Why broadcast footage matters for cricket ML

Most public cricket datasets are curated from press photography: clean lighting, staged poses, a single player in frame. Models trained on such images tend not to generalize to broadcast conditions: partial occlusions, motion blur, multiple players in complex formations, score tickers in the corners.

This dataset is broadcast footage. The spatial grid annotations open tasks that curated datasets don’t support: formation analysis, spatial clustering by team, shot-type classification from player positions, team distribution modeling across different field configurations.
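As one possible starting point for formation analysis, the sketch below computes the mean grid position of each team's occupied cells. It expects an 8×8 grid given as a list of rows, and the function name team_centroids is a hypothetical choice for illustration. The grid here is synthetic.

```python
def team_centroids(grid):
    """Return {team_id: (mean_row, mean_col)} over each team's occupied cells."""
    sums = {}
    for r, row in enumerate(grid):
        for c, team_id in enumerate(row):
            if team_id == 0:
                continue  # empty cell
            total_r, total_c, n = sums.get(team_id, (0, 0, 0))
            sums[team_id] = (total_r + r, total_c + c, n + 1)
    return {
        team_id: (total_r / n, total_c / n)
        for team_id, (total_r, total_c, n) in sums.items()
    }


# Synthetic grid (not real data): two KKR cells and one RCB cell.
grid = [[0] * 8 for _ in range(8)]
grid[2][1] = 4
grid[2][3] = 4
grid[6][5] = 9
print(team_centroids(grid))  # {4: (2.0, 2.0), 9: (6.0, 5.0)}
```

Centroids are a coarse summary. They lose the spread of a formation, so a real analysis would likely keep per-cell occupancy as well.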

Suggested use cases

Multi-label team classification from a broadcast frame
Player count regression under real broadcast conditions
Spatial team distribution modeling across shot types
Few-shot learning benchmark for sports imagery
Data augmentation base for IPL computer vision tasks
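For the multi-label classification use case, one way to build a training target is a 10-element multi-hot vector marking which teams appear anywhere in the frame. This is only a sketch of one reasonable encoding, not a prescribed format; multi_hot_teams and NUM_TEAMS are hypothetical names, and the labels are synthetic.

```python
NUM_TEAMS = 10  # team IDs run from 1 to 10


def multi_hot_teams(labels):
    """Return a list of 10 zeros and ones; position i is 1 if team i + 1 occupies any cell."""
    target = [0] * NUM_TEAMS
    for label in labels:
        if label != 0:
            target[label - 1] = 1
    return target


# Synthetic labels (not real data): DC in two cells, SRH in one.
labels = [0] * 64
labels[10] = 2
labels[11] = 2
labels[40] = 10
print(multi_hot_teams(labels))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

A vector like this pairs naturally with a per-class sigmoid output, while the count field can be kept as a separate regression target.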

If you build something with it — a notebook, a model, a paper — I’d genuinely like to know. Dataset at huggingface.co/datasets/goyaljai/IPL-Player-Detection-IITB-PML.

Frequently Asked Questions

How do I load the IPL dataset from HuggingFace?

from datasets import load_dataset; ds = load_dataset("goyaljai/IPL-Player-Detection-IITB-PML"). The train split has 793 examples, test has 212. Each example includes an image field (PIL Image), c01–c64 grid labels, and a player count.

What is the IPL-Player-Detection-IITB-PML dataset?

A cricket player detection dataset from IPL broadcast footage built at IIT Bombay for a Probabilistic Machine Learning course. 1,005 images with 8×8 team grid annotations and player counts across 10 IPL teams.

What is different about broadcast footage vs press photos for cricket ML?

Broadcast footage has real conditions: motion blur, partial occlusions, broadcast overlays, variable lighting, crowds. Models trained on press photos don’t generalize to this. This dataset captures real match complexity.

Is this dataset free to use for research?

Yes. The dataset is openly available on HuggingFace at huggingface.co/datasets/goyaljai/IPL-Player-Detection-IITB-PML and on Kaggle. No competition entry or account required.
