Module 03 — Pattern-mining your RTI corpus

Goal: Convert hundreds of PIO replies into a publishable dataset.

Standardise the response format

PIO replies vary wildly, so convert every reply into the same structure before you tabulate anything.
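A minimal sketch of this standardisation step follows. It assumes each reply has already been transcribed by hand into one row of strings, for example from a spreadsheet. The field names (`rti_id`, `filed_on`, `replied_on`, `sub_questions`, `answered`), the `DATE_FORMATS` list and the helpers `parse_date`, `clean_name` and `standardise` are all hypothetical, so adapt them to your own corpus. Dates that match no known format raise an error on purpose, so those rows get checked by hand rather than silently dropped.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Assumed date formats; extend this tuple as you meet new ones in your corpus.
# "%d %B %Y" expects English month names (e.g. "12 March 2024").
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%Y-%m-%d", "%d %B %Y")


def parse_date(raw: str) -> date:
    """Try each known format in turn; raise so unreadable dates get reviewed by hand."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def clean_name(raw: str) -> str:
    """Collapse stray whitespace and title-case so 'tamil  nadu' groups with 'Tamil Nadu'."""
    return " ".join(raw.split()).title()


@dataclass
class RtiRecord:
    rti_id: str
    state: str
    department: str
    district: str
    filed_on: date
    replied_on: Optional[date]  # None means no reply was received
    sub_questions: int
    answered: int

    @property
    def reply_days(self) -> Optional[int]:
        """Days from filing to reply, or None if the PIO never replied."""
        if self.replied_on is None:
            return None
        return (self.replied_on - self.filed_on).days


def standardise(raw: dict) -> RtiRecord:
    """Convert one hand-transcribed reply (all values as strings) into a typed record."""
    replied = raw.get("replied_on", "").strip()
    record = RtiRecord(
        rti_id=raw["rti_id"].strip(),
        state=clean_name(raw["state"]),
        department=clean_name(raw["department"]),
        district=clean_name(raw["district"]),
        filed_on=parse_date(raw["filed_on"]),
        replied_on=parse_date(replied) if replied else None,
        sub_questions=int(raw["sub_questions"]),
        answered=int(raw["answered"]),
    )
    # Catch transcription slips such as "5 of 4 answered".
    if not 0 <= record.answered <= record.sub_questions:
        raise ValueError(f"{record.rti_id}: answered count out of range")
    return record
```

Title-casing is a crude normaliser: spelling variants of the same district still need a lookup table of your own.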

Compliance speed: median and worst-case PIO reply time, by state and by department.
Disclosure rate: % of sub-questions answered, by state.
Variance: coefficient of variation (CV) across districts for the same metric. A high CV points to inconsistent practice or a data-quality issue.
Outliers: the top 5 and bottom 5 on each metric.
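The four metrics above can be sketched with the standard library alone. The `Row` schema and every function name here are hypothetical. The CV is computed as population standard deviation divided by the mean (CV = σ / μ) over each district's average value; using the sample standard deviation instead is an equally defensible choice, as long as you state which one you used. Unanswered RTIs are left out of the reply-time figures, so count them and report them separately.

```python
from collections import defaultdict
from statistics import mean, median, pstdev
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One standardised RTI reply (hypothetical schema)."""
    state: str
    department: str
    district: str
    reply_days: Optional[int]  # None = no reply received
    sub_questions: int
    answered: int


def compliance_speed(rows, key):
    """(median, worst) reply time per group. Unreplied RTIs are skipped here,
    so count and report them separately."""
    days = defaultdict(list)
    for r in rows:
        if r.reply_days is not None:
            days[key(r)].append(r.reply_days)
    return {g: (median(d), max(d)) for g, d in days.items()}


def disclosure_rate(rows):
    """Percentage of all sub-questions answered, pooled per state."""
    asked, answered = defaultdict(int), defaultdict(int)
    for r in rows:
        asked[r.state] += r.sub_questions
        answered[r.state] += r.answered
    return {s: 100 * answered[s] / asked[s] for s in asked if asked[s]}


def coefficient_of_variation(values):
    """CV = population standard deviation / mean; None when the mean is 0."""
    m = mean(values)
    return pstdev(values) / m if m else None


def district_cv(rows, metric):
    """Per state: CV across districts of each district's mean `metric` value."""
    by_state = defaultdict(lambda: defaultdict(list))
    for r in rows:
        value = metric(r)
        if value is not None:
            by_state[r.state][r.district].append(value)
    return {
        state: coefficient_of_variation([mean(v) for v in districts.values()])
        for state, districts in by_state.items()
        if len(districts) > 1  # CV needs at least two districts
    }


def top_bottom(scores, n=5, higher_is_better=True):
    """Best n and worst n groups by score, each list ordered from the extreme inward."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    return ranked[:n], ranked[::-1][:n]
```

For example, `compliance_speed(rows, key=lambda r: (r.state, r.department))` gives the state-by-department breakdown, `district_cv(rows, metric=lambda r: r.reply_days)` flags states whose districts behave inconsistently, and `top_bottom(speeds, higher_is_better=False)` ranks reply times, where lower is better.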

Avoid: pie charts and 3D bar charts (both are anti-patterns in data journalism).

Publish the raw RTI responses as an open dataset: a CSV plus the scanned PDFs, in a Google Drive folder.

License: CC BY 4.0. Where relevant, tag the dataset on datasets.gov.in.
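A minimal sketch of the export step, assuming your standardised records are plain dicts. The file names `rti_responses.csv` and `LICENSE.txt`, the column list and the `scan_file` column (which links each row to its scanned PDF) are hypothetical choices; uploading the folder to Google Drive is still done by hand. The licence URL is the official Creative Commons address for CC BY 4.0.

```python
import csv
from pathlib import Path

# Hypothetical column order; scan_file names the PDF that backs each row.
FIELDS = ["rti_id", "state", "department", "district", "filed_on",
          "replied_on", "sub_questions", "answered", "scan_file"]

LICENCE_NOTE = (
    "This dataset is released under the Creative Commons Attribution 4.0 "
    "International licence (CC BY 4.0): "
    "https://creativecommons.org/licenses/by/4.0/\n"
)


def export_dataset(rows, out_dir):
    """Write rows to a UTF-8 CSV next to a licence file; return the CSV path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "rti_responses.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        # DictWriter raises on unexpected keys and writes "" for missing ones.
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    (out / "LICENSE.txt").write_text(LICENCE_NOTE, encoding="utf-8")
    return csv_path
```

Keeping the licence as a file inside the folder means it travels with the data even when someone downloads the CSV on its own from the shared drive.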

In any publication or report, cite each underlying RTI individually.

Citing every RTI this way lets you trace each claim back to an official reply, which is what lets the story hold up, including in court, if your evidence is later challenged.

Quiz available from your course dashboard.

Last reviewed: 24 April 2026.