Module 03 — Pattern-mining your RTI corpus
Goal: Convert hundreds of PIO replies into a publishable dataset.
PIO replies vary widely in format and completeness. Before tabulating, standardise each one:
Scan + OCR each reply (most are PDFs / scanned letters).
Tag with metadata: state, district, PIO name, date received, days to reply.
Extract each reported figure into a spreadsheet column.
Flag any partial or refused responses for follow-up.
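The tagging and flagging steps above can be sketched in Python. This is a minimal sketch, not a prescribed workflow: the Reply class, its field names, the needs_follow_up helper, and the rule that a reply answering zero sub-questions counts as refused are all assumptions made for illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Reply:
    """One PIO reply with the metadata tags listed above (hypothetical schema)."""
    state: str
    district: str
    pio_name: str
    date_filed: date                # assumed: needed to derive days to reply
    date_received: Optional[date]   # None means no reply has arrived yet
    questions_asked: int
    questions_answered: int

    @property
    def days_to_reply(self) -> Optional[int]:
        # Calendar days between filing and receipt; None if still unanswered.
        if self.date_received is None:
            return None
        return (self.date_received - self.date_filed).days

    @property
    def status(self) -> str:
        # Assumed classification rule; adjust to how your team codes refusals.
        if self.date_received is None:
            return "no_reply"
        if self.questions_answered == 0:
            return "refused"
        if self.questions_answered < self.questions_asked:
            return "partial"
        return "full"


def needs_follow_up(reply: Reply) -> bool:
    # Anything short of a full answer goes on the follow-up list.
    return reply.status != "full"


# Made-up example record.
example = Reply("State A", "District 1", "PIO 1",
                date(2025, 1, 6), date(2025, 2, 14), 5, 3)
print(example.days_to_reply, example.status, needs_follow_up(example))
# prints: 39 partial True
```

Deriving days to reply from two stored dates, rather than typing it in by hand, keeps the figure reproducible when a date is later corrected.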
Comparative tables that publish well
Compliance speed: median + worst PIO reply time, by state/department.
Disclosure rate: % of sub-questions answered, by state.
Variance: coefficient of variation (CV, the standard deviation divided by the mean) across districts for the same metric. A high CV points to inconsistent practice or a data-quality issue.
Outliers: Top 5 + bottom 5 by metric.
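The four tables above reduce to a few lines of standard-library Python. The sketch below is illustrative only: the Row fields, the summarise and outliers helpers, and every number in ROWS are made up, and the CV here uses the population standard deviation (pstdev), which is one reasonable choice rather than the only one.

```python
from collections import defaultdict
from statistics import mean, median, pstdev
from typing import NamedTuple


class Row(NamedTuple):
    state: str
    district: str
    days_to_reply: int
    answered: int    # sub-questions answered
    asked: int       # sub-questions asked
    value: float     # the metric being compared across districts


# Made-up illustrative data, not real RTI results.
ROWS = [
    Row("State A", "D1", 20, 4, 5, 100.0),
    Row("State A", "D2", 45, 5, 5, 110.0),
    Row("State A", "D3", 28, 3, 5, 90.0),
    Row("State B", "D4", 30, 2, 5, 50.0),
    Row("State B", "D5", 90, 1, 5, 150.0),
]


def summarise(rows):
    """Per-state compliance speed, disclosure rate, and CV."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.state].append(row)

    summary = {}
    for state, grp in groups.items():
        days = [r.days_to_reply for r in grp]
        values = [r.value for r in grp]
        avg = mean(values)
        summary[state] = {
            "median_days": median(days),
            "worst_days": max(days),
            "disclosure_pct": 100 * sum(r.answered for r in grp)
                              / sum(r.asked for r in grp),
            # CV is undefined when the mean is zero.
            "cv": pstdev(values) / avg if avg else None,
        }
    return summary


def outliers(rows, n=5):
    """Return (top n, bottom n) rows ranked by the metric value."""
    ranked = sorted(rows, key=lambda r: r.value)
    return ranked[-n:][::-1], ranked[:n]


for state, stats in summarise(ROWS).items():
    print(state, stats)
top, bottom = outliers(ROWS, n=2)
print([r.district for r in top], [r.district for r in bottom])
```

With fewer than 2n rows, the top and bottom lists overlap, so check the corpus size before publishing an outlier table.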
Visualisations that travel
Choropleth map (district / state level)
Slope chart for year-on-year change
Heat map for compliance scorecard
Bullet chart for variance vs benchmark
Avoid: pie charts and 3D bar charts (both are anti-patterns in data journalism).
Open-data publication
Publish the raw RTI responses as an open dataset (CSV + scanned PDFs in a shared Google Drive folder). This:
Enables third-party verification
Encourages other journalists to dig deeper
Builds your NGO's credibility as a data source
License: CC BY 4.0. Tag in datasets.gov.in if relevant.
Citing the PIO
In any publication or report, cite each underlying RTI:
PIO designation, department, address
Date of RTI filing + RTI reference no.
Date of PIO reply + PIO reply reference no.
This citation format is what makes the story court-admissible if your evidence is later challenged.
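One way to keep citations consistent is to store the citation fields as a record and generate the citation string from it. The sketch below assumes a hypothetical RTICitation class and an arbitrary output layout; the course does not prescribe either, and all values shown are placeholders.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class RTICitation:
    """The citation elements listed above (hypothetical field names)."""
    pio_designation: str
    department: str
    address: str
    filed_on: date
    rti_ref: str
    replied_on: date
    reply_ref: str

    def missing_fields(self):
        # Names of text fields left blank, so gaps are caught before publication.
        return [f.name for f in fields(self)
                if isinstance(getattr(self, f.name), str)
                and not getattr(self, f.name).strip()]

    def format(self) -> str:
        # Illustrative layout only; adapt to your house style.
        return (
            f"{self.pio_designation}, {self.department}, {self.address}. "
            f"RTI filed {self.filed_on.isoformat()} (ref. {self.rti_ref}); "
            f"reply dated {self.replied_on.isoformat()} (ref. {self.reply_ref})."
        )


# Placeholder values, not a real RTI.
citation = RTICitation(
    pio_designation="Public Information Officer",
    department="Example Department",
    address="Example Address",
    filed_on=date(2025, 1, 6),
    rti_ref="RTI/EXAMPLE/001",
    replied_on=date(2025, 2, 14),
    reply_ref="REPLY/EXAMPLE/001",
)
print(citation.missing_fields())
print(citation.format())
```

Generating the string from stored fields means every report cites the same RTI identically, and a blank reference number surfaces as a missing field instead of a silent omission.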
Last reviewed: 24 April 2026.