
Module 03 — Pattern-mining your RTI corpus

RTI for Activists & NGOs Module 03

Goal: Convert hundreds of PIO replies into a publishable dataset.

Standardise the response format

PIO replies vary widely in layout and format. Before tabulating:

  1. Scan and OCR each reply (most arrive as PDFs or scanned letters).
  2. Tag each reply with metadata: state, district, PIO name, date received, and days taken to reply.
  3. Extract the figures you requested into spreadsheet columns, one column per sub-question.
  4. Flag any partial or refused responses for follow-up.
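The standardisation steps above can be sketched in Python as one record per reply. The `Reply` class, its field names, and the `refused` flag are hypothetical, and the sample rows are made-up placeholders rather than real RTI data. Days to reply is computed here as calendar days between filing and receipt, which is an assumption; use whichever start date your tables define.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Reply:
    """Hypothetical record for one PIO reply; rename fields to match your sheet."""
    state: str
    district: str
    pio_name: str
    date_filed: date
    date_received: date
    answered: int           # sub-questions answered in full
    asked: int              # sub-questions in the original RTI
    refused: bool = False   # True if the whole request was refused

    @property
    def days_to_reply(self) -> int:
        # Calendar days between filing and receiving the reply (assumed definition).
        return (self.date_received - self.date_filed).days

    @property
    def needs_follow_up(self) -> bool:
        # Flag refusals and any reply that left a sub-question unanswered.
        return self.refused or self.answered < self.asked


# Invented placeholder rows, not real RTI results.
replies = [
    Reply("State A", "District 1", "PIO 1", date(2025, 1, 6), date(2025, 2, 3), answered=5, asked=5),
    Reply("State A", "District 2", "PIO 2", date(2025, 1, 6), date(2025, 3, 14), answered=2, asked=5),
    Reply("State B", "District 3", "PIO 3", date(2025, 1, 6), date(2025, 1, 27), answered=0, asked=5, refused=True),
]

for r in replies:
    print(r.district, r.days_to_reply, "FOLLOW UP" if r.needs_follow_up else "ok")
```

Keeping days to reply as a derived property, rather than a typed-in column, means it cannot drift out of sync with the two dates it is computed from.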

Comparative tables that publish well

  1. Compliance speed: median + worst PIO reply time, by state/department.
  2. Disclosure rate: % of sub-questions answered, by state.
  3. Variance: coefficient of variation (CV = standard deviation ÷ mean) across districts for the same metric. A high CV signals inconsistent practice or a data-quality issue.
  4. Outliers: top 5 and bottom 5 for each metric.
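A minimal sketch of the four comparative tables using only Python's standard library. The row layout is a hypothetical assumption, the rows are invented placeholders, and the CV here uses the population standard deviation (`statistics.pstdev`); switch to `statistics.stdev` if you treat your districts as a sample.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical row layout:
# (state, district, days_to_reply, sub_questions_answered, sub_questions_asked)
# Values are invented placeholders, not real RTI results.
rows = [
    ("State A", "District 1", 28, 5, 5),
    ("State A", "District 2", 67, 2, 5),
    ("State A", "District 3", 30, 4, 5),
    ("State B", "District 4", 21, 0, 5),
    ("State B", "District 5", 25, 5, 5),
]


def state_summary(state_rows):
    """Compliance speed, disclosure rate and CV for one state's districts."""
    days = [r[2] for r in state_rows]
    answered = sum(r[3] for r in state_rows)
    asked = sum(r[4] for r in state_rows)
    return {
        "median_days": median(days),
        "worst_days": max(days),
        # Pooled across districts: answered sub-questions / asked sub-questions.
        "disclosure_rate_pct": round(100 * answered / asked, 1),
        # CV = population standard deviation / mean of reply times across districts.
        "cv_days": round(pstdev(days) / mean(days), 2),
    }


def top_and_bottom(rows, key_index, n=5):
    """Return the n lowest and the n highest rows for the column at key_index."""
    ranked = sorted(rows, key=lambda r: r[key_index])
    return ranked[:n], ranked[::-1][:n]


by_state = defaultdict(list)
for row in rows:
    by_state[row[0]].append(row)

summary = {state: state_summary(state_rows) for state, state_rows in by_state.items()}
fastest, slowest = top_and_bottom(rows, key_index=2, n=2)

for state, stats in summary.items():
    print(state, stats)
print("Fastest:", [r[1] for r in fastest])
print("Slowest:", [r[1] for r in slowest])
```

With fewer than 2 × n rows the top and bottom lists overlap, so check the row count before publishing an outliers table.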

Visualisations that travel

  1. Choropleth map (district / state level)
  2. Slope chart for year-on-year change
  3. Heat map for compliance scorecard
  4. Bullet chart for variance vs benchmark

Avoid pie charts and 3D bar charts; both are anti-patterns in data journalism.

Open-data publication

Publish the raw RTI responses as an open dataset (a CSV plus the scanned PDFs, e.g. in a public Google Drive folder). This:

  1. Enables third-party verification
  2. Encourages other journalists to dig deeper
  3. Builds your NGO's credibility as a data source

License the dataset under CC BY 4.0, and list it on datasets.gov.in if relevant.
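One way to produce the CSV half of the open dataset, assuming one row per PIO reply. The column names, including a hypothetical `scan_file` column linking each row to its scanned PDF, are illustrative choices, and the row is a placeholder.

```python
import csv
import io

# Hypothetical column set: one row per PIO reply, so others can re-run your tables.
FIELDS = ["state", "district", "pio_designation", "rti_ref_no", "date_filed",
          "reply_ref_no", "date_received", "days_to_reply", "answered", "asked",
          "scan_file"]

# Invented placeholder row, not real RTI data.
rows = [
    {"state": "State A", "district": "District 1", "pio_designation": "PIO, Department X",
     "rti_ref_no": "RTI/001", "date_filed": "2025-01-06", "reply_ref_no": "R/123",
     "date_received": "2025-02-03", "days_to_reply": 28, "answered": 5, "asked": 5,
     "scan_file": "scans/RTI-001.pdf"},
]


def to_csv(rows, fields=FIELDS) -> str:
    """Serialise reply rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(rows))
```

ISO 8601 dates (`YYYY-MM-DD`) sort correctly and load cleanly in most spreadsheet tools. When writing to disk instead of a string, open the file with `newline=""`, as the `csv` module documentation recommends.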

Citing the PIO

In any publication or report, cite each underlying RTI:

  1. PIO designation, department, address
  2. Date of RTI filing + RTI reference no.
  3. Date of PIO reply + PIO reply reference no.

This citation format makes every figure traceable to an official PIO reply, which matters if your evidence is later challenged in court.
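The three citation elements above can be assembled mechanically so that every report cites replies the same way. The `RtiCitation` class, the output wording, and the placeholder values are hypothetical, not a prescribed legal format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RtiCitation:
    """Hypothetical record holding the three citation elements for one RTI."""
    pio_designation: str
    department: str
    address: str
    filed_on: date
    rti_ref_no: str
    replied_on: date
    reply_ref_no: str

    def format(self) -> str:
        # Order follows the checklist: PIO details, then filing, then reply.
        # %B gives the English month name under Python's default C locale.
        return (
            f"{self.pio_designation}, {self.department}, {self.address}. "
            f"RTI filed {self.filed_on:%d %B %Y} (ref. {self.rti_ref_no}); "
            f"reply dated {self.replied_on:%d %B %Y} (ref. {self.reply_ref_no})."
        )


# Placeholder values only.
citation = RtiCitation(
    pio_designation="Public Information Officer",
    department="Department X",
    address="Address Y",
    filed_on=date(2025, 1, 6),
    rti_ref_no="RTI/001",
    replied_on=date(2025, 2, 3),
    reply_ref_no="R/123",
)
print(citation.format())
```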

✅ Quiz

Quiz available from your course dashboard.


Last reviewed: 24 April 2026.