Use case #0001

GST and UPI Data: How Thin-File AI Builds a Credit Score from Scratch

A first-generation textile trader in Surat has been running a profitable business for six years. She files GST every quarter, receives payments via UPI every day, maintains a savings account with consistent inflows, and has never missed a utility bill. Her CIBIL score is 0 — no credit file exists. A bureau-only underwriting model sees nothing. The Thin-File AI sees six years of financial evidence.

The 190 Million Who Are Creditworthy But Invisible

India's formal credit bureau infrastructure covers approximately 450 million individuals. That is a remarkable achievement within a generation, but it still leaves close to 190 million adults outside the system's view. Among those 190 million are some of the most creditworthy borrowers in the country: first-generation entrepreneurs who have never needed formal credit, women who manage household and business finances but are not the primary account holder, and MSME operators who run lean cash businesses and have no credit history because they have never borrowed, and therefore have never had the chance to repay or default.

The conventional response to thin-file applicants is rejection — not because these borrowers are risky, but because the risk measurement infrastructure cannot see them. This is not a risk management decision. It is a data availability decision masquerading as a risk management decision. The Thin-File AI replaces the missing bureau data with something more immediately informative: the actual financial behaviour of the borrower, observable in real time through GST filings, UPI transaction records, and banking patterns that together tell a more complete credit story than a bureau file built on historical borrowing alone.

"A credit score that does not exist is not evidence of poor creditworthiness. It is evidence that the person has never borrowed — which is not the same thing."

The Signal Architecture: 4 Data Categories, 28 Computed Metrics

GST Data

Business Revenue & Compliance Signals — Highest Weight for SE Borrowers

Weight: 32%
GST turnover trend (8 quarters)
Filing regularity and timeliness score
Input tax credit utilisation ratio
Output tax to input tax ratio (profitability proxy)
E-way bill volume (business activity level)
GST return filing history (completeness)
B2B vs B2C revenue split (stability indicator)
Turnover seasonality coefficient

UPI Transaction Data

Real-Time Cash Flow & Payment Behaviour — Highest Frequency, Current Signal

Weight: 28%
Monthly UPI inflow volume (12 months)
UPI inflow consistency (coefficient of variation)
Payment regularity score (outgoing commitments)
Peak-to-trough inflow ratio (cash flow stability)
Number of unique payers (customer concentration risk)
Outgoing payment frequency and punctuality
UPI inflow growth rate (quarter on quarter)

Bank Statement Analytics

Savings, Liquidity, and Financial Discipline Signals

Weight: 26%
Average monthly balance (18 months)
Balance trend (improving / stable / declining)
Savings ratio (balance growth vs inflow ratio)
Cheque return or NACH failure history
Insurance premium payment regularity
Investment deduction regularity (SIP, RD)
Rental payment consistency (housing commitment)

Alternative & Contextual Signals

Supporting Signals — Corroboration and Context

Weight: 14%
Utility bill payment regularity (BESCOM, water, telecom)
Business vintage (years at current address / registration)
Trade reference quality (MSME portal registrations)
Property ownership signal (registered deed search)
Education level (bureau-correlated, DPDP-consented)
Sector stability index (NPA rate in borrower's sector)
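
The four category weights above can be illustrated with a small sketch. The article does not say how the 28 metrics are combined, so this assumes the simplest possible scheme: each category is first reduced to a sub-score between 0 and 1, the sub-scores are averaged using the stated weights, and the result is scaled linearly onto the 300–900 range. The function name, the sub-score inputs, and the linear scaling are all hypothetical; the production model may work quite differently.

```python
# Category weights as stated in the article; they sum to 1.0.
CATEGORY_WEIGHTS = {
    "gst": 0.32,
    "upi": 0.28,
    "bank": 0.26,
    "alternative": 0.14,
}

# Output scale quoted in the article for the Thin-File Score.
SCORE_MIN, SCORE_MAX = 300, 900


def thin_file_score(category_scores: dict[str, float]) -> int:
    """Combine per-category sub-scores (each in [0, 1]) into a 300-900 score.

    ASSUMPTION: a plain weighted average followed by linear scaling.
    The article does not describe the real combination method.
    """
    if set(category_scores) != set(CATEGORY_WEIGHTS):
        raise ValueError("expected exactly the four categories")
    for name, value in category_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} sub-score must be in [0, 1]")
    weighted = sum(
        CATEGORY_WEIGHTS[name] * category_scores[name]
        for name in CATEGORY_WEIGHTS
    )
    return round(SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN))


# Hypothetical sub-scores, chosen only to show the arithmetic.
example = {"gst": 0.80, "upi": 0.75, "bank": 0.65, "alternative": 0.55}
print(thin_file_score(example))  # 727
```

With these made-up inputs the weighted average is 0.712, which scales to 727. Because GST carries the largest weight, a change in the GST sub-score moves the result more than the same change in any other category.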

The Credit Score Output: What the AI Produces

Thin-File Credit Assessment — Application TF-2025-1184
Self-Employed · Textile Trading · Surat · LAP ₹18L
CIBIL Score: N/A (no bureau file; first formal credit application)
Thin-File Score (TFS): 724 (scale 300–900; equivalent to bureau band B+)
Recommendation: APPROVE (₹14L sanctioned · LTV 72% · Rate 11.4%)
Score Drivers — Top 6 Signal Contributions
GST Turnover Trend: ₹42L → ₹68L over 8 quarters (+62%) · +Strong
UPI Inflow Consistency: CoV 0.18, very consistent inflow pattern · +Strong
GST Filing Regularity: filed on time 23 of 24 quarters · +Good
Bank Balance Trend: avg ₹2.8L, improving 12-month trend · +Good
Business Vintage: 6 years registered, same address · +Moderate
Customer Concentration: top 3 payers = 58% of UPI inflow · −Minor risk
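
Three of the drivers above are simple ratios, and a short sketch may make them concrete. The helper names are hypothetical, and the choice of population standard deviation for the coefficient of variation is an assumption; the article does not specify which variant the model uses.

```python
from statistics import mean, pstdev


def growth_pct(first: float, last: float) -> float:
    """Percentage change from the first period to the last."""
    return (last - first) / first * 100


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation divided by the mean; lower means steadier inflow.

    ASSUMPTION: population standard deviation (pstdev), not sample.
    """
    return pstdev(values) / mean(values)


def top_n_share(inflow_by_payer: dict[str, float], n: int = 3) -> float:
    """Fraction of total inflow contributed by the n largest payers."""
    total = sum(inflow_by_payer.values())
    largest = sorted(inflow_by_payer.values(), reverse=True)[:n]
    return sum(largest) / total


# GST turnover from the example card: Rs 42L rising to Rs 68L.
print(round(growth_pct(42, 68)))  # 62

# Hypothetical payer totals, constructed so the top three hold 58%.
payers = {"a": 25, "b": 18, "c": 15, "d": 14, "e": 14, "f": 14}
print(round(top_n_share(payers), 2))  # 0.58
```

The growth figure reproduces the +62% shown on the card. The payer data is invented purely to illustrate how a 58% top-3 concentration would be computed.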

How the Thin-File Score Is Calibrated Against Actual Default Data

A credit score that is not validated against actual default outcomes is not a credit score — it is a heuristic. The Thin-File AI is trained and calibrated on a dataset of borrowers who were initially thin-file at origination, whose loans have matured sufficiently to generate 90-DPD outcome data, and whose alternative data signals at origination can be retrospectively compared to their repayment behaviour. This outcome-linked calibration is what allows the Thin-File Score to be mapped to a bureau-equivalent risk band rather than existing as an opaque proprietary metric that lenders cannot benchmark.

The calibration shows that thin-file borrowers with a TFS above 700 have a 12-month default rate of 2.8% — broadly comparable to bureau-scored borrowers with a CIBIL score of 700–730. The correlation between GST turnover consistency and subsequent repayment is particularly strong in the self-employed segment: borrowers with 8+ quarters of consistent or growing GST turnover default at 1.9%, compared to 4.8% for those with volatile or declining turnover — a difference that bureau-only underwriting cannot detect because it has no GST data.
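
The kind of band-level check described above can be sketched as follows: group matured loans by their score at origination and compute the observed default rate in each band. The band edges, the record layout, and the sample data are all hypothetical; the article does not publish the calibration dataset or its banding.

```python
from collections import defaultdict


def default_rate_by_band(loans, band_edges=(300, 600, 650, 700, 750, 901)):
    """Observed default rate per score band.

    loans: iterable of (score_at_origination, defaulted) pairs, where
           defaulted is True if the loan reached 90 DPD within 12 months.
    band_edges: ascending edges; bands are half-open [lo, hi).
                ASSUMPTION: these edges are illustrative only.
    Returns {(lo, hi): rate} for bands that contain at least one loan.
    """
    bands = list(zip(band_edges[:-1], band_edges[1:]))
    counts = defaultdict(lambda: [0, 0])  # band -> [defaults, total]
    for score, defaulted in loans:
        for lo, hi in bands:
            if lo <= score < hi:
                counts[(lo, hi)][0] += int(defaulted)
                counts[(lo, hi)][1] += 1
                break
    return {band: d / n for band, (d, n) in counts.items()}


# Invented outcome records, for illustration only.
sample = (
    [(710, False)] * 97 + [(720, True)] * 3
    + [(610, False)] * 9 + [(620, True)] * 1
)
rates = default_rate_by_band(sample)
print(rates[(700, 750)])  # 0.03
print(rates[(600, 650)])  # 0.1
```

Comparing each band's observed rate with the default rate of a bureau-scored cohort is what would allow a band to be labelled "bureau-equivalent"; the sample numbers here are not the article's figures.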

28: metrics computed across 4 alternative data categories to produce the Thin-File Score
2.8%: 12-month default rate for TFS 700+ borrowers, comparable to the CIBIL 700–730 cohort
32%: weight of GST signals in the model, the highest single category for SE borrowers
190M: adults outside formal credit bureau coverage, the addressable thin-file population

Alternative Data Is Not a Concession to Risk — It Is a Correction to a Measurement Gap

The instinct to treat thin-file lending as inherently riskier than bureau-scored lending is understandable, but the calibration data does not support it. Thin-file borrowers are not inherently more likely to default; they are less measurable by traditional instruments. The Thin-File AI solves the measurement problem, not the risk problem. A borrower who has demonstrated six years of consistent business cash flow, timely tax compliance, and savings discipline is not a high-risk borrower. She is a low-risk borrower whom the traditional system cannot see. The Thin-File AI makes her visible and makes the case for her credit, with evidence.
