A credit model variable that is strongly correlated with a protected characteristic is a proxy variable — and using a proxy variable in a credit model can produce discrimination that is statistically indistinguishable from using the protected characteristic directly. The Fair Lending AI tests every model variable for proxy correlation monthly. Pincode is the most common offender — and the hardest to detect without systematic analysis.
Why Proxy Variables Are the Primary Source of Algorithmic Discrimination in India
Direct discrimination by protected characteristic is legally and morally unacceptable, and no regulated institution would consciously design it into a credit model. But proxy variables introduce discrimination through the back door — and they do so in ways that are genuinely difficult to detect without running the right tests.
India's social geography is particularly prone to proxy variable problems. Pincodes are not neutral zip codes — they carry significant social information. In Mumbai, certain pincodes are strongly associated with particular religious communities. In Chennai, pincodes correlate with caste geography. In Bengaluru, the distinction between established localities and newer development areas correlates strongly with migrant status and economic origin. A credit model that uses pincode as a variable — whether for property value adjustment, local employment stability, or historical loan performance — is potentially encoding these social correlations into its credit decisions.
The same logic applies to other seemingly neutral variables: employer type (correlates with caste and community in certain sectors), educational institution attended (correlates with socioeconomic origin), business sector (correlates with community in traditional trading communities), and even mobile network operator in some geographies (correlates with language and community). The Fair Lending AI tests all of them.
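A minimal sketch of what such a monthly proxy scan could look like: every model variable is correlated against every available protected-characteristic proxy measure, and any pair exceeding a threshold is flagged. The threshold, variable names, and data below are illustrative assumptions, not the product's actual configuration.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def proxy_scan(variables, proxy_measures, threshold=0.50):
    """Flag every model variable whose absolute correlation with any
    protected-characteristic proxy measure exceeds the threshold."""
    flags = []
    for var_name, var_values in variables.items():
        for proxy_name, proxy_values in proxy_measures.items():
            r = pearson_r(var_values, proxy_values)
            if abs(r) > threshold:
                flags.append((var_name, proxy_name, round(r, 2)))
    return flags

# Toy area-level data: a pincode-derived score offset that happens to track
# an (illustrative) religious-composition measure of each area.
variables = {"pincode_offset": [-18, -14, 0, 5, 3, -16, 4, -12]}
proxy_measures = {"religion_composition": [0.9, 0.8, 0.1, 0.05, 0.1, 0.85, 0.1, 0.7]}

print(proxy_scan(variables, proxy_measures))
```

With this toy data, the scan flags the pincode-derived offset because it moves almost in lockstep with the composition measure, which is exactly the pattern the monthly test is designed to surface.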
The Pincode Proxy Correlation Analysis
| Pincode / Area | Approval Rate in Model | Corr. with Religion (proxy) | Corr. with Community Income | Model Impact | Proxy Status |
|---|---|---|---|---|---|
| 400008 / Bhendi Bazaar | 41.2% | r = 0.89 | r = 0.62 | Pincode adds −18 score points vs. adjacent 400001 | High proxy risk |
| 400012 / Mahim | 44.8% | r = 0.74 | r = 0.58 | Pincode adds −14 score points vs. matched-income adjacent areas | High proxy risk |
| 400097 / Govandi | 51.3% | r = 0.52 | r = 0.71 | Lower approval primarily income-explained (r = 0.71 with income quartile) | Monitor — income may justify |
| 400050 / Bandra West | 72.1% | r = 0.18 | r = 0.84 | High approval rate primarily income-explained — low proxy correlation | Clean |
| 400070 / Kurla East | 54.2% | r = 0.23 | r = 0.68 | Below-average approval primarily income/employment mix — acceptable | Clean |
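The triage in the table's final column can be expressed as a simple decision rule. The cut-offs below (0.70 for high risk, 0.40 for the monitoring band, 0.60 for income justification) are assumptions reverse-engineered from the table rows, not the product's documented thresholds.

```python
def proxy_status(r_religion, r_income, high=0.70, watch=0.40):
    """Triage a geographic variable using cut-offs implied by the table
    above (assumed values, not the product's exact rule)."""
    if abs(r_religion) >= high:
        return "High proxy risk"
    if abs(r_religion) >= watch:
        # Income may legitimately explain the disparity only if the
        # income correlation is itself strong.
        if abs(r_income) >= 0.60:
            return "Monitor — income may justify"
        return "Monitor"
    return "Clean"

areas = {
    "400008 Bhendi Bazaar": (0.89, 0.62),
    "400012 Mahim":         (0.74, 0.58),
    "400097 Govandi":       (0.52, 0.71),
    "400050 Bandra West":   (0.18, 0.84),
    "400070 Kurla East":    (0.23, 0.68),
}
for area, (r_rel, r_inc) in areas.items():
    print(area, "->", proxy_status(r_rel, r_inc))
```

Applied to the five table rows, the rule reproduces the table's classifications: two high-risk areas, one monitoring case, and two clean ones.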
The Four Variable Categories the AI Tests for Proxy Correlation
Pincode, District, Branch, Property Location
Geographic variables are the most common proxy for religion, caste, and community in India because residential segregation along these lines is historically documented and statistically measurable. Any geographic variable — whether used for property market adjustment, employment stability estimation, or historical NPA rates — must be tested against religious and community composition data for the geography.
→ Remediation: Replace pincode with income-quartile of pincode + property market index (income-adjusted)

Employer Name, Industry Sector, Business Type
Certain industry sectors and business types correlate with community in India — jewellery, textiles, and trading businesses are associated with specific communities in different geographies. A model that assigns different risk weights to business sector without correcting for this proxy correlation may be systematically penalising members of those communities.
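Testing "after income controls" amounts to computing a partial correlation: the sector-community correlation that remains once the income channel is removed. A sketch using the standard first-order partial correlation formula, on synthetic sector-level numbers (all data and the 0.50 threshold are illustrative assumptions):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def partial_r(x, y, z):
    """Correlation of x and y after controlling for z (first-order partial)."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz**2) * (1 - ryz**2)) ** 0.5

# Synthetic per-sector data: approval rate, community-proxy share, and
# median income (illustrative numbers, not real figures).
sector_approval = [0.61, 0.79, 0.50, 0.71, 0.69, 0.60]
community_share = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
median_income   = [6.0, 10.0, 6.0, 10.0, 8.0, 8.0]

r = partial_r(sector_approval, community_share, median_income)
if abs(r) > 0.50:
    print(f"FLAG: sector variable, partial r = {r:.2f} after income controls")
```

In this synthetic data the community effect on approval persists even after income is controlled for, so the flag fires; a sector disparity that income fully explained would produce a partial r near zero and pass.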
→ Test: sector approval rates by community proxy — flag if r > 0.50 after income controls

Name Length, Script, or Pattern Features
Name-based features — including name length, script (Devanagari vs Tamil vs Arabic), or suffix patterns — are sometimes used in fraud detection or identity verification models. These features correlate directly with religion and community and should never appear in credit scoring models. The proxy test checks whether any engineered name feature has entered the credit model through the feature engineering pipeline.
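A zero-tolerance check of this kind can be as simple as scanning the feature registry for identifiers that suggest name derivation. The pattern list below is an illustrative assumption about how such features might be named, not the product's actual rule set.

```python
import re

# Identifier fragments that suggest a feature was engineered from the
# applicant's name (assumed patterns, for illustration only).
NAME_FEATURE_PATTERNS = [
    r"name_len", r"name_script", r"name_suffix", r"name_ngram",
    r"first_name", r"last_name", r"surname",
]

def flag_name_features(feature_names):
    """Return every feature whose identifier suggests it is name-derived.
    Zero tolerance: every match is flagged for removal from credit models."""
    pattern = re.compile("|".join(NAME_FEATURE_PATTERNS), re.IGNORECASE)
    return [f for f in feature_names if pattern.search(f)]

pipeline_features = [
    "income_quartile", "property_market_index", "name_len_chars",
    "emi_to_income", "surname_suffix_code", "tenure_months",
]
print(flag_name_features(pipeline_features))
# → ['name_len_chars', 'surname_suffix_code']
```

A name-based scan like this only catches features whose identifiers are honest; a complete pipeline audit would also trace each feature's lineage back to its source columns.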
→ Zero tolerance: name-derived features in credit models automatically flagged for removal

Educational Institution, Language, Vehicle Type
As alternative data sources expand, variables like educational institution attended, language of application, vehicle type (owned), or subscription services create new proxy risks. These correlate with socioeconomic origin, language community, and caste in ways that are not always obvious. The Fair Lending AI tests every new variable added to the model pipeline for proxy correlation before deployment.
→ Pre-deployment proxy test mandatory for all new variables — gate before model update

The Corrective Action When a Proxy Is Found
Identifying a proxy variable is the beginning of the governance action, not the end. The Fair Lending AI's proxy finding triggers a structured assessment: what is the variable's predictive value for credit risk independent of the proxy correlation, and can the legitimate credit information it carries be captured by a non-discriminatory alternative?
For pincodes correlated with religious community, the answer is typically yes: the information the pincode carries about property market liquidity and local income levels can be captured by replacing the raw pincode with an income-quartile rank of the pincode combined with a property market index — preserving the legitimate predictive content while removing the proxy correlation. For a variable like employer name that carries almost no legitimate credit signal but a high proxy correlation, the answer is removal. For a variable that carries substantial legitimate signal and cannot be replaced, the answer is a disparity monitoring overlay that flags, for additional human review, any decision in which the variable produced a disparate outcome.
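The pincode replacement described above can be sketched as a small feature transform. Everything here is an illustrative assumption: the quartile ranking, the choice to income-adjust the property index by simple division, and all the numbers.

```python
def income_quartile(pincode, pincode_income):
    """Rank a pincode's median income into quartiles (1 = lowest).

    Simplified sketch: assumes distinct income values per pincode."""
    incomes = sorted(pincode_income.values())
    rank = incomes.index(pincode_income[pincode])
    return 1 + (rank * 4) // len(incomes)

def replacement_features(pincode, pincode_income, property_index):
    """Replace the raw pincode with its income quartile plus an
    income-adjusted property market index (assumed adjustment: divide
    the index by median income)."""
    return {
        "income_quartile": income_quartile(pincode, pincode_income),
        "property_index_adj": round(property_index[pincode] / pincode_income[pincode], 2),
    }

# Illustrative per-pincode data: median income (lakh INR) and a raw
# property market index.
pincode_income = {"400008": 4.2, "400012": 5.1, "400050": 14.8, "400070": 6.3}
property_index = {"400008": 61, "400012": 74, "400050": 182, "400070": 69}

print(replacement_features("400008", pincode_income, property_index))
```

The model then sees only the quartile rank and the adjusted index, never the pincode itself, so the religious-composition correlation carried by the raw geography drops out while the income and property-market signal is retained.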
The Proxy Variable Is Not Evidence of Intent — But It Is Evidence of Effect
No lending institution intends to discriminate by religion when it uses pincode as a credit variable. But if that pincode is associated with a religious community at a correlation of 0.89, the model's use of pincode is producing discrimination that is functionally identical to using religion directly — with the additional problem that it is invisible without the proxy analysis. The Fair Lending AI runs the proxy analysis every month for every variable in every model. The institution that can show this analysis, its findings, and its remediation actions is an institution that takes fair lending seriously as a practice rather than a declaration.
