Use case #0001

How Document AI extracts income data from 12 document types automatically

The underwriter who manually opens a 12-month bank statement, locates the salary credits, cross-references them against three payslips, computes the average, and then repeats the exercise for two ITR PDFs is not underwriting; they are doing clerical extraction. The Document Ops Agent AI extracts income data from 12 document types used in Indian lending (payslips, bank statements, ITRs, Form 16s, GST returns, CA certificates, salary certificates, rental agreements, pension payment orders, and Udyam registrations) in under 90 seconds, and presents the underwriter with a single structured income profile. The underwriter's job is to assess the income, not to find it.

Why income extraction is hard — and why it matters that it is done correctly

Income extraction is harder than it appears because income is represented differently in every document type, by every employer, and for every income structure. A payslip from a large private-sector employer will have a standardised format; a payslip from a small manufacturing firm may be a hand-typed table in a Word document. A salaried borrower's bank statement will show regular credits labelled "SALARY" from one employer. A self-employed borrower's bank statement will show irregular credits from dozens of counterparties, none labelled as "income." An ITR shows total income after deductions, which differs from gross income, which in turn differs from the income available for EMI servicing after statutory deductions.

Getting income extraction wrong in either direction creates a material problem. Overstating income leads to over-lending — the borrower's actual EMI capacity is less than the sanctioned amount. Understating income leads to under-lending and potential customer loss — the borrower qualifies for more than they were offered, and a competitor who assesses their income correctly will offer more. The Document Ops Agent AI extracts from the source document, applies the correct computation for each document type, and flags the confidence level of each extraction — so the underwriter can give higher scrutiny to low-confidence extractions without spending equal time on every field.
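The confidence-based triage described above can be sketched in a few lines of Python. This is a minimal illustration, not the Document Ops Agent AI's implementation: the `ExtractedField` type, the `split_by_confidence` function, and the 95% review threshold are all hypothetical, since the source does not say where "low confidence" begins.

```python
from dataclasses import dataclass

# Hypothetical threshold: the source does not state where "low confidence" begins.
REVIEW_THRESHOLD = 0.95


@dataclass
class ExtractedField:
    name: str
    value: float
    confidence: float  # 0.0 to 1.0, as reported by the extractor


def split_by_confidence(fields, threshold=REVIEW_THRESHOLD):
    """Split fields into (auto_accepted, needs_review).

    Fields needing review are sorted least-confident first, so the
    underwriter's scrutiny goes where it is most needed.
    """
    accepted = [f for f in fields if f.confidence >= threshold]
    review = sorted(
        (f for f in fields if f.confidence < threshold),
        key=lambda f: f.confidence,
    )
    return accepted, review


fields = [
    ExtractedField("gross_salary_payslip", 118_400, 0.97),
    ExtractedField("rental_income_agreement", 18_000, 0.91),
    ExtractedField("nach_debits", 22_500, 0.95),
]
accepted, review = split_by_confidence(fields)
print([f.name for f in review])  # ['rental_income_agreement']
```

The point of sorting the review list is that the underwriter's time scales with uncertainty rather than with the number of fields.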

"An underwriter who spends 20 minutes extracting income figures from three documents and 5 minutes assessing them is operating with inverted priorities. The Document AI spends 90 seconds extracting — the underwriter spends the 20 minutes assessing."

The 12 document types and what the Document AI extracts from each

Payslip (salaried)

Formats: PDF, scanned, photo
  • Gross salary — before all deductions · Primary income signal
  • Net take-home — after PF, ESI, TDS, professional tax
  • PF contribution — employer + employee · Income stability proxy
  • Variable components — incentive, bonus, allowances · Flagged separately
  • Employer name — cross-referenced against EPFO for authenticity
→ Income used: net take-home for FOIR · Gross for eligibility ceiling · Variable: excluded if <3-month history
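A minimal sketch of the payslip rules in the summary line, assuming the payslip fields have already been extracted. The function name `payslip_income` and the choice to report qualifying variable pay as a separate figure (rather than adding it to either income figure) are assumptions; the source states only that variable pay is flagged separately and excluded when there is less than three months of history.

```python
from statistics import mean

MIN_VARIABLE_HISTORY_MONTHS = 3  # from the rule: variable pay excluded below 3 months


def payslip_income(gross: float, net_take_home: float, variable_history: list[float]) -> dict:
    """Apply the payslip rules from the summary line.

    - net take-home is the FOIR income
    - gross salary sets the eligibility ceiling
    - variable pay counts (as its monthly average) only with >= 3 months of history;
      reporting it separately is an assumption, not a documented rule
    """
    if len(variable_history) >= MIN_VARIABLE_HISTORY_MONTHS:
        variable_counted = mean(variable_history)
    else:
        variable_counted = 0.0  # fewer than 3 months of history: excluded
    return {
        "foir_income": net_take_home,
        "eligibility_gross": gross,
        "variable_counted": variable_counted,
    }
```

For example, `payslip_income(118_400, 88_200, [6_000, 4_500])` counts no variable pay, because only two months of incentive history exist.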

Bank Statement (salaried)

Formats: PDF, Excel, AA pull
  • Salary credits — regular same-source credits classified as salary
  • 12-month average salary credit — primary verification figure
  • Credit consistency — variation across months (flag if >15% variance)
  • NACH debits — existing EMIs extracted and summed for FOIR
  • Average end-of-month balance — liquidity signal
→ Cross-referenced against payslip figure · Discrepancy >10% flagged for underwriter
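How the 15% variation and 10% discrepancy checks might be computed is sketched below. The two thresholds come from the entry above; everything else is assumed: "variation" is taken here as the largest deviation of any single month from the 12-month average, and the bank average is compared against the payslip's net take-home, since salary credits arrive net of deductions. `check_salary_credits` is a hypothetical name.

```python
from statistics import mean

VARIATION_FLAG = 0.15    # month-to-month variation threshold (from the entry)
DISCREPANCY_FLAG = 0.10  # bank-vs-payslip threshold (from the entry)


def check_salary_credits(monthly_credits: list[float], payslip_net: float) -> dict:
    """Average 12 monthly salary credits and raise the two flags described above."""
    if len(monthly_credits) != 12:
        raise ValueError("expected 12 monthly salary-credit totals")
    avg = mean(monthly_credits)
    # Assumption: "variation" = largest deviation of any single month from the average
    max_deviation = max(abs(c - avg) for c in monthly_credits) / avg
    # Assumption: compare against net take-home, since credits arrive net of deductions
    discrepancy = abs(avg - payslip_net) / payslip_net
    return {
        "avg_salary_credit": avg,
        "inconsistent_credits": max_deviation > VARIATION_FLAG,
        "payslip_mismatch": discrepancy > DISCREPANCY_FLAG,
    }
```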

Bank Statement (self-employed)

Formats: PDF, Excel, AA pull
  • Total gross credits (12 months) — all inward credits summed
  • Business-to-business credits — identified by credit narrative pattern
  • Cash deposits — flagged separately · Not counted as verifiable income
  • Monthly credit trend — growing, stable, or declining classification
  • Bounce rate — outbound NACH / cheque return rate
→ Average monthly banking income = (total credits − cash deposits) ÷ 12 · Bounce rate passed to Bank Statement Analyst AI
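The self-employed banking-income formula, with cash deposits excluded before averaging, can be sketched as below. The `Credit` type, the half-year comparison used to classify the trend, and the ±10% "stable" band are hypothetical choices for illustration; the source does not say how growing, stable, and declining are distinguished.

```python
from dataclasses import dataclass

TREND_BAND = 0.10  # hypothetical: within +/-10% between half-years counts as "stable"


@dataclass
class Credit:
    month: int      # 1..12 within the statement window
    amount: float
    is_cash: bool   # cash deposits are flagged and excluded from income


def self_employed_banking_income(credits: list[Credit]) -> dict:
    monthly = [0.0] * 12
    cash_excluded = 0.0
    for c in credits:
        if c.is_cash:
            cash_excluded += c.amount
        else:
            monthly[c.month - 1] += c.amount
    # Average monthly banking income = (total credits - cash deposits) / 12
    avg_monthly = sum(monthly) / 12
    # Assumed trend method: compare the second half-year with the first
    first, second = sum(monthly[:6]), sum(monthly[6:])
    if first:
        change = (second - first) / first
    else:
        change = float("inf") if second else 0.0
    if change > TREND_BAND:
        trend = "growing"
    elif change < -TREND_BAND:
        trend = "declining"
    else:
        trend = "stable"
    return {"avg_monthly_income": avg_monthly, "trend": trend, "cash_excluded": cash_excluded}
```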

ITR (Individual) — Salaried

Formats: PDF acknowledgement, Form 26AS
  • Gross total income (Schedule S) — salary income before Chapter VI-A deductions
  • Taxable income — after deductions · Not income for lending purposes
  • TDS deducted — cross-referenced for consistency with payslip
  • Year of assessment — currency check · ITR >18 months old flagged
  • Acknowledgement number — verified against ITD portal
→ Income used: gross total income from Schedule S · Not taxable income · AY must be current or prior year only
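A sketch of the "current or prior AY" check, assuming that "current AY" means the assessment year whose 1 April to 31 March period contains the check date (so on 14 November 2025 the current AY is 2025-26 and the prior AY is 2024-25). The function names are hypothetical, and the separate ">18 months old" flag is not modelled.

```python
from datetime import date


def current_ay_start(today: date) -> int:
    """Start year of the assessment year whose 1 April-31 March period contains `today`."""
    return today.year if today.month >= 4 else today.year - 1


def ay_label(start_year: int) -> str:
    return f"{start_year}-{(start_year + 1) % 100:02d}"  # e.g. 2025 -> "2025-26"


def itr_salaried_income(gross_total_income: float, ay: str, today: date) -> float:
    """Return the ITR income to use, rejecting returns outside the current or prior AY.

    Taxable income is deliberately not a parameter: the rule is to use gross
    total income, never taxable income.
    """
    start = current_ay_start(today)
    allowed = {ay_label(start), ay_label(start - 1)}
    if ay not in allowed:
        raise ValueError(f"AY {ay} not accepted; expected one of {sorted(allowed)}")
    return gross_total_income


print(itr_salaried_income(1_428_000, "2024-25", date(2025, 11, 14)))  # 1428000
```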

ITR (Business) — SE / Proprietor

Formats: Profit & Loss schedule, business income
  • Net profit after tax — primary income signal for business borrowers
  • Gross receipts / turnover — top-line for revenue trend
  • Depreciation add-back — non-cash deduction added back for cash income
  • Year-on-year trend — 2-year comparison for stability
  • Business nature — from ITR filing category
→ Income = net profit + depreciation add-back · 2-year average for stability · Declining trend flagged
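The business-income rule (net profit plus depreciation add-back, averaged over two years) as a minimal sketch. The `BusinessYear` type and `business_income` function are hypothetical, and flagging a decline by comparing the two years' cash income is an assumption; the source does not say whether the trend is measured on cash income, net profit, or turnover.

```python
from dataclasses import dataclass


@dataclass
class BusinessYear:
    ay: str
    net_profit: float    # net profit after tax, from the P&L schedule
    depreciation: float  # non-cash deduction, added back


def business_income(prior: BusinessYear, latest: BusinessYear) -> dict:
    """Income = net profit + depreciation add-back, averaged over two years."""
    prior_cash = prior.net_profit + prior.depreciation
    latest_cash = latest.net_profit + latest.depreciation
    return {
        "annual_income": (prior_cash + latest_cash) / 2,
        "monthly_income": (prior_cash + latest_cash) / 24,
        # Assumption: "declining" means latest-year cash income below the prior year's
        "declining_trend": latest_cash < prior_cash,
    }
```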

Form 16 / TDS Certificate

Formats: PDF, Part A + Part B
  • Gross salary (Part B) — employer-certified income figure
  • TAN of employer — verifies employer identity
  • TDS amount — cross-referenced with ITR
  • Assessment year — must be current or prior AY
  • Allowances breakup — HRA, LTA, special allowances separated
→ Considered most reliable salaried income document · Employer-certified · Cross-check vs payslip and bank statement
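One way to express the Form 16 cross-check against the payslip is to convert the annual Part B gross to a monthly figure and compare. The 2% tolerance is borrowed from the live extraction example in this use case; the function name `reconcile_form16` and its return shape are hypothetical.

```python
def reconcile_form16(form16_gross_annual: float, payslip_gross_monthly: float,
                     tolerance: float = 0.02) -> dict:
    """Compare the payslip's monthly gross with Form 16 Part B gross / 12."""
    form16_monthly = form16_gross_annual / 12
    difference = payslip_gross_monthly - form16_monthly
    return {
        "form16_monthly": form16_monthly,
        "difference": difference,
        "flag": abs(difference) / form16_monthly > tolerance,
    }


print(reconcile_form16(1_398_000, 118_400))
# {'form16_monthly': 116500.0, 'difference': 1900.0, 'flag': False}
```

The same comparison extends to the bank statement by substituting the 12-month average salary credit, though that figure is net of deductions and would need its own baseline.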

GST Returns (GSTR-3B)

Formats: PDF download, GST portal pull
  • Outward taxable supply (Table 3.1a) — monthly turnover signal
  • 12-month turnover total — annual revenue for income estimation
  • Filing regularity — late filings counted and flagged
  • Tax paid (cash ledger) — cross-reference for turnover authenticity
  • GSTIN status — active / suspended verified
→ Income estimate: turnover × industry net margin (sector-specific table) · Not direct income · Corroborating signal only
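The GST income estimate (turnover multiplied by a sector net margin) as a sketch. The margin values in `SECTOR_NET_MARGIN` are placeholders invented for illustration, not the sector-specific table the entry refers to, which the source does not publish; `gst_income_estimate` is a hypothetical name. Filing regularity and cash-ledger cross-checks are not modelled.

```python
# PLACEHOLDER margins invented for illustration; the real sector table is not published here.
SECTOR_NET_MARGIN = {"trading": 0.04, "manufacturing": 0.08, "services": 0.15}


def gst_income_estimate(monthly_outward_supply: list[float], sector: str) -> dict:
    """Estimate annual income from 12 months of GSTR-3B Table 3.1(a) outward taxable supply."""
    if len(monthly_outward_supply) != 12:
        raise ValueError("expected 12 monthly GSTR-3B values")
    turnover = sum(monthly_outward_supply)
    return {
        "annual_turnover": turnover,
        "estimated_annual_income": turnover * SECTOR_NET_MARGIN[sector],
        "use": "corroborating signal only",  # never used as direct income
    }
```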

CA Certificate

Formats: Signed PDF on letterhead
  • Net annual income certified — primary figure from CA
  • CA membership number — verified against ICAI register
  • Income computation basis — bank statement / books / estimation
  • Certification date — must be within 3 months of application
  • Business nature certified — confirmed type of SE activity
→ ICAI number verified at extraction · Certificate >3 months: flagged · Basis of computation reviewed

Salary Certificate (employer)

Formats: Letterhead PDF, email confirmation
  • Gross monthly salary — employer-stated figure
  • Employment designation and date of joining — tenure signal
  • HR signatory name and designation — authenticity signal
  • Company letterhead — validated against company registration
  • Issue date — must be within 2 months of application
→ Weaker than Form 16 but stronger than payslip alone · Accepted when Form 16 not yet issued (new joiner)
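The date rules for CA certificates (within 3 months of application) and salary certificates (within 2 months) reduce to a calendar-month comparison. This sketch assumes "within N months" means the issue date plus N calendar months falls on or after the application date, with the day of month clamped for shorter months; `is_fresh` and `add_months` are hypothetical helpers.

```python
import calendar
from datetime import date

# Maximum document age in months, from the CA certificate and salary certificate entries
MAX_AGE_MONTHS = {"ca_certificate": 3, "salary_certificate": 2}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. 31 Jan + 1 month -> 28/29 Feb)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_fresh(doc_type: str, issued_on: date, application_date: date) -> bool:
    """True if the document was issued no more than N calendar months before the application."""
    return add_months(issued_on, MAX_AGE_MONTHS[doc_type]) >= application_date
```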

Rental Income Agreement

Formats: Registered / notarised PDF
  • Monthly rental amount — from agreement schedule
  • Lease start and end date — tenure remaining
  • Property address — cross-referenced with ownership documents
  • Rental income credited to bank — bank statement corroboration required
  • Registration status — registered agreements given higher weight
→ Counted at 70% of stated rental (vacancy/interruption discount) · Must be corroborated by bank credits
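The rental rule (70% of stated rent, counted only when bank credits corroborate it) sketched in Python. The corroboration test is an assumption: here the average observed rental credit must reach 90% of the stated rent, a threshold the source does not specify. `applied_rental_income` is a hypothetical name.

```python
RENTAL_FACTOR = 0.70            # counted at 70% of stated rent (from the entry)
CORROBORATION_MIN_RATIO = 0.90  # hypothetical: not specified in the source


def applied_rental_income(stated_monthly_rent: float, bank_rent_credits: list[float]) -> float:
    """Return the rental income to count toward FOIR, or 0.0 if bank credits do not corroborate it."""
    if not bank_rent_credits:
        return 0.0  # no corroborating credits: not counted
    avg_credit = sum(bank_rent_credits) / len(bank_rent_credits)
    if avg_credit < CORROBORATION_MIN_RATIO * stated_monthly_rent:
        return 0.0
    # The 70% factor applies to the stated rent, not to the observed credits
    return round(stated_monthly_rent * RENTAL_FACTOR, 2)


print(applied_rental_income(18_000, [17_800] * 12))  # 12600.0
```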

Pension Payment Order

Formats: Government PPO document
  • Monthly pension amount — fixed, regular, government-backed
  • PPO number — verified against pension authority records
  • Pension type — service pension vs family pension
  • DCRG / commutation status — lump-sum already received or not
  • Bank account linked — cross-referenced with application account
→ Treated as most stable income signal · Government pension: no variability · 100% counted toward FOIR

Udyam / MSME Registration Certificate

Formats: Udyam portal PDF
  • Business category — Micro, Small, or Medium
  • Date of registration — business vintage signal
  • NIC code — business nature for sector risk classification
  • Turnover declared at registration — corroborating revenue signal
  • PAN linkage — verified against borrower PAN
→ Income not extracted directly — used as business vintage and identity verification · Corroborates GST and ITR

A live extraction: what the Document AI produces in 87 seconds

Income Extraction — Application LA-2025-9841 · Ananya Krishnamurthy · Home Loan
Documents received: 6 · Extraction time: 87 seconds · Nov 14, 2025 · 10:08:22
  • Applicant type: Salaried, private sector
  • Employer: Infosys BPM Limited
  • Documents submitted: 6 of 6 required
  • Documents extracted: 6 of 6 · No extraction failures
  • Income sources: Salary · Rental (1 property)
  • Extraction confidence: 94.2% average

Extracted income fields (all 6 documents)
  • Gross salary (payslip, 3-month avg): ₹1,18,400/month · Conf: 97%
  • Net take-home (payslip): ₹88,200/month · Conf: 97%
  • Gross salary (Form 16, AY 2024-25): ₹13,98,000/year · ₹1,16,500/month · Conf: 99%
  • Salary credits (bank statement, 12-month avg): ₹88,040/month · Conf: 98%
  • NACH debits extracted (bank statement): ₹22,500/month (1 existing loan) · Conf: 95%
  • Rental income (agreement): ₹18,000/month · Corroborated in bank: ₹17,800 avg · Conf: 91%
  • Rental income applied (70% of stated): ₹12,600/month · Policy discount applied
  • Payslip vs Form 16 discrepancy: +₹1,900/month, within 2% tolerance · No flag · Auto-reconciled
  • ITR AY 2024-25 gross income: ₹14,28,000/year · ₹1,19,000/month · Conf: 99%

Qualifying monthly income for FOIR: ₹1,00,800/month (salary ₹88,200 + rental ₹12,600)
Existing EMI (extracted): ₹22,500/month
Available FOIR headroom: 40.3% at ₹40L · 20yr

Summary: 87s extraction · 94.2% avg confidence · 6/6 documents extracted · Qualifying income: ₹1,00,800/month · FOIR headroom confirmed for requested ticket

  • 12 document types extracted, from payslips and Form 16 to GST returns and pension orders
  • 87s: full extraction for Ananya, covering 6 documents and 9 income fields at 94.2% average confidence
  • ₹1,00,800 qualifying monthly income produced automatically: salary ₹88,200 + rental ₹12,600 (at the 70% policy rate)
  • Zero manual data entry in the extraction chain: the underwriter receives structured data, not raw documents to read
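The headline arithmetic can be checked directly: qualifying income is net take-home plus rental at 70% of the stated amount, and dividing the extracted EMI by that income gives the share of income already committed to existing loans. This is a worked check using the figures above, with hypothetical function names; it does not reproduce the 40.3% headroom figure, because the loan rate and FOIR cap behind it are not stated.

```python
RENTAL_FACTOR = 0.70  # policy rate from the rental entry


def qualifying_income(net_take_home: float, stated_rent: float) -> float:
    """Net take-home salary plus rental counted at 70% of the stated amount."""
    return net_take_home + round(stated_rent * RENTAL_FACTOR, 2)


def existing_obligation_ratio(existing_emi: float, income: float) -> float:
    """Share of qualifying income already committed to existing EMIs."""
    return existing_emi / income


income = qualifying_income(88_200, 18_000)
print(income)                                              # 100800.0
print(f"{existing_obligation_ratio(22_500, income):.1%}")  # 22.3%
```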

The underwriter's value is in what they do with the income figure — not in locating it

Income extraction is not judgement; it is transcription with rules. The rules are known: gross total income from Schedule S, not taxable income; rental counted at 70% of the stated amount; the 12-month bank statement average, not the best month; NACH debits treated as existing obligations. Every one of those rules can be applied consistently by a machine, and inconsistently by a human who is extracting their fifteenth application of the day. The Document Ops Agent AI applies the extraction rules identically for every document, every applicant, every time, and presents the underwriter with a structured income profile whose provenance is documented, whose confidence level is stated, and whose discrepancies with other documents are already flagged. What remains for the underwriter is the judgement that actually requires an underwriter.
