A digital onboarding form embeds assumptions about the person filling it in. It assumes the borrower knows what an IFSC code is. It assumes they can navigate a file upload interface on a low-end Android handset. It assumes they understand what "date of incorporation" means in the context of their proprietorship. Most rural and semi-urban borrowers are fully capable of repaying a loan: the evidence is in their bank statements, their GST returns, and their years of informal credit management with local moneylenders. What many of them cannot do is navigate a form designed by an urban product team for an urban smartphone user. The Multilingual Onboarding Agent AI replaces the form with a conversation, and the conversation is in the borrower's language, at the borrower's pace, with no assumption about what the borrower already knows.
The form vs the conversation — structurally different experiences
A form presents all fields simultaneously and expects the user to know what each field requires. A conversation presents one question at a time, explains what is needed and why, accepts a range of inputs (including voice notes on WhatsApp, which many Tier 2/3 borrowers find more comfortable than typing), validates each input immediately and responds with a friendly error message if it is wrong, and moves to the next question only when the current one is answered correctly. The conversion rate difference between a form and a conversation for the same borrower population is not marginal: it is typically 40 to 60 percentage points. The borrower who abandons the form at field 7 is far more likely to finish the conversation if the conversation is in their language.
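The one-question-at-a-time loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `Question` class, `run_conversation`, the wording of the hint, and the sample code `SBIN0001234` are all assumptions made for the example. The IFSC check relies only on the published format (four letters, a zero, then six alphanumeric characters).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# IFSC format: 4 letters (bank), a literal 0, then 6 alphanumeric characters (branch).
IFSC_PATTERN = re.compile(r"^[A-Z]{4}0[A-Z0-9]{6}$")


def check_ifsc(raw: str) -> Optional[str]:
    """Return None if the input is a well-formed IFSC, otherwise a friendly hint."""
    if IFSC_PATTERN.match(raw.strip().upper()):
        return None
    return ("That does not look like an IFSC code. It is the 11-character code "
            "printed on your cheque book, for example SBIN0001234.")


@dataclass
class Question:
    """Hypothetical question record: a field name, a prompt, and a validator."""
    field: str
    prompt: str  # explains what is needed and why
    validate: Callable[[str], Optional[str]]


def run_conversation(
    questions: List[Question], replies: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Ask one question at a time and repeat it until the answer validates.

    `replies` stands in for the chat channel. Returns the collected answers
    and the list of messages the agent sent.
    """
    answers: Dict[str, str] = {}
    transcript: List[str] = []
    incoming = iter(replies)
    for q in questions:
        transcript.append(q.prompt)
        for reply in incoming:
            error = q.validate(reply)
            if error is None:
                answers[q.field] = reply.strip()
                break  # move on only when the current answer is valid
            transcript.append(error)  # friendly error, then wait for a new reply
    return answers, transcript
```

Because the loop never advances past an invalid answer, the borrower always sees the explanation for the field they are stuck on rather than a form-wide error summary.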
A live onboarding conversation in Telugu: Ravi Textile Works, Tirupati
The 4 adaptive modes for different literacy levels
Conversational: text-based WhatsApp conversation with one question per message (fastest path)
The borrower can read and type comfortably in their language. The agent proceeds with text questions and text inputs, using simple language and avoiding jargon. Documents are uploaded as image attachments, and eSign is completed via Aadhaar OTP. Typical completion time is 20 to 25 minutes. This mode covers approximately 55% of Tier 2/3 onboarding sessions.
Borrower responds via voice notes · Agent transcribes and confirms in text before proceeding
The agent's questions are sent as text but the borrower responds via WhatsApp voice notes. The agent transcribes the voice note using regional language ASR (Automatic Speech Recognition), reads back the transcription for confirmation ("I heard 'Ravi Textile Works' — is that correct?"), and proceeds. Voice note responses are also used for complex fields like address — the borrower speaks it, the agent transcribes and formats. Approximately 28% of Tier 2/3 sessions use this mode.
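The transcribe-then-confirm step can be sketched as below. The `transcribe` callable is a stand-in for whatever regional-language ASR service is used (no specific vendor API is implied), and the retry limit and the small set of "yes" words are illustrative assumptions.

```python
from typing import Callable, Iterable, Optional

# Illustrative, not exhaustive: English, Telugu ("avunu") and Hindi ("haan").
YES_WORDS = {"yes", "y", "correct", "avunu", "haan"}


def confirm_transcription(
    audio_notes: Iterable[bytes],
    confirmations: Iterable[str],
    transcribe: Callable[[bytes], str],   # hypothetical ASR hook
    send: Callable[[str], None] = print,  # hypothetical outbound-message hook
    max_attempts: int = 3,                # assumed retry limit
) -> Optional[str]:
    """Read each transcription back and accept it only after the borrower confirms.

    Returns the confirmed text, or None if no attempt was confirmed.
    """
    answers = iter(confirmations)
    for attempt, audio in enumerate(audio_notes, start=1):
        if attempt > max_attempts:
            break
        heard = transcribe(audio)
        send(f"I heard '{heard}'. Is that correct?")
        reply = next(answers, "").strip().lower()
        if reply in YES_WORDS:
            return heard
        send("Sorry about that. Please send the voice note again.")
    return None
```

Nothing is stored until the borrower has seen the text and agreed with it, so an ASR error costs one extra voice note rather than a wrong value in the application.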
Questions presented as numbered choices · Borrower responds with a number (1, 2, 3) · No reading required
For borrowers who cannot read the regional script but understand spoken questions, the onboarding is conducted at a business correspondent or bank mitra kiosk, where the kiosk operator reads the agent's questions aloud and enters the borrower's spoken responses. The agent formats every question as a numbered choice wherever possible: "What is your business type? 1 = Proprietorship, 2 = Partnership, 3 = Private Limited, 4 = Other." The borrower says "1" and the operator enters "1." Approximately 12% of sessions use this assisted kiosk mode.
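Formatting a question as numbered choices and mapping the reply back to an option is simple enough to show directly. The helper names `format_choices` and `parse_choice` are hypothetical. One detail worth noting is that Python's built-in `int()` accepts Unicode decimal digits, so a reply typed in Devanagari or Telugu numerals parses the same way as "1".

```python
from typing import Optional, Sequence


def format_choices(question: str, options: Sequence[str]) -> str:
    """Render a question with its options numbered from 1."""
    numbered = ", ".join(f"{i} = {opt}" for i, opt in enumerate(options, start=1))
    return f"{question} {numbered}."


def parse_choice(reply: str, options: Sequence[str]) -> Optional[str]:
    """Map a numeric reply to its option; return None if it is not a valid number."""
    try:
        index = int(reply.strip())
    except ValueError:
        return None
    if 1 <= index <= len(options):
        return options[index - 1]
    return None


BUSINESS_TYPES = ["Proprietorship", "Partnership", "Private Limited", "Other"]
print(format_choices("What is your business type?", BUSINESS_TYPES))
# What is your business type? 1 = Proprietorship, 2 = Partnership, 3 = Private Limited, 4 = Other.
print(parse_choice("1", BUSINESS_TYPES))
# Proprietorship
```

Returning `None` for anything outside the numbered range lets the agent repeat the question instead of guessing.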
Enhanced: live video KYC session in the borrower's language · Agent walks through every step on screen share
For borrowers who prefer a human connection or whose document situation is complex (property papers in regional script, handwritten ledgers), the Multilingual Onboarding Agent AI escalates to a live Video KYC session with a human agent who speaks the borrower's language. The AI has already collected all data it can from the pre-video conversation, so the video session focuses only on what could not be collected digitally. Approximately 5% of sessions escalate to this mode.
What the conversational agent explains that the form never could
The most valuable thing the conversational agent does is not collecting information; it is explaining concepts. A form field labelled "FOIR" (Fixed Obligation to Income Ratio) means nothing to a borrower in rural Rajasthan. The agent says: "मैं आपके हर महीने की कुल EMI, आपकी महीने की कमाई से देखूँगा। अगर आप ₹30,000 कमाते हैं और ₹8,000 EMI देते हैं, तो आपका FOIR 26.7% है। इस लोन के साथ आपका FOIR 43% हो जाएगा जो हमारी सीमा के अंदर है।" (I'll look at your total monthly EMI compared to your monthly income. If you earn ₹30,000 and pay ₹8,000 EMI, your FOIR is 26.7%. With this loan, it would be 43%, which is within our limit.) The borrower understands the calculation, understands why it matters, and understands they are eligible. This is informed consent, not a check box next to fine print.
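The arithmetic in the agent's explanation can be checked with a short calculation. The income and existing EMI come from the example above. The new loan's EMI of ₹4,900 is not stated in the text: it is back-calculated here as the amount that produces the quoted 43%. The 50% limit is purely a placeholder, since the lender's actual limit is not given.

```python
def foir(total_monthly_emi: float, monthly_income: float) -> float:
    """Fixed Obligation to Income Ratio, as a percentage of monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive")
    return 100.0 * total_monthly_emi / monthly_income


monthly_income = 30_000
existing_emi = 8_000
new_loan_emi = 4_900   # assumption: back-calculated from the quoted 43%
FOIR_LIMIT = 50.0      # hypothetical limit; the text does not state the real one

current = foir(existing_emi, monthly_income)
projected = foir(existing_emi + new_loan_emi, monthly_income)

print(f"Current FOIR:   {current:.1f}%")    # Current FOIR:   26.7%
print(f"Projected FOIR: {projected:.1f}%")  # Projected FOIR: 43.0%
print("Within limit" if projected <= FOIR_LIMIT else "Above limit")
```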
Informed consent requires comprehension — and comprehension requires communication in the borrower's language at the borrower's literacy level
The RBI's Digital Lending Framework requires that borrowers receive and understand the Key Fact Statement before signing. A KFS in English, presented as a PDF attached to a WhatsApp message, is technically provided. It is not comprehended by a borrower who reads Telugu and who has never seen a KFS before. The Multilingual Onboarding Agent AI reads the KFS to the borrower, in their language, line by line, confirming understanding at each step: "I've told you the interest rate, the processing fee, and the repayment schedule. Do you have any questions before we continue?" The borrower who asks a question gets an answer in their language. The borrower who says "I understand" has actually understood. This is the standard the RBI expects — and it is the standard the Multilingual Onboarding Agent AI makes achievable at scale.
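A line-by-line KFS walkthrough with a confirmation gate at each step might look like the sketch below. Everything here is illustrative: the function name, the acknowledgement phrases, and the `answer_question` hook (which stands in for however the agent generates an answer in the borrower's language) are assumptions, not a description of the product or of any regulatory requirement.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Illustrative acknowledgement phrases; a real deployment would need per-language lists.
ACK_WORDS = {"i understand", "understood", "ok", "yes"}


def walk_through_kfs(
    kfs_lines: List[str],
    replies: Iterable[str],
    answer_question: Callable[[str, str], str],  # hypothetical: (kfs_line, question) -> answer
) -> Tuple[List[Dict], bool]:
    """Present each KFS line and wait for an acknowledgement before moving on.

    Any reply that is not an acknowledgement is treated as a question about the
    current line and answered. Returns a per-line log and a flag that is True
    only when every line has been acknowledged.
    """
    incoming = iter(replies)
    log: List[Dict] = []
    for line in kfs_lines:
        entry = {"line": line, "questions": [], "acknowledged": False}
        for reply in incoming:
            text = reply.strip()
            if text.lower() in ACK_WORDS:
                entry["acknowledged"] = True
                break
            entry["questions"].append((text, answer_question(line, text)))
        log.append(entry)
    ready_to_sign = bool(log) and all(e["acknowledged"] for e in log)
    return log, ready_to_sign
```

The per-line log doubles as an audit trail: it records which lines prompted questions and what the borrower was told before signing.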
