Field Notes: Giving the Thirukkural a Voice in 25+ Languages — with Small Models

Build Small Hackathon · Backyard AI · Ancient Tamil Wisdom (Thirukkural) Without Borders

🪔 Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages

The person I built this for

My grandmother can recite Thirukkural couplets from memory, but can't read screens of English commentary. My niece reads English fluently — but not Tamil script. A friend in Jakarta has never read the Kural in any language he speaks.

The Thirukkural is one of humanity's oldest works of practical ethics: 1,330 Tamil couplets on virtue, leadership, and love, attributed to Thiruvalluvar and traditionally dated to around 2,000 years ago. It's studied by millions, yet it stays locked behind three walls: language, literacy, and the absence of a voice.

For the Backyard AI track, I wanted to knock all three down for the people I actually know — and then open it to everyone.

What it does

One app, every couplet:

The original Tamil + transliteration, always visible
A profound modern commentary — meaning, line-by-line walkthrough, deeper reflection, practical examples
On-demand translation into 25+ languages, in native script
Near-human narration of the couplet and its meaning — so you can simply hear it
NAT — a three-persona AI council (Student · Teacher · Guru) you can ask anything

The "think small" thesis: the right small model for each job

The hackathon's rule — every model under 32B — turned out to be a design philosophy, not a constraint. Instead of one giant model, I used the right small model for each job:

Job	Model	Size
Commentary, translation, the council chat	NVIDIA Nemotron-Nano-9B-v2	9B (128k ctx)
Agentic orchestration	NVIDIA NeMo Agent Toolkit (NAT)	—
Tamil / Indic / English voice	AI4Bharat Indic Parler-TTS	0.9B
23 other-language voices	Chatterbox Multilingual	0.5B
Fallback voice (any language)	Meta MMS-TTS	~70M

Built and tuned on an NVIDIA DGX Spark (GB10, FP8, vLLM); the public demo serves the same open models on Modal GPUs, with a CPU-only Gradio/React Space in front.

NAT — making the chat genuinely agentic

The "Council of Valluvar" became NAT — the Nemotron Agent Trio, built on the NVIDIA NeMo Agent Toolkit. Three agent personas reason over the same question concurrently and their voices are composed into one clear answer:

🧑‍🎓 Student frames the beginner's question
👨‍🏫 Teacher explains it practically
🧙 Guru gives the deepest reading

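It's a small multi-agent workflow with a fixed, deterministic control flow (the model's wording still varies, but the sequence of steps does not). That is exactly the kind of thing the toolkit is good at, and it makes a 9B model feel like a study circle.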
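
The fan-out and compose pattern described above can be sketched in plain asyncio. This is not the NeMo Agent Toolkit API and not the app's actual code: `ask_model`, the persona prompts, and the way replies are joined are all hypothetical stand-ins, chosen only to show three personas answering the same question concurrently.

```python
import asyncio

# Hypothetical persona instructions; the app's real prompts are not shown in this post.
PERSONAS = {
    "Student": "Frame the beginner's question behind this query.",
    "Teacher": "Explain the answer practically.",
    "Guru": "Give the deepest reading.",
}


async def ask_model(system_prompt: str, question: str) -> str:
    """Hypothetical stand-in for a request to the served 9B model."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"({system_prompt}) {question}"


async def council(question: str, ask=ask_model) -> str:
    """Ask all personas at once, then compose the replies in a fixed order."""
    names = list(PERSONAS)
    replies = await asyncio.gather(*(ask(PERSONAS[n], question) for n in names))
    return "\n\n".join(f"{name}: {reply}" for name, reply in zip(names, replies))


if __name__ == "__main__":
    print(asyncio.run(council("What does the first Kural teach?")))
```

Because `asyncio.gather` returns results in the order the calls were passed in, the composed answer always reads Student, Teacher, Guru, regardless of which reply arrives first.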

Field notes — what actually broke, and what I learned

The honest part. Most of my time went here, not on the happy path.

1. Context window is a feature, not a footnote. My first serving choice (a 4B Nemotron) had only a 4,096-token context — far too small for essay-length commentary plus translation. Symptoms looked like "translation is broken"; the real cause was truncation. Switching to Nemotron-Nano-9B-v2 (128k) fixed it instantly.
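
A cheap guard against this kind of silent truncation is to check the token budget before sending a request. The sketch below is an assumption, not the app's code: the four-characters-per-token estimate is a crude heuristic (Tamil and other non-Latin scripts usually tokenize into more tokens than this), and a real check would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; an assumption, not a real tokenizer count."""
    return int(len(text) / chars_per_token) + 1


def fits_context(prompt: str, max_new_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the requested output fits inside the window."""
    return estimate_tokens(prompt) + max_new_tokens <= context_window


if __name__ == "__main__":
    prompt = "x" * 12_000  # stand-in for an essay-length commentary prompt
    print(fits_context(prompt, max_new_tokens=2_000, context_window=4_096))
    print(fits_context(prompt, max_new_tokens=2_000, context_window=128_000))
```

Logging a failed check would have made the first symptom read "prompt exceeds context" instead of "translation is broken".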

2. Reasoning traces are expensive. Nemotron reasons by default. For reader-facing prose that doubled latency for no benefit. Disabling it via /no_think roughly halved generation time — and I kept a </think>-stripping safety net for the cases where the trace leaks through without an opening tag.
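
A minimal sketch of such a safety net, assuming the trace is delimited by `<think>` and `</think>` tags as described above. The function name `strip_reasoning` is hypothetical, and the app's own version may differ.

```python
import re

# Matches a well-formed reasoning block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove reasoning traces from model output before showing it to a reader."""
    # Case 1: complete <think>...</think> blocks.
    text = _THINK_BLOCK.sub("", text)
    # Case 2: a closing tag with no opening tag. Everything before the
    # last </think> is treated as leaked reasoning and dropped.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    return text.strip()


if __name__ == "__main__":
    print(strip_reasoning("<think>plan the answer</think>Virtue is its own reward."))
    print(strip_reasoning("plan the answer</think>\nVirtue is its own reward."))
```

An opening tag with no closing tag is left untouched here, since there is no reliable way to tell where the reasoning ends.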

3. Small models truncate JSON — so salvage it. Under load the 9B occasionally overran the token budget and returned a JSON object cut off mid-string. Instead of rejecting it (a 502), I wrote a salvage parser that extracts every complete "key": "value" pair. Truncated tails stopped costing the user the whole response.
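
One way to write such a salvage parser is sketched below. It assumes the model returns a flat JSON object whose values are all strings; nested objects, numbers, and arrays are ignored. The name `salvage_pairs` is hypothetical and this is not the app's actual parser.

```python
import json
import re

# A complete JSON string literal: quote, then escapes or ordinary chars, then quote.
_STRING = r'"(?:\\.|[^"\\])*"'
_PAIR = re.compile(rf"({_STRING})\s*:\s*({_STRING})")


def salvage_pairs(raw: str) -> dict:
    """Parse raw as JSON if possible; otherwise keep every complete string pair."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    salvaged = {}
    for key, value in _PAIR.findall(raw):
        try:
            # strict=False tolerates raw newlines inside a string value.
            salvaged[json.loads(key)] = json.loads(value, strict=False)
        except json.JSONDecodeError:
            continue  # skip pairs with invalid escapes
    return salvaged


if __name__ == "__main__":
    cut_off = '{"meaning": "Virtue yields honor", "reflection": "Consider how'
    print(salvage_pairs(cut_off))
```

The truncated final value has no closing quote, so the pattern never matches it and only finished pairs survive.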

4. Cold start, not GPU tier, is what feels "slow." I was tempted to throw an H100 at slow audio. But these TTS models are 0.5–0.9B and latency-bound by their autoregressive loop — H100 barely beats L40S. The real culprit was scale-to-zero cold starts. Keeping a warm container (or a 10-minute warm window) made everything feel instant; the GPU tier was almost irrelevant. Lesson: profile the cause before buying compute.

5. There is no NVIDIA Tamil voice, and that's the whole app. NVIDIA's Magpie and Resemble AI's Chatterbox Multilingual cover roughly 9 to 23 languages beautifully, but neither speaks Tamil. For a Tamil-first app that was disqualifying. The answer was a hybrid, routed per language: AI4Bharat Indic Parler-TTS for Tamil/Indic/English, Chatterbox for the 23 languages it does cover, and MMS as the universal fallback.
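
The per-language routing can be sketched as a simple lookup with a fallback. The language codes and set contents below are illustrative assumptions, not the app's real coverage tables, and `pick_tts_engine` is a hypothetical name.

```python
# Illustrative subsets only; the real coverage lists are longer.
PARLER_LANGS = frozenset({"ta", "te", "hi", "pa", "en"})
CHATTERBOX_LANGS = frozenset({"ar", "fr", "es", "ja"})


def pick_tts_engine(lang: str) -> str:
    """Return the engine label for a language code, most preferred first."""
    code = lang.strip().lower()
    if code in PARLER_LANGS:
        return "indic-parler-tts"
    if code in CHATTERBOX_LANGS:
        return "chatterbox-multilingual"
    return "mms-tts"  # universal fallback for everything else


if __name__ == "__main__":
    for code in ("ta", "ar", "id"):
        print(code, "->", pick_tts_engine(code))
```

Keeping the routing in one small function means adding a language is a one-line change to a set, not a change to the synthesis code.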

6. Pace makes "clarity." Raw TTS runs sentences together. Synthesizing sentence-by-sentence with deliberate pauses (and edge fades) did more for perceived clarity than any model swap.
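
A minimal sketch of sentence-by-sentence synthesis with pauses and edge fades, using plain Python lists of float samples so it runs without any audio library. The `synthesize` callback, the sample rate, and the pause and fade lengths are all assumptions for illustration; the app's actual values are not given here.

```python
import re


def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def fade_edges(samples: list, fade_len: int) -> list:
    """Apply a linear fade-in and fade-out to avoid clicks at the joins."""
    n = min(fade_len, len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n
        out[i] *= gain
        out[-1 - i] *= gain
    return out


def narrate(text, synthesize, sample_rate=24_000, pause_s=0.4, fade_s=0.01):
    """Synthesize each sentence separately and join them with silence."""
    silence = [0.0] * round(sample_rate * pause_s)
    fade_len = round(sample_rate * fade_s)
    audio = []
    for sentence in split_sentences(text):
        audio.extend(fade_edges(synthesize(sentence), fade_len))
        audio.extend(silence)
    return audio


if __name__ == "__main__":
    fake_tts = lambda sentence: [1.0] * 2_400  # hypothetical stand-in for a TTS call
    print(len(narrate("Learn well. Then live by it.", fake_tts)))
```

The fades matter because each clip is cut at an arbitrary amplitude; ramping to zero at both ends keeps the join with silence from producing an audible click.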

7. Lazy generation + caching beats pre-computing everything. Each (Kural, language) pair is generated once on first request and cached forever. Only the first visitor pays the generation cost; every later visitor gets the cached result instantly.
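
The generate-once pattern can be sketched as a small file-backed cache keyed on the (Kural, language) pair. The class name, file layout, and `generate` callback below are hypothetical; the app's real storage may be quite different.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class LazyCache:
    """Generate a result on first request, then serve it from disk."""

    def __init__(self, root: Path, generate: Callable[[int, str], dict]):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.generate = generate

    def get(self, kural: int, lang: str) -> dict:
        path = self.root / f"{kural:04d}-{lang}.json"
        if path.exists():  # cache hit: no model call at all
            return json.loads(path.read_text(encoding="utf-8"))
        result = self.generate(kural, lang)  # cache miss: pay the cost once
        path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
        return result


if __name__ == "__main__":
    def fake_generate(kural: int, lang: str) -> dict:
        return {"kural": kural, "lang": lang, "text": "..."}

    cache = LazyCache(Path(tempfile.mkdtemp()), fake_generate)
    print(cache.get(1, "id"))  # generated
    print(cache.get(1, "id"))  # served from disk
```

This sketch has no locking, so two simultaneous first requests for the same pair could both generate; the second write simply overwrites the first.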

Compliance, honestly

✅ Every model ≤ 32B and open-weight
✅ A Gradio app (mounted at /gradio via gr.mount_gradio_app), with a custom React front-end on top (an Off-Brand take)
✅ Hosted as a Space under the hackathon org
✅ Demo video + this post

Try it

🪔 Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages

Open a Kural, read the commentary, pick a language, and press Listen — then ask NAT anything.

Demo & links

Asset	Link
🎬 App demo video	https://www.youtube.com/watch?v=ubRxpqqMsJY
💬 Testimonials (Arabic · Punjabi · Telugu)	https://youtu.be/YidPYUwVOAs
💼 LinkedIn post	https://www.linkedin.com/posts/sudbharathi_ancient-tamil-wisdom-thirukkural-without-activity-7471897773647519744-vTSA
𝕏 X post	https://x.com/bsudharsh/status/2066052936854094068

Small models, big wisdom.

Built with NVIDIA Nemotron, the NeMo Agent Toolkit, AI4Bharat Indic Parler-TTS, Chatterbox, Meta MMS-TTS, on NVIDIA DGX Spark + Modal. #BuildSmallHackathon