Build Small Hackathon · Backyard AI · Ancient Tamil Wisdom (Thirukkural) Without Borders
Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages
My grandmother can recite Thirukkural couplets from memory, but can't read screens of English commentary. My niece reads English fluently, but not Tamil script. A friend in Jakarta has never read the Kural in any language he speaks.
The Thirukkural is one of humanity's oldest works of practical ethics: 1,330 Tamil couplets on virtue, leadership, and love, written by Thiruvalluvar over 2,000 years ago. It's studied by millions, yet it stays locked away behind three walls: language, literacy, and the absence of a voice.
For the Backyard AI track, I wanted to knock all three down for the people I actually know, and then open it to everyone.
One app, every couplet:
The hackathon's rule (every model under 32B) turned out to be a design philosophy, not a constraint. Instead of one giant model, I used the right small model for each job:
| Job | Model | Size |
|---|---|---|
| Commentary, translation, the council chat | NVIDIA Nemotron-Nano-9B-v2 | 9B (128k ctx) |
| Agentic orchestration | NVIDIA NeMo Agent Toolkit (NAT) | n/a |
| Tamil / Indic / English voice | AI4Bharat Indic Parler-TTS | 0.9B |
| 23 other-language voices | Chatterbox Multilingual | 0.5B |
| Fallback voice (any language) | Meta MMS-TTS | ~70M |
Built and tuned on an NVIDIA DGX Spark (GB10, FP8, vLLM); the public demo serves the same open models on Modal GPUs, with a CPU-only Gradio/React Space in front.
The "Council of Valluvar" became NAT: the Nemotron Agent Trio, built on the NVIDIA NeMo Agent Toolkit. Three agent personas reason over the same question concurrently, and their voices are composed into one clear answer:
It's a small, deterministic multi-agent workflow (exactly the kind of thing the toolkit is good at), and it makes a 9B model feel like a study circle.
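To make the concurrency concrete, here is a minimal sketch of the fan-out-and-compose pattern using plain `asyncio`. It does not use the NeMo Agent Toolkit API. The persona names, their prompts, and the `ask_model` stub are all hypothetical stand-ins, and the real app may compose the answer differently (for example, with another model call).

```python
import asyncio

# Hypothetical persona prompts; the app's real three personas are not shown here.
PERSONAS = {
    "scholar": "Explain the couplet's literal meaning.",
    "elder": "Relate the couplet to everyday life.",
    "skeptic": "Raise one honest caveat.",
}

async def ask_model(system_prompt: str, question: str) -> str:
    """Stand-in for a request to the 9B model; returns a canned string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{system_prompt}] {question}"

async def council(question: str) -> str:
    names = list(PERSONAS)
    # Fan out: all three personas see the same question concurrently.
    replies = await asyncio.gather(
        *(ask_model(PERSONAS[name], question) for name in names)
    )
    # Deterministic composition: fixed persona order, one labelled line each.
    return "\n".join(f"{name}: {reply}" for name, reply in zip(names, replies))
```

Calling `asyncio.run(council("What is virtue?"))` returns three labelled lines in the same order every time, because `asyncio.gather` preserves the order of its inputs regardless of which call finishes first.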
The honest part. Most of my time went here, not on the happy path.
1. Context window is a feature, not a footnote. My first serving choice (a 4B Nemotron) had only a 4,096-token context, far too small for essay-length commentary plus translation. Symptoms looked like "translation is broken"; the real cause was truncation. Switching to Nemotron-Nano-9B-v2 (128k) fixed it instantly.
2. Reasoning traces are expensive. Nemotron reasons by default. For reader-facing prose that doubled latency for no benefit. Disabling it via /no_think roughly halved generation time, and I kept a </think>-stripping safety net for the cases where the trace leaks through without an opening tag.
3. Small models truncate JSON, so salvage it. Under load the 9B occasionally overran the token budget and returned a JSON object cut off mid-string. Instead of rejecting it and returning a 502, I wrote a salvage parser that extracts every complete "key": "value" pair. Truncated tails stopped costing the user the whole response.
4. Cold start, not GPU tier, is what feels "slow." I was tempted to throw an H100 at slow audio. But these TTS models are 0.5 to 0.9B and latency-bound by their autoregressive loop, so an H100 barely beats an L40S. The real culprit was scale-to-zero cold starts. Keeping a warm container (or a 10-minute warm window) made everything feel instant; the GPU tier was almost irrelevant. Lesson: profile the cause before buying compute.
5. There is no NVIDIA Tamil voice, and that's the whole app. NVIDIA's Magpie and the Chatterbox model cover roughly 9 to 23 languages beautifully, but neither speaks Tamil. For a Tamil-first app that was disqualifying. The answer was a hybrid: AI4Bharat Indic Parler-TTS for Tamil/Indic/English, Chatterbox for the 23 it does cover, MMS as the universal fallback, all routed per language.
6. Pace makes "clarity." Raw TTS runs sentences together. Synthesizing sentence-by-sentence with deliberate pauses (and edge fades) did more for perceived clarity than any model swap.
7. Lazy generation + caching beats pre-computing everything. Each (Kural, language) pair is generated once on first request and cached forever. Warm-up is the only cost; every later visitor is instant.
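For lesson 2, the safety net can be sketched as below. This is a minimal illustration, not the app's actual code: the function name `strip_reasoning` is hypothetical, and it assumes the trace is delimited by `<think>` and `</think>` tags as described above.

```python
import re

# Matches a complete <think>...</think> block, newlines included.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove reasoning traces, including ones missing the opening tag."""
    text = _THINK_BLOCK.sub("", text)
    # Safety net: a stray </think> means the trace leaked without <think>,
    # so keep only what follows the last closing tag.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    return text.strip()
```

Text with no tags at all passes through unchanged apart from whitespace trimming, so the function is safe to apply to every response.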
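For lesson 3, a salvage parser along these lines would recover the complete pairs from a truncated object. It is a sketch under stated assumptions: the name `salvage_json` is hypothetical, and it only handles flat objects whose values are strings, which matches the "key": "value" shape described above.

```python
import json
import re

# A complete "key": "value" pair: both strings closed, escapes allowed inside.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*:\s*"((?:[^"\\]|\\.)*)"')

def salvage_json(raw: str) -> dict:
    """Parse raw as JSON; on failure keep every complete string pair."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    salvaged = {}
    for key, value in _PAIR.findall(raw):
        try:
            # Decode through json so escapes such as \n or \u0b95 are honoured.
            salvaged[json.loads(f'"{key}"')] = json.loads(f'"{value}"')
        except json.JSONDecodeError:
            continue  # skip a pair with a malformed escape
    return salvaged
```

Given `'{"en": "Virtue", "fr": "Vertu", "id": "Keb'`, the function returns the `en` and `fr` entries and drops the unfinished `id` value, so the caller can serve what arrived and regenerate only what is missing.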
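For lesson 5, per-language routing reduces to a lookup with a fallback. The language codes and engine labels below are illustrative assumptions, not the app's real lists; only the three-tier order (Indic Parler-TTS, then Chatterbox, then MMS) comes from the description above.

```python
# Illustrative language sets; the app's real coverage lists are assumptions here.
PARLER_LANGS = {"ta", "hi", "te", "en"}        # Tamil / Indic / English
CHATTERBOX_LANGS = {"fr", "de", "es", "ar", "ja"}  # a subset of the 23 it covers

def pick_tts_engine(lang: str) -> str:
    """Return an engine label for a language code, most specific tier first."""
    code = lang.strip().lower()
    if code in PARLER_LANGS:
        return "indic-parler-tts"
    if code in CHATTERBOX_LANGS:
        return "chatterbox-multilingual"
    return "mms-tts"  # universal fallback for everything else
```

Checking the Parler set first means a language covered by more than one engine always gets the Tamil-first voice stack.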
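For lesson 6, sentence-by-sentence synthesis with pauses and edge fades can be sketched in pure Python. The `synth` callable is a hypothetical stand-in for any TTS engine that returns a list of float samples, and the pause and fade durations are assumed values, not the ones the app uses.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def fade_edges(samples: list[float], fade_len: int) -> list[float]:
    """Linearly ramp the first and last fade_len samples to avoid clicks."""
    n = min(fade_len, len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n
        out[i] *= gain
        out[-1 - i] *= gain
    return out

def paced_audio(text, synth, sample_rate=16000, pause_s=0.4, fade_s=0.01):
    """Synthesize each sentence separately and join them with silence."""
    silence = [0.0] * int(sample_rate * pause_s)
    audio: list[float] = []
    for sentence in split_sentences(text):
        audio += fade_edges(synth(sentence), int(sample_rate * fade_s))
        audio += silence
    return audio
```

A production version would likely use NumPy arrays and language-aware sentence splitting (this regex only knows `.`, `!`, and `?`), but the structure is the same: split, synthesize, fade, pad.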
/gradio via gr.mount_gradio_app), with a custom React front-end on top (an Off-Brand take)
Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages
Open a Kural, read the commentary, pick a language, and press Listen. Then ask NAT anything.
| Asset | Link |
|---|---|
| App demo video | https://www.youtube.com/watch?v=ubRxpqqMsJY |
| Testimonials (Arabic · Punjabi · Telugu) | https://youtu.be/YidPYUwVOAs |
| LinkedIn post | https://www.linkedin.com/posts/sudbharathi_ancient-tamil-wisdom-thirukkural-without-activity-7471897773647519744-vTSA |
| X post | https://x.com/bsudharsh/status/2066052936854094068 |
Small models, big wisdom.
Built with NVIDIA Nemotron, the NeMo Agent Toolkit, AI4Bharat Indic Parler-TTS, Chatterbox, Meta MMS-TTS, on NVIDIA DGX Spark + Modal. #BuildSmallHackathon