๐Ÿช” Build Small Hackathon ยท Backyard AI ยท Field Notes

Field Notes: Giving the Thirukkural a Voice in 25+ Languages โ€” with Small Models

Build Small Hackathon ยท Backyard AI ยท Ancient Tamil Wisdom (Thirukkural) Without Borders

๐Ÿช” Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages


The person I built this for

My grandmother can recite Thirukkural couplets from memory, but can't read screens of English commentary. My niece reads English fluently โ€” but not Tamil script. A friend in Jakarta has never read the Kural in any language he speaks.

The Thirukkural is one of humanity's oldest works of practical ethics: 1,330 Tamil couplets on virtue, leadership, and love, written by Thiruvalluvar over 2,000 years ago. It's studied by millions โ€” yet it stays locked away behind three walls: language, literacy, and the absence of a voice.

For the Backyard AI track, I wanted to knock all three down for the people I actually know โ€” and then open it to everyone.

What it does

One app, every couplet:

The "think small" thesis: the right small model for each job

The hackathon's rule โ€” every model under 32B โ€” turned out to be a design philosophy, not a constraint. Instead of one giant model, I used the right small model for each job:

Job Model Size
Commentary, translation, the council chat NVIDIA Nemotron-Nano-9B-v2 9B (128k ctx)
Agentic orchestration NVIDIA NeMo Agent Toolkit (NAT) โ€”
Tamil / Indic / English voice AI4Bharat Indic Parler-TTS 0.9B
23 other-language voices Chatterbox Multilingual 0.5B
Fallback voice (any language) Meta MMS-TTS ~70M

Built and tuned on an NVIDIA DGX Spark (GB10, FP8, vLLM); the public demo serves the same open models on Modal GPUs, with a CPU-only Gradio/React Space in front.

NAT โ€” making the chat genuinely agentic

The "Council of Valluvar" became NAT โ€” the Nemotron Agent Trio, built on the NVIDIA NeMo Agent Toolkit. Three agent personas reason over the same question concurrently and their voices are composed into one clear answer:

It's a small, deterministic multi-agent workflow โ€” exactly the kind of thing the toolkit is good at โ€” and it makes a 9B model feel like a study circle.

Field notes โ€” what actually broke, and what I learned

The honest part. Most of my time went here, not on the happy path.

1. Context window is a feature, not a footnote. My first serving choice (a 4B Nemotron) had only a 4,096-token context โ€” far too small for essay-length commentary plus translation. Symptoms looked like "translation is broken"; the real cause was truncation. Switching to Nemotron-Nano-9B-v2 (128k) fixed it instantly.

2. Reasoning traces are expensive. Nemotron reasons by default. For reader-facing prose that doubled latency for no benefit. Disabling it via /no_think roughly halved generation time โ€” and I kept a </think>-stripping safety net for the cases where the trace leaks through without an opening tag.

3. Small models truncate JSON โ€” so salvage it. Under load the 9B occasionally overran the token budget and returned a JSON object cut off mid-string. Instead of rejecting it (a 502), I wrote a salvage parser that extracts every complete "key": "value" pair. Truncated tails stopped costing the user the whole response.

4. Cold start, not GPU tier, is what feels "slow." I was tempted to throw an H100 at slow audio. But these TTS models are 0.5โ€“0.9B and latency-bound by their autoregressive loop โ€” H100 barely beats L40S. The real culprit was scale-to-zero cold starts. Keeping a warm container (or a 10-minute warm window) made everything feel instant; the GPU tier was almost irrelevant. Lesson: profile the cause before buying compute.

5. There is no NVIDIA Tamil voice โ€” and that's the whole app. NVIDIA's Magpie/Chatterbox cover ~9โ€“23 languages beautifully, but none speak Tamil. For a Tamil-first app that was disqualifying. The answer was a hybrid: AI4Bharat Indic Parler-TTS for Tamil/Indic/English, Chatterbox for the 23 it does cover, MMS as the universal fallback โ€” routed per language.

6. Pace makes "clarity." Raw TTS runs sentences together. Synthesizing sentence-by-sentence with deliberate pauses (and edge fades) did more for perceived clarity than any model swap.

7. Lazy generation + caching beats pre-computing everything. Each (Kural, language) pair is generated once on first request and cached forever. Warm-up is the only cost; every later visitor is instant.

Compliance, honestly

Try it

๐Ÿช” Live app: https://huggingface.co/spaces/build-small-hackathon/ancient-tamil-wisdom-in-25-languages

Open a Kural, read the commentary, pick a language, and press Listen โ€” then ask NAT anything.

Demo & links

Asset Link
๐ŸŽฌ App demo video https://www.youtube.com/watch?v=ubRxpqqMsJY
๐Ÿ’ฌ Testimonials (Arabic ยท Punjabi ยท Telugu) https://youtu.be/YidPYUwVOAs
๐Ÿ’ผ LinkedIn post https://www.linkedin.com/posts/sudbharathi_ancient-tamil-wisdom-thirukkural-without-activity-7471897773647519744-vTSA
๐• X post https://x.com/bsudharsh/status/2066052936854094068

Small models, big wisdom.

Built with NVIDIA Nemotron, the NeMo Agent Toolkit, AI4Bharat Indic Parler-TTS, Chatterbox, Meta MMS-TTS, on NVIDIA DGX Spark + Modal. #BuildSmallHackathon