Sarvam AI has launched Sarvam-1, a 2 Bn parameter large language model built specifically for Indian languages.
In a blogpost, the startup said that the model is optimised for 10 Indian languages, including Hindi, Bengali, Tamil, and Telugu, besides English.
The model aims to tackle two key challenges – token inefficiency and poor data quality for Indic languages.
Token inefficiency refers to how many pieces (tokens) a language model has to split a word into before it can process it. For instance, an English word like “apple” might be processed as a single token, whereas the equivalent word in some Indian languages might get split into 4-8 tokens. The more tokens a word needs, the slower and less efficient the processing.
Sarvam AI claims that Sarvam-1 achieves a token efficiency rate of 1.4-2.1 tokens per word (vs. 4-8 in existing models). It said that the LLM is trained on Sarvam-2T, a 2-trillion-token dataset curated specifically for Indian languages. According to the startup, this improves performance in areas like cross-lingual translation and question-answering.
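To make the tokens-per-word figure concrete, here is a minimal Python sketch of how such a ratio could be measured. It is not Sarvam AI's methodology: the function name `tokens_per_word` and the two stand-in tokenizers are hypothetical, and a real measurement would plug in an actual model tokenizer and a large text sample.

```python
from typing import Callable, Sequence


def tokens_per_word(text: str, tokenize: Callable[[str], Sequence]) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    # Tokenize word by word; real tokenizers usually see the whole sentence,
    # so this is only an approximation for illustration.
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


# Hypothetical stand-in tokenizers (assumptions, not real model tokenizers).
def whole_word_tokenizer(word: str) -> Sequence:
    # Best case: every word is a single token.
    return [word]


def utf8_byte_tokenizer(word: str) -> Sequence:
    # Worst case: one token per UTF-8 byte, as in a pure byte-level fallback.
    return list(word.encode("utf-8"))


if __name__ == "__main__":
    english = "apple"
    hindi = "सेब"  # Hindi word for "apple"
    print(tokens_per_word(english, whole_word_tokenizer))
    print(tokens_per_word(english, utf8_byte_tokenizer))
    print(tokens_per_word(hindi, utf8_byte_tokenizer))
```

Devanagari characters take three bytes each in UTF-8, so a byte-level fallback inflates the token count for Hindi far more than for English. That gap is the kind of inefficiency a tokenizer built for Indic scripts is meant to close.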
Despite being smaller than models like Meta’s Llama-3.2-3B, Sarvam-1 outperforms them on several industry benchmarks, the startup claimed.
Sarvam-1 is now available for download on Hugging Face.
Earlier on Thursday (October 25), chip giant Nvidia’s CEO Jensen Huang said that the .
Meanwhile, Sarvam AI also announced its partnership with Yotta Data Services. The Sarvam-1 model has been trained on Yotta’s Shakti Cloud infrastructure, the startup said.
Earlier this year, the startup launched its comprising multiple products — Sarvam Agents, Sarvam 2B, Shuka 1.0, Sarvam Models, and A1.
The in its Series A funding round led by Lightspeed Venture Partners, in participation with Peak XV Partners and Khosla Ventures, in December last year.
At the heart of all these is the , which is expected to clock a CAGR of 48% between 2023 and 2030 to become an over $17 Bn opportunity.