In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the Multidimensional Quality Metrics (MQM) framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed using a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows can combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation
What kinds of errors LLMs produce across languages
Where AI performs well, and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
Full case study:
https://alconost.mt/mqm-tool/case-studies/translategemma/