In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the MQM framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed using a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation
What kinds of errors LLMs produce across languages
Where AI performs well — and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
Full case:
https://alconost.mt/mqm-tool/case-studies/translategemma/
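To make the MQM approach concrete, here is a minimal sketch of MQM-style scoring: annotators tag each error with a category and severity, severities carry penalty weights, and the weighted penalty is normalized by the length of the evaluated text. The severity weights and normalization below are common MQM conventions, not the exact parameters of Alconost's tool; treat every number as an illustrative assumption.

```python
# Illustrative MQM-style scoring sketch. Severity weights and per-100-word
# normalization are assumed conventions, not Alconost's exact parameters.

SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}

def mqm_score(errors, word_count):
    """Return a quality score: 100 minus weighted error points,
    normalized per 100 words of evaluated text.

    errors: list of (category, severity) tuples from human annotation.
    """
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return 100 - (penalty / word_count) * 100

# Example: two annotated errors in a 200-word translation.
errors = [
    ("mistranslation", "major"),   # weight 5
    ("punctuation", "minor"),      # weight 1
]
print(mqm_score(errors, word_count=200))  # -> 97.0
```

In a human-in-the-loop workflow, the `errors` list comes from professional reviewers applying the typology, and scores like this one let teams benchmark model output across languages and releases.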