In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the MQM framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed using a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows can combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation
What kinds of errors LLMs produce across languages
Where AI performs well — and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
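To make the structured error typology concrete, here is a minimal sketch of how an MQM-style penalty score can be computed from annotated errors. The severity weights, category labels, and normalization window below are illustrative assumptions for this sketch, not the actual configuration of Alconost's MQM evaluation tool.

```python
# Hypothetical MQM-style scoring sketch. Severity weights and error
# categories are illustrative; real MQM deployments tune both.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_penalty(errors, word_count, per_words=1000):
    """Return an MQM-style penalty: error points per `per_words` words.

    `errors` is a list of (category, severity) annotations produced by
    human reviewers, e.g. ("accuracy/mistranslation", "major").
    Lower is better; 0 means no annotated errors.
    """
    points = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return points / word_count * per_words

# Example: one major accuracy error and one minor fluency error
# found in a 300-word translated segment.
errors = [
    ("accuracy/mistranslation", "major"),
    ("fluency/grammar", "minor"),
]
print(mqm_penalty(errors, word_count=300))  # 6 points / 300 words -> 20.0
```

Normalizing by word count is what lets scores from documents of different lengths be compared side by side, which is essential when benchmarking model output across languages.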
Full case:
https://alconost.mt/mqm-tool/case-studies/translategemma/