In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the MQM framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed against a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows can combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation (a scoring sketch follows this list)
What kinds of errors LLMs produce across languages
Where AI performs well — and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
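For readers new to MQM, here is a minimal sketch of how MQM-style scoring typically works: annotators tag each error with a category and a severity, severities map to penalty points, and the penalty total is normalized by word count. The severity weights and the `mqm_score` helper below are illustrative assumptions, not the implementation of Alconost's tool; real deployments choose their own weights and normalization.

```python
from dataclasses import dataclass

# Illustrative severity weights following one common MQM convention
# (minor = 1, major = 5, critical = 10). Actual tools may differ.
SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

@dataclass
class Error:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "neutral" | "minor" | "major" | "critical"

def mqm_score(errors: list[Error], word_count: int) -> float:
    """Normalized MQM score: 1.0 means no penalties; lower is worse."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return 1.0 - penalty / word_count

if __name__ == "__main__":
    annotated = [
        Error("accuracy/mistranslation", "major"),
        Error("fluency/grammar", "minor"),
        Error("terminology/inconsistent-with-termbase", "minor"),
    ]
    # 7 penalty points over a 250-word sample yields 0.972.
    print(f"MQM score: {mqm_score(annotated, word_count=250):.3f}")
```

The same structure scales to the human-in-the-loop workflow discussed in the session: reviewers annotate errors segment by segment, and the scorer aggregates them into comparable per-language quality scores.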
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
Full case study:
https://alconost.mt/mqm-tool/case-studies/translategemma/