In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the MQM framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed against a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows can combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation
What kinds of errors LLMs produce across languages
Where AI performs well — and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
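To make the MQM discussion concrete, here is a minimal sketch of how an MQM-style quality score is typically computed from annotated errors: each error carries a category and a severity, severities map to penalty weights, and the penalties are normalized by segment length. The category names, the severity weights (minor = 1, major = 5, critical = 10), and the per-100-words normalization are illustrative assumptions; real MQM implementations, including the tool used in this case study, may use different weights and scoring formulas.

```python
from dataclasses import dataclass

# Hypothetical severity weights, loosely following common MQM practice;
# actual weights vary by implementation.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

@dataclass
class ErrorAnnotation:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "minor", "major", or "critical"

def mqm_score(errors: list[ErrorAnnotation], word_count: int) -> float:
    """Quality score per 100 words: 100 minus length-normalized penalties."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return 100.0 - (penalty / word_count) * 100.0

# Example: one major and one minor error in a 120-word segment.
errors = [
    ErrorAnnotation("accuracy/mistranslation", "major"),
    ErrorAnnotation("fluency/punctuation", "minor"),
]
print(round(mqm_score(errors, word_count=120), 1))  # 6 penalty points / 120 words -> 95.0
```

The value of this structure for human-in-the-loop workflows is that reviewers only annotate errors; the score, category breakdowns, and cross-language comparisons fall out of the annotations automatically.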
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
Full case:
https://alconost.mt/mqm-tool/case-studies/translategemma/