I spent nine months as the LQA lead on a ride-hailing app localization project covering 14 languages. The translation was done by a reputable agency. The reviewers approved every string. We ran the app through standard functional QA. Everything passed. Then we launched in Japan, Germany, Brazil, and South Korea simultaneously. Within the first 72 hours, we received 2,400+ one-star reviews across four app stores. The translations were accurate. The app was broken.
In Japanese, the confirmation button on the payment screen read something like "please proceed to payment," which was technically correct but ran 40% wider than the English original and pushed the cancel button off-screen on smaller devices. In German, the address form concatenated street name and number without a space, producing strings like "Hauptstraße123" that the geocoding API couldn't parse. In Brazilian Portuguese, the app used formal Portuguese register throughout, which is fine for contracts but feels cold and robotic in a consumer-facing ride-hailing interface. In Korean, the honorific level was inconsistent: the greeting used the polite form while the error messages used the casual form, creating a jarring experience that users described as "rude."
None of these were translation errors. The translators did their job correctly. The problem was that nobody checked whether the translations worked inside the actual software. That's what LQA is for, and that's the gap I want to talk about.
What LQA actually catches (and what it doesn't)
Localization Quality Assurance is not a second translation review. It's a usability test of your translated content in context. The reviewer isn't looking at strings in a spreadsheet. They're looking at strings inside the running application, on real devices, in the screen layouts your users will actually see. The distinction matters.
Text overflow and truncation: This is the single most common LQA issue across every language I've tested. German text is typically 20–35% longer than English. Russian is 15–30% longer. Japanese and Chinese are more compact per character but use larger font sizes and require more line height. Thai doesn't use spaces between words, which means line-breaking algorithms designed for space-delimited languages handle it unpredictably. Arabic and Hebrew need RTL layout support that many UI frameworks handle inconsistently. If your English UI has buttons, labels, and cards designed with English string lengths in mind, a significant percentage of your translations will overflow their containers. I've seen truncation rates of 30–40% on first-pass localization for German and Russian products. That's not an edge case. That's a third of your UI broken.
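One cheap way to surface overflow risk before any translation exists is to project each English string's length with the expansion ranges above and compare it against the space its container allows. The sketch below is a rough illustration, not a layout test: the function name, the locale codes, the sample strings, and the per-string character budgets are all assumptions, and character count is only a proxy for rendered pixel width.

```python
# Upper bounds of the expansion ranges quoted above (German 20-35%, Russian 15-30%).
# Locale codes, keys, and budgets are hypothetical values for illustration.
EXPANSION = {"de": 1.35, "ru": 1.30}


def flag_overflow_risks(strings, locale, max_chars):
    """Return (key, english_len, projected_len, budget) for every string whose
    projected translated length exceeds its container's character budget."""
    factor = EXPANSION[locale]
    risky = []
    for key, english in strings.items():
        projected = round(len(english) * factor)
        if projected > max_chars[key]:
            risky.append((key, len(english), projected, max_chars[key]))
    return risky


ui_strings = {"pay_button": "Proceed to payment", "cancel_button": "Cancel"}
budgets = {"pay_button": 20, "cancel_button": 12}

for key, current, projected, budget in flag_overflow_risks(ui_strings, "de", budgets):
    print(f"{key}: {current} chars in English, ~{projected} projected, budget {budget}")
```

A check like this only narrows down where to look. The strings it flags still need to be viewed on a real device, because fonts, line height, and wrapping decide whether the text actually fits.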
Context mismatch: A single English word can mean different things in different UI contexts. "Order" can mean a purchase, a sequence, a command, or a religious order. "Right" can mean correct, direction, entitlement, or a political position. "Balance" can be your account balance, physical equilibrium, or remaining time. The translator sees a string in isolation — "Order: $47.50" — and correctly translates it. But the same word appears in a settings menu as "Sort order" and in a navigation flow as "Place order." If the translation management system doesn't provide context screenshots (and most don't, unless you specifically set them up), the translator produces one translation that's correct for one context and wrong for the others. LQA catches this because the reviewer sees the string in context and can immediately tell when the translation doesn't match the UI element's function.
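Part of this can be pre-screened before a reviewer opens the app. The following sketch assumes a hypothetical export of translation entries, each carrying the UI context it appears in, and flags any source string that is used in more than one context but has only one translation. The field names and sample entries are illustrative, not taken from any particular translation management system.

```python
from collections import defaultdict


def find_shared_translations(entries):
    """Map each source string to its contexts when it appears in several
    UI contexts but every occurrence carries the same translation."""
    by_source = defaultdict(list)
    for entry in entries:
        by_source[entry["source"]].append(entry)

    flagged = {}
    for source, group in by_source.items():
        contexts = {e["context"] for e in group}
        targets = {e["target"] for e in group}
        if len(contexts) > 1 and len(targets) == 1:
            flagged[source] = sorted(contexts)
    return flagged


# Hypothetical German entries: "Bestellung" is a purchase order, so it is
# wrong for the sort menu, where "Reihenfolge" (sequence) is needed.
entries = [
    {"source": "Order", "context": "checkout_summary", "target": "Bestellung"},
    {"source": "Order", "context": "sort_menu", "target": "Bestellung"},
    {"source": "Cancel", "context": "checkout_summary", "target": "Abbrechen"},
]

print(find_shared_translations(entries))
```

A flagged entry is not necessarily wrong, since some words do translate the same way everywhere. It is a list of places where the in-context review should look first.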
Gender and grammatical agreement errors: Many languages have grammatical gender. French assigns gender to objects — "the table" is feminine, "the book" is masculine. Spanish does the same. German has three genders. When your UI concatenates strings dynamically ("You have [number] new [message_type]"), the grammar can break if the interpolated variable changes the gender agreement. In French, "Vous avez 1 nouveau message" (masculine singular) becomes "Vous avez 3 nouveaux messages" (masculine plural). But if the message type is feminine — "Vous avez 1 nouvelle notification" — the adjective changes. If your code hard-codes the masculine form and swaps in a feminine noun, you get grammatical errors that French speakers will notice immediately. Russian is worse: the adjective ending for "new" has twelve possible forms depending on the gender, number, and case of the noun it modifies. LQA catches these because the reviewer sees the dynamic strings in different states — singular, plural, zero items, mixed content — and flags the agreement failures that only appear in specific combinations.
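The usual way to avoid this class of bug is to stop assembling sentences from fragments and store one complete sentence per noun and plural form, so the adjective always travels with the noun it agrees with. Here is a minimal sketch using the French examples above. The template table, function names, and the simplified plural rule (0 and 1 are singular for whole numbers) are assumptions for illustration. A real project would more likely rely on a message-format library backed by CLDR plural rules, and Russian would need more plural categories than the two shown here.

```python
# One full sentence per item type and plural category, so gender agreement
# is decided by the translator, not by string concatenation in code.
FR_TEMPLATES = {
    "message": {
        "one": "Vous avez {n} nouveau message",
        "other": "Vous avez {n} nouveaux messages",
    },
    "notification": {
        "one": "Vous avez {n} nouvelle notification",
        "other": "Vous avez {n} nouvelles notifications",
    },
}


def french_plural_category(n: int) -> str:
    """Simplified rule for whole numbers: French treats 0 and 1 as singular."""
    return "one" if n in (0, 1) else "other"


def new_items_message(item_type: str, n: int) -> str:
    """Pick the complete sentence for this noun and count, then fill in n."""
    template = FR_TEMPLATES[item_type][french_plural_category(n)]
    return template.format(n=n)


for item_type in ("message", "notification"):
    for count in (1, 3):
        print(new_items_message(item_type, count))
```

The states printed by the loop are the same ones an LQA reviewer walks through by hand: each noun type in singular and plural, plus zero items where the UI allows it.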
The LQA process: what it actually looks like
There's no single industry-standard LQA process, but the approach that works consistently across mobile apps, SaaS platforms, and enterprise software follows a pattern. Here's what I've refined over dozens of projects.
Phase 1 — Linguistic testing. A native speaker navigates the entire application in the target language. They're not looking at the code. They're not looking at the translation files. They're using the app as a user would and flagging anything that reads wrong, looks wrong, or feels wrong. This catches context mismatches, register inconsistencies, and any translation that was technically accurate but doesn't fit the UI context. Typical duration: 2–4 hours per language per platform.
Phase 2 — Visual and layout testing. The same reviewer (or a separate visual QA specialist) checks every screen for text overflow, truncation, misalignment, and RTL issues. They test on the minimum-spec device you support, not the high-end phone your developers use. This is where you catch the button that runs off-screen, the label that overlaps with another element, and the tooltip that gets clipped. Typical duration: 1–2 hours per language per platform.
Phase 3 — Functional testing with localized content. This is the step most teams skip. It involves testing core user flows — sign-up, purchase, search, form submission — with localized input. Can a German user enter an address with special characters (ä, ö, ü, ß) without the form rejecting it? Can a Japanese user search in hiragana, katakana, and kanji and get the same results? Does the date picker work with the local calendar? Does the phone number field accept local formatting? Does the currency display correctly with the right symbol and decimal separator? These aren't translation issues. They're software issues that only surface with localized input. Typical duration: 2–3 hours per language.
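Some of these checks can be turned into automated test cases that run alongside the manual pass. The sketch below shows one way to build a search key so that equivalent German and Japanese inputs compare equal. It uses Unicode normalization from Python's standard library plus a hiragana-to-katakana fold. The function names are hypothetical, and the approach does not cover kanji, which would need a reading dictionary to match against kana.

```python
import unicodedata

# The katakana block sits 0x60 code points above the hiragana block.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x60


def fold_kana(text: str) -> str:
    """Map hiragana characters to their katakana counterparts."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_START <= ord(ch) <= HIRAGANA_END else ch
        for ch in text
    )


def search_key(text: str) -> str:
    """Build a comparison key: NFKC composes accents and widens half-width
    katakana, fold_kana unifies the two kana scripts, casefold handles case
    and maps the German sharp s to 'ss'."""
    normalized = unicodedata.normalize("NFKC", text)
    return fold_kana(normalized).casefold()


# Localized inputs that should be treated as the same query.
assert search_key("Mu\u0308nchen") == search_key("München")      # decomposed vs composed ü
assert search_key("Hauptstraße") == search_key("HAUPTSTRASSE")   # ß vs SS
assert search_key("たくしー") == search_key("タクシー")              # hiragana vs katakana
assert search_key("ﾀｸｼｰ") == search_key("タクシー")                 # half-width vs full-width
```

Cases like these make good regression tests because they fail silently in production: the form accepts the input, the search simply returns nothing.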
Phase 4 — Regression. After fixes are applied, re-test the affected areas to make sure the fix didn't introduce new issues. This is boring but necessary. A fix that shortens an overflowing label on one screen might cause a different label on the same screen to reflow and break. Typical duration: 1–2 hours per language per platform.
UX testing and LQA: same problem, different lens
User Experience testing and LQA overlap significantly, but they ask different questions. UX testing asks: "Does this flow make sense?" LQA asks: "Does this flow make sense in this language?" A UX researcher might find that the checkout flow has too many steps. An LQA reviewer might find that the checkout flow has too many steps because each step displays a translated string that adds 30% more characters than the English version, and the cumulative effect is a significantly longer and more frustrating checkout on mobile devices.
The practical implication is that LQA should be integrated into your UX testing cycle, not run as a separate phase after UX testing is complete. When I run LQA alongside UX testing, the combined defect list is usually 40–60% larger than either test would produce alone, because the linguistic issues and the UX issues compound each other. A button that's 20% too wide because of German text length is a linguistic issue. A button that's 20% too wide and also placed in a position that's hard to reach with one thumb is a UX issue that the translation made worse. If you test them separately, you might fix the text width but not the placement, and the user still has a bad experience.
What skipping LQA actually costs
The ride-hailing app I mentioned earlier: the post-launch LQA cycle took three weeks and required emergency patches across all four app stores. The patches themselves caused regression issues (one of the German fixes broke the French layout because the layout code was shared). Total cost of the post-launch fix: roughly $180,000 in engineering time, translator fees, and delayed marketing campaigns. The pre-launch LQA cycle we skipped would have cost approximately $35,000 and taken two weeks, so the fix cost about five times what prevention would have. I've seen post-launch fixes cost 3–5x the pre-launch LQA on enough projects that I consider that ratio a reliable rule of thumb.
There's also the reputational cost, which is harder to quantify but more damaging in the long run. Those 2,400 one-star reviews from the first 72 hours? They're still there. App store rankings recover slowly. Users who had a bad first experience rarely give you a second chance, even if you fix the problems. The Japanese blog post titled "this app treats Japanese users as an afterthought" (I'm paraphrasing, but not by much) got 14,000 page views. The fix cost money. The reputation cost is permanent.
Artlangs Translation provides end-to-end LQA across 230+ languages, covering linguistic testing, visual/layout QA, functional testing with localized input, and regression. We test on real devices, in the actual app, catching the issues that translation review alone never will. Because a correct translation that breaks your UI is not a correct translation.
