Live Linguist

Version 2 · on-device, Apple silicon

Easy-to-read live captions, now in twenty languages.

Live Linguist is a real-time, on-device caption simplifier. It rewrites live speech as easy-to-read text in the same language: German into simpler German, not into English. v2 takes the same idea from four languages to twenty.

macOS app · Apple silicon · v1.0.19 · 14 MB .dmg — downloads directly, no GitHub account needed.

What’s new in v2

  • 20 languages, up from 4 — each in its official easy-language register, from German Leichte Sprache to Finnish Selkokieli and Japanese Yasashii Nihongo.
  • One multilingual fine-tune per size: Qwen3 0.6B (live) and Qwen3 1.7B (quality), 4-bit.
  • Retrained on a ~57,000-pair supervised dataset spanning 21 locales, evaluated on held-out tests at ~200 sentences each.
  • The fine-tuned models beat stock Qwen3 on SARI and register compliance across the board; 20 of 21 locales clear the ship gate.
  • Still fully on-device on macOS (Apple silicon, MLX) — no cloud, no translation.

Looking for the first release? See Live Linguist v1 (German, French, English, Spanish).

In easy language

This tool listens to people talking. It shows the words on screen. It also writes the words in an easier way.

The easy words are in the same language. German stays German. The tool does not change it to English.

Before, the tool knew four languages. Now it knows twenty languages.

This helps people who are learning a language. It also helps people who find hard words difficult to read.

Everything runs on your own Mac. Your words are not sent away.

Why easy language

Comprehensible input, kept in the target language

We believe in-language simplification is the right kind of help, and the reason is pedagogical. Stephen Krashen’s Input Hypothesis holds that we acquire language through comprehensible input — language we can understand that sits just beyond our current level, often written as “i+1”. Input that is too hard is noise; input that is too easy teaches nothing new.

Easy-language simplification lowers complexity while keeping the learner immersed in the target language. A learner in a German lecture receives German they can actually follow — shorter sentences, common words, one idea at a time — and keeps building acquisition from real target-language input. An English translation would make the sentence comprehensible too, but it removes the German exposure entirely, which is the thing the learner came for.

Translation answers “what did they say?” Simplification answers “what did they say, in words I can learn from?” Only the second keeps the learner inside the language they are trying to acquire.

The same property carries a parallel accessibility value. Easy-language standards (Leichte Sprache, FALC, Lectura Fácil, Selkokieli, and the pan-European Easy-to-Read framework) were developed so that people with cognitive disabilities and low-literacy readers can take part in public and everyday life. A tool that produces compliant easy language serves those readers directly, in their own language. v2 extends that reach to twenty languages.

This is the project’s rationale, grounded in second-language-acquisition theory and established accessibility standards. It is not a clinical claim about learning outcomes.

Origins & acknowledgments

Built for a French–Moroccan virtual exchange

Live Linguist began as the AI component of an AI-supported international virtual exchange between the French program at Kennesaw State University and Université Hassan II Casablanca in Morocco. The idea: let novice second-semester French students hold real, real-time video conversations with more advanced Moroccan peers — without the language gap shutting the conversation down.

The pedagogical brief is exactly the rationale above. Alongside a verbatim transcript, the tool produces a second, simplified transcript that brings higher-proficiency spoken French (up to CEFR C2) down to a beginner-appropriate CEFR A1–A2 level, in French. That is comprehensible input kept in the target language, not a translation into English, and it is the same principle Live Linguist now applies across all twenty v2 languages.

The exchange and its proposal — “Language Education and Intercultural Connections with AI-Supported International Virtual Exchanges,” submitted to the French in Higher Education Grant Program of the Albertine Foundation — were conceived and written by the language faculty named below. Version 1 generalized that single French use case into a four-language, on-device release; version 2 generalizes it again, to twenty languages on the same model line.

The people who conceived it

The pedagogical concept, curriculum design, and the cross-institutional exchange were developed by:

Kennesaw State University

Department of World Languages and Cultures

  • Dr. Abigail Alexander — Associate Professor of French; Director of the World Languages Resource Collection. Project coordinator; conceived and has led the KSU–Hassan II virtual exchange since fall 2023.
  • Dr. Noëlle Lively — Senior Lecturer of French and Coordinator of French. Curriculum design and exchange coordination.
  • Dr. Federica Santini — Chair, Department of World Languages and Cultures.
  • Brooke Reed — Program Manager, World Languages Resource Collection. Coordinated student testers.

Université Hassan II Casablanca

Casablanca, Morocco

  • Dr. Meriem Hachimi — Associate Professor of French. Co-coordinates the exchange and Moroccan student participation.
  • Dr. Abdelhadi Samadi — Associate Dean for Research and Cooperation. Partner-institution authorization.

Technical & LLM development. Dylan Goldblatt, Ph.D. — AI Strategist and Applied Researcher, KSU Office of Research — designed and built Live Linguist end to end: the fine-tuned easy-language Qwen3 models, the supervised dataset and deterministic register validators, the on-device macOS application, and the CEFR-level caption adaptation. He maintains the models and infrastructure beyond the grant period.

The models

Two specialist simplifiers, now multilingual

v2 keeps the same two-model family but trains a single multilingual fine-tune at each size, so one model covers all twenty languages. We fine-tuned Qwen3 into dedicated easy-language simplifiers and quantized them to 4-bit so they run on-device on Apple silicon via MLX. The size choice is still the latency/quality trade-off: a small one for live captioning, a larger one when quality matters more than speed.

Qwen3 0.6B · live

4-bit · ~331 MB · lowest latency

The everyday driver for real-time captions, where each segment must be simplified inside a tight budget — now across every supported language.

Qwen3-0.6B-EasyLanguage-4bit →

Qwen3 1.7B · quality

4-bit · ~934 MB · highest quality

The higher-fidelity option: stronger SARI and higher register compliance when a few extra milliseconds are acceptable. It is the reference model behind the figures below.

Qwen3-1.7B-EasyLanguage-4bit →

What the effect looks like

A disfluent spoken German sentence, simplified into Leichte Sprache — short sentences, one idea each, filler removed:

Input · spoken German

“also der Termin wurde leider verschoben weil der Arzt krank war”

Output · Leichte Sprache

“Der Termin wurde verschoben. Der Arzt war krank.”

The languages

Twenty languages, each in its own easy-language register

Each language targets an established easy-language register, with its own sentence-length rule and grammar checks. Where a country has a distinct national standard — Germany’s Leichte Sprache, Finland’s Selkokieli, Sweden’s Lättläst, Japan’s Yasashii Nihongo — we use it. Where one is not clearly established, we map to Inclusion Europe’s “Information for All” Easy-to-Read (ETR) framework, the pan-European standard for easy-to-read text, or to ISO 24495-1 plain language as a floor.

The twenty languages enabled in v2, each with its easy-language register and per-sentence length cap. Caps are in words unless marked. CJK and syllable-spaced scripts use a character or syllable cap instead.
Language | Register | Standard / framework | Sentence cap
German | Leichte Sprache | Netzwerk Leichte Sprache; DIN SPEC 33429 | ≤ 12 words
French | FALC | UNAPEI; European Easy-to-Read | ≤ 15 words
Spanish | Lectura Fácil | UNE 153101:2018 EX; Plena Inclusión | ≤ 15 words
English | Easy / Plain English | US federal plain-language | ≤ 18 words
Italian | Easy to Read | Inclusion Europe ETR (Informazioni per tutti) | ≤ 12 words
Portuguese (PT) | Leitura Fácil | Inclusion Europe ETR; IFLA | ≤ 15 words
Portuguese (BR) | Leitura Fácil | Inclusion Europe ETR | ≤ 12 words
Dutch | Makkelijk Lezen | Inclusion Europe ETR | ≤ 12 words
Swedish | Lättläst | Myndigheten för tillgängliga medier (MTM) | ≤ 12 words
Danish | Letlæst | Inclusion Europe ETR | ≤ 12 words
Finnish | Selkokieli | Selkokeskus | ≤ 10 words
Estonian | Lihtsasti loetav | Inclusion Europe ETR | ≤ 12 words
Slovak | Ľahko čitateľné | Inclusion Europe ETR | ≤ 12 words
Russian | Лёгкое чтение | Inclusion Europe ETR | ≤ 12 words
Turkish | Kolay Anlaşılır Bilgi | Inclusion Europe ETR | ≤ 12 words
Arabic | Easy-to-Read (RTL) | Inclusion Europe ETR | ≤ 12 words
Hindi | Easy-to-Read | Inclusion Europe ETR | ≤ 12 words
Korean | 읽기 쉬운 정보 | Inclusion Europe ETR | ≤ 12 words
Vietnamese | Plain language | ISO 24495-1:2023 | ≤ 14 words
Japanese | やさしい日本語 | Yasashii Nihongo | ≤ 40 chars

A 21st locale, Simplified Chinese (zh-CN), was trained and evaluated but is held back from this release — see Results. Standard confidence varies by language: national standards (de, fi, sv, ja) are well established; ETR mappings are our best-supported interpretation where a dominant national easy-language standard is not clearly documented.
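The per-sentence caps in the table lend themselves to a simple deterministic check. The following is a minimal sketch, not the application's actual validator: the cap values are copied from the table, while the sentence splitter, the function names, and the choice to ignore whitespace when counting characters are assumptions made for illustration.

```python
import re

# Per-sentence caps copied from the table above (a subset, for brevity).
# Each entry is (cap, unit), where unit is "words" or "chars".
SENTENCE_CAPS = {
    "de": (12, "words"),
    "fr": (15, "words"),
    "en": (18, "words"),
    "fi": (10, "words"),
    "vi": (14, "words"),
    "ja": (40, "chars"),
}

# Hypothetical splitter: break after ., !, ?, or the Japanese full stop.
# A real splitter would also need to handle abbreviations and decimals.
_SENTENCE_END = re.compile(r"(?<=[.!?。])\s*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def sentence_length(sentence: str, unit: str) -> int:
    """Length in words, or in non-whitespace characters for 'chars'."""
    if unit == "chars":
        return len("".join(sentence.split()))
    return len(sentence.split())


def within_cap(text: str, lang: str) -> bool:
    """True if every sentence in text respects the cap for lang."""
    cap, unit = SENTENCE_CAPS[lang]
    return all(sentence_length(s, unit) <= cap for s in split_sentences(text))


if __name__ == "__main__":
    print(within_cap("Der Termin wurde verschoben. Der Arzt war krank.", "de"))
```

A length cap alone does not make a text easy language; it is only one of several register checks, which is why it is kept as a small, separately testable function here.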

Results

Specialist models beat vanilla Qwen3 — in every language

We evaluated on held-out test sets of ~200 sentences per language, comparing each fine-tuned model against the stock Qwen3 4-bit model of the same size. The fine-tuned models win on simplification quality and, more decisively, on reliability: stock Qwen3 frequently ignores the easy-language register or copies a prompt example, while the fine-tuned models follow the rules almost every time. 20 of the 21 trained locales clear the ship gate.

SARI (higher is better)
The standard simplification-quality metric (Xu et al., 2016). It rewards words the model correctly keeps, adds, and deletes relative to references — a single score for “how good is this simplification?”
Register compliance (higher is better)
The share of outputs that actually follow the target register’s rules (sentence length, simple grammar, and the other deterministic checks for that language).
Language ID (LID) (higher is better)
The share of outputs that stay in the target language — the simplification must not drift into another language. The ship gate requires LID ≥ 0.98.
Horizontal bar chart of SARI scores, fine-tuned EasyLanguage versus stock Qwen3-1.7B, for nineteen languages that use whitespace-delimited scripts. The fine-tuned model scores higher in every language. Top scores include Vietnamese 63.2, English 62.8, Spanish 62.4, French 61.5 and Portuguese-PT 60.6, against stock scores between 52.0 and 57.3; German rises from 41.9 to 50.6 and Turkish from 46.4 to 51.3.
Simplification quality (SARI), Qwen3-1.7B. The fine-tuned model scores higher than stock Qwen3 in every language shown. Japanese and Chinese are excluded here because SARI is whitespace-tokenized and meaningless for scripts without spaces between words — those are judged on chrF, compliance, and LID instead.
Horizontal bar chart of register-compliance percentages, fine-tuned versus stock Qwen3-1.7B, across all twenty languages. The fine-tuned model follows the easy-language rules in roughly 80 to 99 percent of outputs, with most languages at 94 percent or higher, while the stock model complies in roughly 40 to 67 percent of outputs.
Register compliance, Qwen3-1.7B. Fine-tuned models follow the easy-language rules far more often than stock Qwen3 in every language — most at 94–99%. The few lower cases (Vietnamese, Italian, Hindi, Portuguese-BR) still clear stock by a wide margin and pass the gate.
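Register compliance and LID are both simple pass rates over the held-out outputs. A minimal sketch of how they could be computed is below. The `EvalRecord` structure and function names are hypothetical, and only the LID part of the ship gate (LID ≥ 0.98, as stated above) is modeled, since the other gate criteria are not spelled out here.

```python
from dataclasses import dataclass

LID_GATE = 0.98  # LID threshold stated in the metric definitions above


@dataclass
class EvalRecord:
    """One evaluated output (hypothetical structure for illustration)."""
    passes_validators: bool  # all deterministic register checks passed
    detected_lang: str       # language code reported by a language detector


def compliance_rate(records: list[EvalRecord]) -> float:
    """Share of outputs that pass the register validators."""
    return sum(r.passes_validators for r in records) / len(records)


def lid_rate(records: list[EvalRecord], target_lang: str) -> float:
    """Share of outputs detected as being in the target language."""
    return sum(r.detected_lang == target_lang for r in records) / len(records)


def clears_lid_gate(records: list[EvalRecord], target_lang: str) -> bool:
    """True if the in-language rate meets the LID part of the ship gate."""
    return lid_rate(records, target_lang) >= LID_GATE
```

With ~200 test sentences per language, the 0.98 threshold tolerates at most four off-language outputs, so a handful of detector mislabels is enough to fail a locale.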

Results in numbers

Held-out test, ~200 sentences per language. Fine-tuned (ft) versus stock Qwen3 4-bit, greedy decoding. SARI shown for both sizes; register compliance and LID for the 1.7B model. All twenty languages pass the ship gate.
Lang | 0.6B SARI (ft / stock) | 1.7B SARI (ft / stock) | 1.7B compliance (ft / stock) | 1.7B LID
German | 47.9 / 33.0 | 50.6 / 41.9 | 0.99 / 0.60 | 1.00
French | 56.9 / 33.3 | 61.5 / 54.1 | 0.99 / 0.56 | 1.00
Spanish | 59.3 / 36.6 | 62.4 / 56.9 | 0.97 / 0.61 | 1.00
English | 59.2 / 32.5 | 62.8 / 52.0 | 0.99 / 0.67 | 1.00
Italian | 53.3 / 49.2 | 55.6 / 49.5 | 0.82 / 0.55 | 1.00
Portuguese (PT) | 57.4 / 39.0 | 60.6 / 54.3 | 0.98 / 0.62 | 1.00
Portuguese (BR) | 55.0 / 37.5 | 57.9 / 56.9 | 0.87 / 0.58 | 1.00
Dutch | 51.9 / 39.0 | 54.1 / 51.8 | 0.94 / 0.62 | 1.00
Swedish | 53.2 / 40.2 | 56.1 / 49.5 | 0.97 / 0.60 | 0.99
Danish | 53.1 / 35.6 | 56.0 / 49.2 | 0.97 / 0.58 | 0.99
Finnish | 50.5 / 35.2 | 51.7 / 47.2 | 0.99 / 0.59 | 1.00
Estonian | 49.4 / 30.5 | 49.9 / 45.6 | 0.99 / 0.58 | 0.99
Slovak | 50.9 / 40.6 | 51.3 / 46.9 | 0.97 / 0.56 | 0.99
Russian | 50.6 / 41.1 | 55.6 / 51.0 | 0.93 / 0.57 | 0.99
Turkish | 51.3 / 31.7 | 51.3 / 46.4 | 0.99 / 0.57 | 1.00
Arabic | 49.2 / 50.1 | 52.9 / 51.0 | 0.97 / 0.58 | 1.00
Hindi | 52.7 / 35.2 | 54.9 / 49.2 | 0.84 / 0.55 | 0.97
Korean | 49.5 / 43.8 | 51.4 / 46.5 | 0.97 / 0.55 | 1.00
Vietnamese | 59.6 / 41.3 | 63.2 / 57.3 | 0.81 / 0.51 | 1.00
Japanese | n/a | n/a | 0.96 / 0.41 | 1.00

SARI = Xu et al. (2016). Compliance is the validator pass rate; LID is the in-language rate.
Arabic: ships on the 1.7B quality model, where it clears stock on SARI, compliance, and LID; at 0.6B its SARI is roughly level with stock.
Japanese: SARI is whitespace-tokenized and not meaningful for a script without word spaces (chrF ≈ 46 for the fine-tuned 1.7B model); it ships on compliance + LID.
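To make the size of the gains concrete, the short sketch below subtracts stock SARI from fine-tuned SARI for a few rows. The scores are copied from the table above; the selection of rows and the helper name `sari_gain` are only illustrative.

```python
# (fine-tuned, stock) SARI pairs copied from the results table above.
SARI = {
    "English, 1.7B": (62.8, 52.0),
    "German, 1.7B": (50.6, 41.9),
    "French, 1.7B": (61.5, 54.1),
    "Arabic, 1.7B": (52.9, 51.0),
    "Portuguese (BR), 1.7B": (57.9, 56.9),
    "Arabic, 0.6B": (49.2, 50.1),
}


def sari_gain(fine_tuned: float, stock: float) -> float:
    """Fine-tuned minus stock SARI, rounded to one decimal like the table."""
    return round(fine_tuned - stock, 1)


gains = {name: sari_gain(ft, stock) for name, (ft, stock) in SARI.items()}

# Print the largest gains first; the sign is shown explicitly.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {gain:+.1f}")
```

The only negative entry in this sample is Arabic at 0.6B, which matches the note above that its SARI at that size is roughly level with stock.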

Why 20 and not 21. Simplified Chinese (zh-CN) trained and simplifies well (chrF ≈ 54, compliance ≈ 0.91), but its automatic language-ID score lands at 0.96 — just under the 0.98 gate. On inspection, every “failure” is genuine Simplified Chinese that the language detector mislabels as Korean or Vietnamese on short Han-script text — a known weakness of the detector, not a model regression. Per our “safety of ship” rule, zh-CN stays on the stock model until the gate uses a script-aware Chinese check. It is a strong candidate to enable next.
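One possible shape for a script-aware Chinese check is sketched below. It is an assumption about how such a gate could work, not the planned implementation: it accepts text whose letters are overwhelmingly Han characters and contain no kana or Hangul. The 0.8 ratio and all function names are hypothetical, and the check does not distinguish Simplified from Traditional Chinese.

```python
def _is_han(ch: str) -> bool:
    # CJK Unified Ideographs (basic block only).
    return "\u4e00" <= ch <= "\u9fff"


def _is_kana(ch: str) -> bool:
    # Hiragana and Katakana blocks, which indicate Japanese.
    return "\u3040" <= ch <= "\u30ff"


def _is_hangul(ch: str) -> bool:
    # Precomposed Hangul syllables, which indicate Korean.
    return "\uac00" <= ch <= "\ud7a3"


def looks_chinese(text: str, min_han_ratio: float = 0.8) -> bool:
    """Heuristic: mostly Han letters, with no kana or Hangul present."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    if any(_is_kana(ch) or _is_hangul(ch) for ch in letters):
        return False
    han_count = sum(_is_han(ch) for ch in letters)
    return han_count / len(letters) >= min_han_ratio
```

Because the decision rests on code-point ranges rather than a statistical detector, short Han-script captions are not penalized for their length, which is the failure mode described above.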

The dataset

How the training data was built

The models were fine-tuned on a supervised dataset of sentence-to-easy-language pairs. v2 expands it from four languages to 21 locales — roughly 57,000 training pairs (56,654 train / 5,691 validation), with ~200 held-out test sentences per language. It combines two sources:

  • Grounding in real simplifications. German pairs are grounded in German4All, a corpus of German Wikipedia text aligned to multiple simplification levels.
  • Synthetic spoken-style data. To cover the live-captioning use case — disfluent, conversational input — and the new languages, we generated pairs in a spoken style with a teacher model (Claude, Anthropic) across everyday domains (lectures, meetings, medical, civic, travel, and more), then quality-controlled them.

Every pair passed the same deterministic validators the application uses at runtime: per-sentence length caps, simple-grammar checks, source anchoring, number fidelity, an anti-parroting check, and a wrong-language guard. Pairs that failed the register rules were rejected, so the training signal reflects the standard rather than just “shorter text.”
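Two of the validators named above, number fidelity and anti-parroting, can be illustrated with a few lines of code. This is a sketch under stated assumptions rather than the project's validators: number fidelity is taken to mean that source and output contain the same set of numeric tokens, and parroting is taken to mean that the output equals the source once case and punctuation are ignored. The function names are hypothetical.

```python
import re

# Integers and simple decimals such as "14", "3.5" or "2,5".
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")
# Anything that is neither a word character nor whitespace.
_PUNCT = re.compile(r"[^\w\s]")


def numbers_in(text: str) -> set[str]:
    """Set of numeric tokens found in text."""
    return set(_NUMBER.findall(text))


def number_fidelity(source: str, output: str) -> bool:
    """Assumed rule: the output keeps exactly the numbers of the source."""
    return numbers_in(source) == numbers_in(output)


def _normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(_PUNCT.sub("", text.lower()).split())


def is_parroting(source: str, output: str) -> bool:
    """Assumed rule: the output is the source with only surface changes."""
    return _normalize(source) == _normalize(output)


def passes_checks(source: str, output: str) -> bool:
    """Accept a pair only if numbers are preserved and it is a real rewrite."""
    return number_fidelity(source, output) and not is_parroting(source, output)
```

Running the same deterministic functions at training time and at runtime, as described above, means a pair rejected during dataset construction would also be rejected in the app.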

live-linguist-easylanguage-sft on Hugging Face →

Run it locally

Everything stays on your Mac

Download Live Linguist for macOS Apple silicon · v1.0.19 · 14 MB .dmg · direct download

Live Linguist is a macOS application that runs the whole pipeline — audio capture, speech recognition, and simplification — on-device on Apple silicon. Nothing is sent to a server. The models above are the simplification stage.

Prefer to build from source, or want the latest unreleased changes? At a high level:

  1. Clone the application repository and follow its README build instructions (macOS 14.4+, Xcode 16+).
  2. Download the easy-language models from Hugging Face (the live 0.6B for low latency, or the quality 1.7B).
  3. Pick a language and an audio source, and read the simplified captions alongside the verbatim transcript.

Model coverage vs. app coverage. All twenty registers are validated and published on the model side. The current macOS app routes the four founding languages (German, French, Spanish, English) end to end; the other sixteen are rolling into the app as their speech-recognition locales and language packs are wired up. A language that isn’t yet wired falls back to the stock model, with output byte-identical to before.
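The fallback described above amounts to a small routing rule. The sketch below is illustrative only: the fine-tuned model names come from this page, while the language codes, the stock-model placeholder names, and the `select_model` function are assumptions and do not reflect the app's actual code.

```python
# Languages the current app routes end to end (the four founding languages).
WIRED_LANGUAGES = {"de", "fr", "es", "en"}

# Fine-tuned model names as listed on this page, keyed by tier.
FINE_TUNED = {
    "live": "Qwen3-0.6B-EasyLanguage-4bit",
    "quality": "Qwen3-1.7B-EasyLanguage-4bit",
}

# Placeholder identifiers for the stock models (hypothetical names).
STOCK = {
    "live": "stock-qwen3-0.6b-4bit",
    "quality": "stock-qwen3-1.7b-4bit",
}


def select_model(lang: str, tier: str = "live") -> str:
    """Pick the fine-tuned model for wired languages, else the stock model."""
    if tier not in FINE_TUNED:
        raise ValueError(f"unknown tier: {tier!r}")
    if lang in WIRED_LANGUAGES:
        return FINE_TUNED[tier]
    return STOCK[tier]
```

For example, `select_model("de")` returns the live fine-tuned model, while `select_model("fi", "quality")` returns the stock placeholder until Finnish is wired into the app.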

Full, current setup steps live in the repository’s README:

github.com/ngoldbla/live-linguist →

Provenance, licenses & limitations

What this is built on — and what it is not

Provenance & licenses

  • Base model: Qwen3 (0.6B and 1.7B), by Alibaba — Apache-2.0.
  • Source data: German4All — MIT; derived from German Wikipedia, which is CC BY-SA.
  • Synthetic teacher data: generated with Claude (Anthropic). Review the relevant model and data licenses before redistribution.
  • Standards: easy-language registers follow national standards where established (Leichte Sprache, Selkokieli, Lättläst, Yasashii Nihongo, …) and Inclusion Europe’s “Information for All” Easy-to-Read framework or ISO 24495-1 plain language otherwise.

Limitations — please read

  • Twenty languages, varying maturity. v2 covers twenty languages, but data volume, standard confidence, and compliance differ across them. The four founding languages are the most mature.
  • Small models. These are 0.6B- and 1.7B-parameter models. They make mistakes, and the smaller one makes more of them.
  • Mostly-synthetic data. Much of the training data is model-generated. It can carry the teacher model’s biases and errors despite validator QC.
  • Standards are an interpretation. For many languages we map to the pan-European Easy-to-Read framework; this is one faithful interpretation, not a certified national standard.
  • Not for high-stakes use. Do not rely on these outputs for medical, legal, financial, or safety-critical communication. They are an aid, not an authority.
  • Easy language is an approximation. Compliance with a register’s rules is measured automatically; it does not guarantee a certified, human-reviewed easy-language text.

Roadmap

v2 ships twenty languages on the model side and four in the live app, with the remaining registers being wired into the application next. Simplified Chinese is held pending a script-aware language-ID gate. Additional languages stay on the same model line and easy-language-register approach.