Live Linguist

Version 1 · on-device, Apple silicon

Live captions, rewritten into easy-to-read language.

Live Linguist is a real-time, on-device caption simplifier. It rewrites live speech into an easy-to-read register of the same language: German becomes simpler German, not English.

macOS app · Apple silicon · v1.0.19 · 14 MB .dmg — downloads directly, no GitHub account needed.

What’s in v1

  • Two fine-tuned simplifiers: Qwen3 0.6B (live) and Qwen3 1.7B (quality), 4-bit.
  • Four languages, each in its official easy-language register: German, French, English, Spanish.
  • Held-out evaluation: the fine-tuned models beat stock Qwen3 on every language and metric.
  • Runs fully on-device on macOS (Apple silicon, MLX) — no cloud, no translation.

In easy language

This tool listens to people talking. It shows the words on screen. It also writes the words in an easier way.

The easy words are in the same language. German stays German. The tool does not change it to English.

This helps people who are learning a language. It also helps people who find hard words difficult to read.

Everything runs on your own Mac. Your words are not sent away.

Why easy language

Comprehensible input, kept in the target language

We believe in-language simplification is the right kind of help, and the reason is pedagogical. Stephen Krashen’s Input Hypothesis holds that we acquire language through comprehensible input — language we can understand that sits just beyond our current level, often written as “i+1”. Input that is too hard is noise; input that is too easy teaches nothing new.

Easy-language simplification lowers complexity while keeping the learner immersed in the target language. A learner in a German lecture receives German they can actually follow — shorter sentences, common words, one idea at a time — and keeps building acquisition from real target-language input. An English translation would make the sentence comprehensible too, but it removes the German exposure entirely, which is the thing the learner came for.

Translation answers “what did they say?” Simplification answers “what did they say, in words I can learn from?” Only the second keeps the learner inside the language they are trying to acquire.

The same property carries a parallel accessibility value. Easy-language standards — Leichte Sprache, FALC, Easy English, Lectura Fácil — were developed so that people with cognitive disabilities and low-literacy readers can take part in public and everyday life. A tool that produces compliant easy language serves those readers directly, in their own language.

This is the project’s rationale, grounded in second-language-acquisition theory and established accessibility standards. It is not a clinical claim about learning outcomes.

Origins & acknowledgments

Built for a French–Moroccan virtual exchange

Live Linguist began as the AI component of an AI-supported international virtual exchange between the French program at Kennesaw State University and Université Hassan II Casablanca in Morocco. The idea: let novice second-semester French students hold real, real-time video conversations with more advanced Moroccan peers — without the language gap shutting the conversation down.

The pedagogical brief is exactly the rationale above. Alongside a verbatim transcript, the tool produces a second, simplified transcript that brings higher-proficiency spoken French (up to CEFR C2) down to a learner-appropriate CEFR A1–B2 level, in French. That is comprehensible input kept in the target language, not a translation into English. Live Linguist now applies the same principle across all four v1 languages.

The exchange and its proposal — “Language Education and Intercultural Connections with AI-Supported International Virtual Exchanges,” submitted to the French in Higher Education Grant Program of the Albertine Foundation — were conceived and written by the language faculty named below. Version 1 generalizes that single French use case into a four-language, on-device release.

The people who conceived it

The pedagogical concept, curriculum design, and the cross-institutional exchange were developed by:

Kennesaw State University

Department of World Languages and Cultures

  • Dr. Abigail Alexander — Associate Professor of French; Director of the World Languages Resource Collection. Project coordinator; conceived and has led the KSU–Hassan II virtual exchange since fall 2023.
  • Dr. Noëlle Lively — Senior Lecturer of French and Coordinator of French. Curriculum design and exchange coordination.
  • Dr. Federica Santini — Chair, Department of World Languages and Cultures.
  • Brooke Reed — Program Manager, World Languages Resource Collection. Coordinated student testers.

Université Hassan II Casablanca

Casablanca, Morocco

  • Dr. Meriem Hachimi — Associate Professor of French. Co-coordinates the exchange and Moroccan student participation.
  • Dr. Abdelhadi Samadi — Associate Dean for Research and Cooperation. Partner-institution authorization.

Technical & LLM development. Dylan Goldblatt, Ph.D. — AI Strategist and Applied Researcher, KSU Office of Research — designed and built Live Linguist end to end: the fine-tuned easy-language Qwen3 models, the supervised dataset and deterministic register validators, the on-device macOS application, and the CEFR-level caption adaptation. He maintains the models and infrastructure beyond the grant period.

The models

Two specialist simplifiers, built to run on a Mac

We fine-tuned Qwen3 into dedicated easy-language simplifiers and quantized them to 4-bit so they run on-device on Apple silicon via MLX. Two sizes cover the latency/quality trade-off: a small one for live captioning and a larger one when quality matters more than speed.

Qwen3 0.6B · live

4-bit · ~331 MB · lowest latency

The everyday driver for real-time captions, where each segment must be simplified within a tight latency budget.

Qwen3-0.6B-EasyLanguage-4bit →

Qwen3 1.7B · quality

4-bit · ~934 MB · highest quality

The higher-fidelity option: stronger SARI and near-perfect register compliance when a few extra milliseconds are acceptable.

Qwen3-1.7B-EasyLanguage-4bit →
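A rough back-of-the-envelope check on the download sizes quoted above. This is a sketch under stated assumptions, not a description of how the released files were produced: it assumes group-wise 4-bit quantization with one 16-bit scale and one 16-bit bias per group of 64 weights, and it uses the nominal parameter counts (0.6B and 1.7B). The function name and defaults are illustrative.

```python
def quantized_size_mb(params: float, bits: int = 4,
                      group_size: int = 64, overhead_bits: int = 32) -> float:
    """Estimate the on-disk size of a group-quantized model in megabytes.

    Assumption: every weight is stored in `bits` bits, and each group of
    `group_size` weights carries `overhead_bits` of extra metadata
    (here a 16-bit scale plus a 16-bit bias).
    """
    bits_per_weight = bits + overhead_bits / group_size  # 4.5 with the defaults
    return params * bits_per_weight / 8 / 1e6            # bits -> bytes -> MB


if __name__ == "__main__":
    for name, params in [("0.6B", 0.6e9), ("1.7B", 1.7e9)]:
        print(f"{name}: about {quantized_size_mb(params):.0f} MB")
```

The estimates land in the same range as the quoted ~331 MB and ~934 MB. An exact match is not expected, because the nominal parameter counts are rounded and some tensors may be stored at a different precision.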

Languages and their standards

The four languages in v1, each targeting its official easy-language register and sentence-length rule.
Language | Register | Standard | Sentence cap
German | Leichte Sprache | Netzwerk Leichte Sprache; DIN SPEC 33429 | ≤ 12 words
French | FALC | UNAPEI; European Easy-to-Read | ≤ 15 words
English | Easy / Plain English | US federal plain-language | ≤ 18 words
Spanish | Lectura Fácil | UNE 153101:2018 EX; Plena Inclusión | ≤ 15 words
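The sentence caps in the table above lend themselves to a simple deterministic check. The sketch below is illustrative only and is not the application's validator. The language codes, the function names, and the naive punctuation-based sentence splitter are all assumptions.

```python
import re

# Sentence caps from the table above, in words per sentence.
# The two-letter language codes are an assumption made for this sketch.
SENTENCE_CAPS = {"de": 12, "fr": 15, "en": 18, "es": 15}

# Naive splitter (assumption): a sentence ends at ., !, ? or … followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the naive punctuation rule."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def within_sentence_cap(text: str, lang: str) -> bool:
    """Return True if every sentence has at most the cap for `lang`."""
    cap = SENTENCE_CAPS[lang]
    return all(len(sentence.split()) <= cap for sentence in split_sentences(text))


if __name__ == "__main__":
    print(within_sentence_cap("Der Termin wurde verschoben. Der Arzt war krank.", "de"))
```

A real register validator would need more than a word count, for example abbreviation-aware sentence splitting and the grammar checks each standard describes.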

What the effect looks like

A disfluent spoken German sentence, simplified into Leichte Sprache — short sentences, one idea each, filler removed:

Input · spoken German

“also der Termin wurde leider verschoben weil der Arzt krank war”

Output · Leichte Sprache

“Der Termin wurde verschoben. Der Arzt war krank.”

Results

Specialist models beat vanilla Qwen3

We evaluated on a held-out test set of 200 sentences per language, comparing each fine-tuned model against the stock Qwen3 4-bit model of the same size. The fine-tuned models win on every language and every metric. More importantly, they are reliable, whereas the stock model, especially at 0.6B, often parrots a prompt example or ignores the easy-language register entirely.

  • SARI (higher is better). The standard simplification-quality metric (Xu et al., 2016). It rewards words the model correctly keeps, adds, and deletes relative to references, giving a single score for “how good is this simplification?”
  • Register compliance (higher is better). The share of outputs that actually follow the target register’s rules (sentence length, simple grammar, and the other deterministic checks for that language).
  • Few-shot parroting (lower is better). The share of outputs that just copy one of the examples from the prompt instead of simplifying the actual input, a failure mode small instruction-tuned models fall into.

Grouped bar chart of SARI scores, fine-tuned versus stock Qwen3, for both model sizes across German, French, Spanish, and English. The fine-tuned model is higher in every case. For Qwen3-0.6B the fine-tuned scores are 47.9, 58.7, 59.9, and 59.7 versus stock scores of 33.0, 33.3, 36.6, and 32.5. For Qwen3-1.7B the fine-tuned scores are 50.7, 60.1, 61.2, and 62.6 versus stock scores of 41.9, 54.1, 56.9, and 52.0.
Simplification quality (SARI). The fine-tuned models score higher than stock Qwen3 in all eight comparisons; the gap is widest on the small 0.6B model.
Grouped bar chart of register-compliance percentages, fine-tuned versus stock Qwen3. The fine-tuned model follows the easy-language rules in roughly 98 to 100 percent of outputs across all languages and both sizes, while the stock model complies far less often — about 26, 28, 31, and 17 percent for Qwen3-0.6B, and about 58, 56, 61, and 67 percent for Qwen3-1.7B.
Register compliance. Fine-tuned models follow the easy-language rules almost always (≈98–100%); the stock model complies far less often, about 17–31% of the time at 0.6B and 56–67% at 1.7B.
Grouped bar chart of few-shot parroting, where lower is better. The fine-tuned model copies a prompt example in zero percent of outputs everywhere. The stock Qwen3-0.6B copies an example in about 63 percent of German outputs, 49 percent French, 39 percent Spanish, and 46 percent English; stock Qwen3-1.7B copies in about 18 percent of German outputs and near zero elsewhere.
Few-shot parroting (lower is better). The fine-tuned models never copy a prompt example; the stock 0.6B model does so up to 63% of the time, which is why its SARI and compliance collapse.

Results in numbers

Held-out test, 200 sentences per language. Fine-tuned (ft) versus stock Qwen3 4-bit. In-language rate is 1.00 for all fine-tuned models. Stock-parrot column shows the 0.6B stock model.
Lang | 0.6B SARI (ft / stock) | 1.7B SARI (ft / stock) | ft compliance | stock compliance | ft parrot | stock parrot (0.6B)
German | 47.9 / 33.0 | 50.7 / 41.9 | 0.985–0.995 | 0.27–0.59 | 0.00 | 0.63
French | 58.7 / 33.3 | 60.1 / 54.2 | 0.98–0.995 | 0.28–0.56 | 0.00 | 0.49
Spanish | 59.9 / 36.6 | 61.2 / 56.9 | 0.985–0.99 | 0.31–0.61 | 0.00 | 0.39
English | 59.7 / 32.5 | 62.6 / 52.0 | 0.995 | 0.17–0.67 | 0.00 | 0.46

SARI = Xu et al. (2016). Compliance ranges span the two model sizes. “Parrot” is the share of outputs copying a prompt example. Every fine-tuned model stayed in the target language on 100% of test sentences.
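The two rate metrics reported above (register compliance and few-shot parroting) are simple shares over the test outputs. The sketch below shows one way they could be computed. It is not the project's evaluation code. In particular, treating "copying" as an exact match after lowercasing and whitespace normalization is an assumption, as are the function names.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide a copy."""
    return " ".join(text.lower().split())


def parrot_rate(outputs: list[str], prompt_examples: list[str]) -> float:
    """Share of outputs identical (after normalization) to any prompt example."""
    if not outputs:
        return 0.0
    examples = {normalize(example) for example in prompt_examples}
    copied = sum(1 for output in outputs if normalize(output) in examples)
    return copied / len(outputs)


def compliance_rate(outputs: list[str], is_compliant: Callable[[str], bool]) -> float:
    """Share of outputs accepted by a register check supplied by the caller."""
    if not outputs:
        return 0.0
    return sum(1 for output in outputs if is_compliant(output)) / len(outputs)
```

With 200 test sentences per language, a parroting rate of 0.63 corresponds to 126 copied outputs.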

The dataset

How the training data was built

The models were fine-tuned on a supervised dataset of sentence-to-easy-language pairs. It combines two sources:

  • Grounding in real simplifications. German pairs are grounded in German4All, a corpus of German Wikipedia text aligned to multiple simplification levels.
  • Synthetic spoken-style data. To cover the live-captioning use case — disfluent, conversational input — we generated additional pairs in a spoken style with a teacher model, then quality-controlled them.

Every pair passed the same deterministic validators the application uses at runtime: sentence-length caps, simple-grammar checks, and a wrong-language guard. Pairs that failed the register rules were rejected, so the training signal reflects the standard rather than just “shorter text.”
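The rejection step described above can be pictured as a list of pass/fail checks applied to each pair. The sketch below is a minimal illustration, not the project's pipeline. The `Pair` type, the function names, and the crude function-word language guard are all hypothetical, and a production guard would use a proper language identifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Tiny function-word lists (assumption): just enough to tell the four languages apart here.
FUNCTION_WORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "war", "wurde"},
    "fr": {"le", "la", "les", "et", "est", "pas", "un", "une", "des"},
    "en": {"the", "and", "is", "not", "a", "was", "of", "to"},
    "es": {"el", "la", "los", "y", "es", "no", "un", "una", "fue"},
}


@dataclass(frozen=True)
class Pair:
    source: str  # original sentence
    target: str  # easy-language rewrite
    lang: str    # expected language of the target


Validator = Callable[[Pair], bool]


def guess_language(text: str) -> Optional[str]:
    """Return the language whose function words occur most often, or None on a tie or no hits."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = {lang: sum(t in words for t in tokens) for lang, words in FUNCTION_WORDS.items()}
    best = max(counts.values())
    winners = [lang for lang, count in counts.items() if count == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def in_target_language(pair: Pair) -> bool:
    """Wrong-language guard: pass only when the guess matches the expected language."""
    return guess_language(pair.target) == pair.lang


def filter_pairs(pairs: list[Pair], validators: list[Validator]) -> list[Pair]:
    """Keep only the pairs that pass every validator."""
    return [pair for pair in pairs if all(check(pair) for check in validators)]
```

Any other check with the same signature, such as a sentence-length cap, could be appended to the validator list. Note that this strict guard also rejects targets too short to contain a listed function word.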

live-linguist-easylanguage-sft on Hugging Face →

Run it locally

Everything stays on your Mac

Download Live Linguist for macOS · Apple silicon · v1.0.19 · 14 MB .dmg · direct download

Live Linguist is a macOS application that runs the whole pipeline — audio capture, speech recognition, and simplification — on-device on Apple silicon. Nothing is sent to a server. The models above are the simplification stage.

Prefer to build from source, or want the latest unreleased changes? At a high level:

  1. Clone the application repository and follow its README build instructions (macOS 14.4+, Xcode 16+).
  2. Download the easy-language models from Hugging Face (the live 0.6B for low latency, or the quality 1.7B).
  3. Pick a language and an audio source, and read the simplified captions alongside the verbatim transcript.

Full, current setup steps live in the repository’s README:

github.com/ngoldbla/live-linguist →

Provenance, licenses & limitations

What this is built on — and what it is not

Provenance & licenses

  • Base model: Qwen3 (0.6B and 1.7B), by Alibaba — Apache-2.0.
  • Source data: German4All — MIT; derived from German Wikipedia, which is CC BY-SA.
  • Synthetic teacher data: generated with Claude (Anthropic). Review the relevant model and data licenses before redistribution.

Limitations — please read

  • Four languages only. v1 covers German, French, English, and Spanish. Nothing else is supported yet.
  • Small models. These are 0.6B and 1.7B parameter models. They make mistakes, and the smaller one makes more of them.
  • Mostly-synthetic data. Much of the training data is model-generated. It can carry the teacher model’s biases and errors despite validator QC.
  • Not for high-stakes use. Do not rely on these outputs for medical, legal, financial, or safety-critical communication. They are an aid, not an authority.
  • Easy language is an approximation. Compliance with a register’s rules is measured automatically; it is not a substitute for certification or human review of an easy-language text.

Roadmap

v1 ships German, French, English, and Spanish. Additional languages are planned for future versions along the same model line; the architecture and easy-language-register approach stay the same.