retro formant speech / local synth / MP3 / MP4

Low Quality Robot Text to Speech

Type a script, press play, and the browser breaks the words into pronounceable sounds, feeds them to a local formant synthesizer, and renders a rough late-70s computer voice.

Plaster robot mask with cobalt sound waves

no cloud / formant engine / local MP3

instrument / 01

Script chamber

Paste a line, a poem, or a script. The page turns words into phoneme groups, renders them with a local formant engine, and can export either audio or a moving soundmap.

Words become phonemes

The script is split into pronounceable sound groups before it reaches the synthesizer.

Browser-only render

The formant synth runs locally. Your text does not need a server round trip to become audio.

Retro by design

It is not polished neural speech. It is a brittle, direct, low-quality robot voice for character and texture.

MP3 or MP4 export

Download the dry robot voice as audio, or render a cobalt soundmap video in the same visual system as this page.

Embossed waveform relief with cobalt phoneme blocks

score / 07

Four movements

01. Write or paste the source text.
02. Choose a preset and tune speed, pitch, melody, and volume.
03. Play, stop, or restart until the rough machine cadence feels right.
04. Save the rendered MP3 or MP4 without leaving the page.

score / 08

Technical score

Input

Plain text from the textarea, grouped into phonemes behind the scenes before synthesis.

Engine

Klattsch-style formant synthesis, tuned through local voice presets.

Privacy

Text processing and audio rendering both happen inside the browser runtime.

Output

Immediate playback plus downloadable MP3 audio or MP4 soundmap video.
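
For readers curious what formant synthesis looks like in code, here is a minimal sketch. It assumes the engine follows the classic Klatt resonator design, which this page does not confirm. Every name in it (`Formant`, `makeResonator`, `renderVowel`, `AH`) is hypothetical, and the formant values are rough textbook-style numbers for an "ah" vowel, not this page's presets. It uses no browser APIs, so the samples it returns would still need to be copied into an audio buffer for playback.

```typescript
// Illustrative sketch only: not the page's actual engine code.
// A Klatt-style two-pole resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
interface Formant {
  freqHz: number;      // centre frequency of the resonance
  bandwidthHz: number; // width of the resonance
}

function makeResonator(f: Formant, sampleRate: number): (x: number) => number {
  const T = 1 / sampleRate;
  const C = -Math.exp(-2 * Math.PI * f.bandwidthHz * T);
  const B = 2 * Math.exp(-Math.PI * f.bandwidthHz * T) * Math.cos(2 * Math.PI * f.freqHz * T);
  const A = 1 - B - C; // gives unity gain at 0 Hz
  let y1 = 0; // previous output
  let y2 = 0; // output before that
  return (x: number): number => {
    const y = A * x + B * y1 + C * y2;
    y2 = y1;
    y1 = y;
    return y;
  };
}

// Drives a cascade of resonators with an impulse train at the pitch f0Hz.
function renderVowel(
  formants: Formant[],
  f0Hz: number,
  seconds: number,
  sampleRate: number = 22050
): Float32Array {
  const out = new Float32Array(Math.floor(seconds * sampleRate));
  const resonators = formants.map((f) => makeResonator(f, sampleRate));
  const period = Math.round(sampleRate / f0Hz);
  let peak = 0;
  for (let n = 0; n < out.length; n++) {
    let s: number = n % period === 0 ? 1 : 0; // crude glottal pulse
    for (const r of resonators) {
      s = r(s);
    }
    out[n] = s;
    peak = Math.max(peak, Math.abs(s));
  }
  if (peak > 0) {
    for (let n = 0; n < out.length; n++) {
      out[n] /= peak; // normalise to the range [-1, 1]
    }
  }
  return out;
}

// Assumed, approximate formants for an "ah" vowel.
const AH: Formant[] = [
  { freqHz: 700, bandwidthHz: 90 },
  { freqHz: 1200, bandwidthHz: 110 },
  { freqHz: 2600, bandwidthHz: 170 },
];

const samples = renderVowel(AH, 110, 0.25);
```

The impulse-train source and the lack of smoothing between sounds are part of why output like this sounds brittle and buzzy rather than natural.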

archive / 09

Questions from the machine room

Is this a natural-sounding TTS model?

No. The charm is the rough robotic formant sound. It is useful when you want texture, not a polished assistant voice.

Does the page upload my text?

No. Speech conversion, playback, and MP3 rendering all happen locally in the browser.

What happens before synthesis?

The page converts the visible words into phoneme groups, applies the selected voice contour to them, then schedules the resulting targets through the formant engine.
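
The three stages in that answer can be sketched roughly as follows. This is a toy under stated assumptions, not the page's code: `groupWord` splits on spelling rather than doing real phoneme conversion, the contour is a simple falling pitch line, and the names `VoiceSettings`, `Target`, `groupWord`, and `schedule`, along with the 160 ms and 80 ms timing constants, are all hypothetical.

```typescript
// Illustrative sketch only: a toy grouping, contour, and scheduling pass.
interface VoiceSettings {
  speed: number;   // 1 = normal; higher values shorten every target
  pitchHz: number; // centre pitch of the voice
  melody: number;  // 0 = monotone; higher values widen the pitch fall
}

interface Target {
  group: string;      // the sound group to render
  startMs: number;    // when the engine should start it
  durationMs: number; // how long it should last
  f0Hz: number;       // pitch assigned by the contour
}

// Letter-based stand-in for phoneme grouping: consonants plus the
// following vowel run, with trailing consonants joined to the last chunk.
function groupWord(word: string): string[] {
  const chunks: string[] = word.match(/[^aeiouy]*[aeiouy]+|[^aeiouy]+$/g) ?? [];
  if (chunks.length > 1 && !/[aeiouy]/.test(chunks[chunks.length - 1])) {
    const tail = chunks.pop() as string;
    chunks[chunks.length - 1] += tail;
  }
  return chunks;
}

function schedule(text: string, voice: VoiceSettings): Target[] {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const grouped = words.map(groupWord);
  const total = grouped.reduce((count, g) => count + g.length, 0);
  const baseMs = 160; // assumed length of one group at speed 1
  const pauseMs = 80; // assumed gap between words at speed 1
  const targets: Target[] = [];
  let clockMs = 0;
  let index = 0;
  for (const word of grouped) {
    for (const group of word) {
      // Contour: pitch falls linearly from start to end of the script.
      const progress = total > 1 ? index / (total - 1) : 0;
      const f0Hz = voice.pitchHz * (1 + voice.melody * (0.5 - progress));
      const durationMs = baseMs / voice.speed;
      targets.push({ group, startMs: clockMs, durationMs, f0Hz });
      clockMs += durationMs;
      index++;
    }
    clockMs += pauseMs / voice.speed;
  }
  return targets;
}

const plan = schedule("Hello robot", { speed: 2, pitchHz: 100, melody: 0.2 });
```

Each entry in `plan` is one target the synthesizer would render in order. Doubling `speed` halves both the group durations and the pauses between words.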

What should I export?

Use MP3 when you only need the voice. Use MP4 when the soundmap itself should become the visual artifact.