Every feature below has shipped and is in the current pre-release build. They're grouped by what they help you do: script handling, voices, generation, post-production, export. If it's listed here, you can use it today.
Script
How text gets into the app. Paste any of four common script formats, let the parser detect speakers and audio cues, and review only the lines it wasn't sure about.
Paste or type a script. The import screen ships with a collapsible format guide and a built-in sample script for first-time users.
Start with text. End with audio. Nothing in between is your problem.
The parser detects speakers, SFX, and music cues across colon (`HOST:`), bracket (`[Host]`), parenthesis (`(Host)`), and standalone-name formats, and assigns confidence scores to each attribution.
It reads the format you already write in, not the other way around.
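To make the four formats concrete, here is a minimal sketch of how line-level speaker detection could work. It is not the app's parser: the regular expressions, the `Turn` type, and every confidence score are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns and scores; the real parser's rules are not published.
PATTERNS = [
    (re.compile(r"^([A-Z][\w .'-]{0,30}):\s*(.+)$"), 0.95),   # HOST: text
    (re.compile(r"^\[([^\]:]{1,30})\]\s*(.+)$"), 0.90),       # [Host] text
    (re.compile(r"^\(([^)]{1,30})\)\s*(.+)$"), 0.80),         # (Host) text
]
CUE = re.compile(r"^\[(SFX|Music):\s*(.+)\]$", re.IGNORECASE)
STANDALONE = re.compile(r"^[A-Z][A-Z '-]{1,30}$")             # HOST on its own line


@dataclass
class Turn:
    kind: str                 # "speech", "sfx", or "music"
    speaker: Optional[str]
    text: str
    confidence: float


def parse_script(script: str) -> list[Turn]:
    turns: list[Turn] = []
    pending: Optional[str] = None  # speaker named on a previous standalone line
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        cue = CUE.match(line)
        if cue:
            turns.append(Turn(cue.group(1).lower(), None, cue.group(2).strip(), 1.0))
            continue
        for pattern, score in PATTERNS:
            match = pattern.match(line)
            if match:
                turns.append(Turn("speech", match.group(1).strip(), match.group(2).strip(), score))
                pending = None
                break
        else:
            if STANDALONE.match(line):
                pending = line
            elif pending:
                # Attribution inferred from an earlier line, so score it lower.
                turns.append(Turn("speech", pending, line, 0.60))
            else:
                turns.append(Turn("speech", None, line, 0.0))
    return turns
```

Explicit markers score highest and inferred attributions score lower, so only the ambiguous lines end up in review.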
Every detected turn shows its speaker with a green / orange / red confidence indicator. Lines that need attention are highlighted; the rest you scroll past.
Review only the lines the parser wasn't sure about.
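A small sketch of how a confidence score could map to the three indicator colors. The thresholds (0.85 and 0.60) and both function names are assumptions; the source does not say where the app draws the lines.

```python
def confidence_color(score: float) -> str:
    """Map a 0..1 attribution score to an indicator color (thresholds are illustrative)."""
    if score >= 0.85:
        return "green"
    if score >= 0.60:
        return "orange"
    return "red"


def lines_needing_review(scores: list[float]) -> list[int]:
    """Return the indexes of turns that are not green, i.e. worth a human look."""
    return [i for i, score in enumerate(scores) if confidence_color(score) != "green"]
```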
Add new speakers, merge duplicates, batch-assign unreviewed turns, and give each character a unique color badge for visual distinction.
Two slightly different spellings of `HOST` collapse into one in a single tap.
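One plausible way to spot merge candidates is to compare speaker names after normalizing case, spacing, and punctuation. This is a sketch under that assumption; `speaker_key` and `find_duplicates` are hypothetical names, and the app may use a different matching rule.

```python
import re
from collections import defaultdict


def speaker_key(name: str) -> str:
    """Reduce a speaker name to lowercase letters and digits only."""
    return re.sub(r"[^a-z0-9]+", "", name.casefold())


def find_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that normalize to the same key; each group is a merge candidate."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[speaker_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

With this rule, `HOST`, `Host`, and `host ` all land in one group, and the user confirms the merge.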
Edit the raw script after import. An audio-tag reference panel makes inserting SFX and music cues quick. Re-parsing replaces existing speaker assignments and generated audio (with a confirmation warning).
Tweak the dialogue without leaving the project.
Voices
How characters get voices. Browse the ElevenLabs catalog, audition voices inline, map one to each speaker, and dial in stability, similarity, style, and speed per character.
Browse the ElevenLabs catalog with search, category and type filters, and paginated results. Inline play / stop preview on every row.
Audition voices the way you'd audition actors — by listening.
Map each speaker in your script to a TTS voice. The mapping screen shows assignment status for every speaker with a clear progress indicator.
You always know which character still needs a voice.
Independent sliders for stability, similarity boost, style exaggeration, and speed, plus a speaker boost toggle. V3 models show a simplified panel.
Two characters using the same voice can still sound like different people.
Per-voice metadata, verified languages, category, and a `Find Similar` feature that analyzes the voice's preview audio and returns related options from the catalog.
Almost the right voice? Find the cousin.
Per-project rules for tricky names and technical terms. Alias replacements or phoneme overrides in IPA or CMU Arpabet. Synced to ElevenLabs before generation.
Your guest's name pronounced the same way every time.
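Alias and phoneme rules fit the W3C Pronunciation Lexicon Specification (PLS), so one way to picture a rule set is as a small lexicon file. The sketch below assumes that representation; the `Rule` type and `build_lexicon` are hypothetical, and the alphabet label to use for CMU Arpabet should be checked against the ElevenLabs documentation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


@dataclass
class Rule:
    grapheme: str   # the text as written in the script
    kind: str       # "alias" (replacement text) or "phoneme" (phonetic spelling)
    value: str


def build_lexicon(rules: list[Rule], alphabet: str = "ipa", lang: str = "en-US") -> str:
    """Serialize rules as a PLS lexicon document (illustrative, not the app's exporter)."""
    root = ET.Element("lexicon", {
        "version": "1.0",
        "xmlns": PLS_NAMESPACE,
        "alphabet": alphabet,
        "xml:lang": lang,
    })
    for rule in rules:
        if rule.kind not in ("alias", "phoneme"):
            raise ValueError(f"unknown rule kind: {rule.kind}")
        lexeme = ET.SubElement(root, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = rule.grapheme
        ET.SubElement(lexeme, rule.kind).text = rule.value
    return ET.tostring(root, encoding="unicode")
```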
Generation
How the script becomes audio. Speech, sound effects, and music all generate in one run — with progress tracking, resume-on-failure, and an offline demo mode for the whole pipeline.
Each speech turn is generated with the assigned voice profile and contextual parameters (previous / next text, request-ID history) for voice continuity. SFX and music turns hit dedicated endpoints.
The voice doesn't forget who it just was.
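The contextual parameters can be pictured as extra fields on each speech request. This sketch assumes the field names of the public ElevenLabs text-to-speech API (`previous_text`, `next_text`, `previous_request_ids`) and a cap of three request IDs; verify both against the current API reference. `build_speech_request` is a hypothetical helper.

```python
def build_speech_request(turns, index, voice_settings, request_history,
                         model_id="eleven_multilingual_v2"):
    """Build the JSON body for one speech turn, including neighboring context."""
    body = {
        "text": turns[index],
        "model_id": model_id,
        "voice_settings": voice_settings,
    }
    if index > 0:
        body["previous_text"] = turns[index - 1]
    if index + 1 < len(turns):
        body["next_text"] = turns[index + 1]
    if request_history:
        # Assumption: only the most recent three request IDs are sent.
        body["previous_request_ids"] = request_history[-3:]
    return body
```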
Batch multiple speech turns into a single multi-speaker call for natural conversational flow. The app respects a 2,000-character and 10-unique-voice limit per batch and splits larger scripts automatically.
Conversations sound like conversations, not stitched-together monologues.
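The splitting rule can be sketched as a greedy pass over the turns: start a new batch whenever adding the next turn would exceed 2,000 characters or introduce an eleventh voice. The function below is an illustration of that rule, not the app's code; how the app handles a single turn longer than the limit is not stated, so this sketch simply gives it a batch of its own.

```python
def split_batches(turns, max_chars=2000, max_voices=10):
    """Greedily group (voice_id, text) turns into batches within both limits."""
    batches, current, chars, voices = [], [], 0, set()
    for voice_id, text in turns:
        too_long = chars + len(text) > max_chars
        too_many_voices = len(voices | {voice_id}) > max_voices
        if current and (too_long or too_many_voices):
            batches.append(current)
            current, chars, voices = [], 0, set()
        current.append((voice_id, text))
        chars += len(text)
        voices.add(voice_id)
    if current:
        batches.append(current)
    return batches
```

Turn order is preserved, so the conversation still plays in script order across batches.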
Write `[SFX: door slam (2s)]` directly in your script. flexCast generates the effect with a duration clamped between 0.5 and 30 seconds.
No separate sound library. No drag-and-drop timeline.
Write `[Music: upbeat jazz intro (10s)]` and the app generates it. Duration clamps between 3 seconds and 10 minutes; an instrumental-only flag is available.
An intro cue without a stock-music tab.
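Both cue types follow the same shape: a tag name, a description, and an optional duration in seconds. A minimal sketch of parsing and clamping, assuming the tag syntax shown above; the regular expression and `parse_cue` are illustrative, and the default used when no duration is written is not specified, so the sketch returns `None` in that case.

```python
import re

CUE = re.compile(
    r"^\[(SFX|Music):\s*(.*?)\s*(?:\((\d+(?:\.\d+)?)\s*s\))?\]$",
    re.IGNORECASE,
)
# Limits from the feature descriptions: SFX 0.5-30 s, music 3 s-10 min.
LIMITS = {"sfx": (0.5, 30.0), "music": (3.0, 600.0)}


def parse_cue(tag: str):
    """Return (kind, description, seconds) or None if the tag is not a cue."""
    match = CUE.match(tag.strip())
    if not match:
        return None
    kind = match.group(1).lower()
    seconds = float(match.group(3)) if match.group(3) else None
    if seconds is not None:
        low, high = LIMITS[kind]
        seconds = min(max(seconds, low), high)
    return kind, match.group(2), seconds
```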
An animated circular progress ring shows percentage, current turn, estimated time remaining, and elapsed time. Cancel at any point — already-generated audio is preserved.
You can see how far along it is. You can also walk away.
If generation is interrupted or cancelled, the app detects which turns already have audio and offers a one-tap resume that skips completed turns.
A dropped connection costs you a tap, not the whole run.
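Resuming comes down to a set difference: keep the turns that have no audio yet. A sketch, assuming each turn has an ID and that finished audio is tracked as a mapping from turn ID to file path; `plan_resume` is a hypothetical name.

```python
def plan_resume(turn_ids, audio_paths):
    """Return the turns still to generate and the percentage already complete."""
    remaining = [turn for turn in turn_ids if not audio_paths.get(turn)]
    done = len(turn_ids) - len(remaining)
    percent = round(100 * done / len(turn_ids)) if turn_ids else 100
    return remaining, percent
```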
Every generated file is checked for corruption, silence, and minimum duration. Files with detected issues are flagged with a warning badge in post-production.
The app notices the broken take before you do.
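The three checks can be illustrated on a WAV file with Python's standard library. This is a sketch only: it assumes 16-bit PCM input, and the minimum duration (0.2 s) and silence threshold (peak below 50) are invented values, not the app's.

```python
import array
import sys
import wave


def check_wav(path, min_seconds=0.2, silence_peak=50):
    """Return a list of issue labels for a 16-bit PCM WAV file (empty if it looks fine)."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            frame_count = wav.getnframes()
            raw = wav.readframes(frame_count)
    except (wave.Error, EOFError, OSError):
        return ["corrupt"]
    if width != 2 or rate <= 0:
        return ["unsupported_format"]
    samples = array.array("h", raw[: len(raw) // 2 * 2])
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    issues = []
    if frame_count / rate < min_seconds:
        issues.append("too_short")
    if not samples or max(abs(sample) for sample in samples) < silence_peak:
        issues.append("silent")
    return issues
```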
With no API key configured, a mock TTS service returns silent WAV audio with realistic durations. Every screen — import, review, voices, generate, edit, mix, export — works end to end.
Learn the entire workflow before spending a dollar.
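A mock of this kind is easy to picture: estimate a duration from the text length and emit that many silent samples. The sketch below assumes a speaking rate of 15 characters per second, a 0.5-second minimum, and a 22,050 Hz mono format; all three are illustrative values.

```python
import io
import wave


def mock_tts(text, chars_per_second=15.0, sample_rate=22050):
    """Return silent 16-bit mono WAV bytes whose length scales with the text."""
    seconds = max(0.5, len(text) / chars_per_second)
    frame_count = int(seconds * sample_rate)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frame_count)  # zero samples are silence
    return buffer.getvalue()
```

Because the durations track the text, progress bars, pauses, and the final mix length all behave as they would with real audio.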
Post-production
How a single bad take stops being a problem. Regenerate one line at a time, compare variants side by side, dial in pauses, exclude segments, and edit dialogue inline.
Play each segment individually. Simple and advanced modes are available — advanced shows per-turn controls, variant counts, and quality indicators.
Find the line that's off without scrubbing a 30-minute mix.
Swipe a turn or use its context menu to regenerate it. The new take is saved as an additional variant — your previous take is never overwritten.
Fix one line. Leave everything else exactly where it was.
Each turn can have multiple takes. Browse them, play them back to back, mark one as active, and delete the rest. `Keep Only Active` cleans up across the whole project.
Pick the read you want. The rest go quietly.
Two levels of control: a project-wide default, and per-turn custom pauses dialed in with a slider from 0 to 2 seconds.
Comic timing without a waveform editor.
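The two levels resolve in a simple order: a per-turn value wins when it exists, otherwise the project default applies. A sketch, assuming the 0 to 2 second range applies to the per-turn slider only; `effective_pause` is a hypothetical name.

```python
def effective_pause(project_default, custom=None):
    """Return the pause in seconds after a turn: the custom value if set, else the default."""
    if custom is None:
        return project_default
    return min(max(custom, 0.0), 2.0)  # slider range from the description above
```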
Exclude individual turns from the final mix without deleting them. Excluded turns appear with strikethrough text and are skipped during mixing — toggle them back any time.
Cut a beat without losing the option to put it back.
Edit a turn's dialogue text directly from post-production. After saving, optionally regenerate the audio immediately to match.
A typo in the script doesn't mean a trip back to import.
Playback & export
How the episode leaves the app. Mix every active turn into one M4A, scrub the waveform, preview the cast before a full run, and share via the iOS share sheet.
Concatenate every active, non-excluded turn into a single M4A / AAC file with configurable pauses between segments.
One file. Ready to share.
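The mix can be thought of as a timeline: each included segment starts where the previous one ended plus its pause. The sketch below computes those start offsets and the total length; the `Segment` type and `mix_plan` are hypothetical, and dropping the pause after the final segment is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    turn_id: str
    duration: float        # seconds of audio in the active take
    pause_after: float     # resolved pause following this turn
    excluded: bool = False


def mix_plan(segments):
    """Return ([(turn_id, start_seconds), ...], total_seconds) for the final mix."""
    included = [segment for segment in segments if not segment.excluded]
    plan, cursor = [], 0.0
    for position, segment in enumerate(included):
        plan.append((segment.turn_id, cursor))
        cursor += segment.duration
        if position < len(included) - 1:
            cursor += segment.pause_after  # no trailing pause after the last turn
    return plan, cursor
```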
Listen to the mixed output with play / pause, skip forward and back 15 seconds, and a waveform scrubber for precise seeking.
Hear the finished episode the same way your listeners will.
Stream the first ~2,000 characters of dialogue with assigned voices via the ElevenLabs dialogue API. A segment timeline shows colored bars per speaker.
Catch a miscast voice before you spend a full generation run.
Share the final M4A via the iOS share sheet — Files, AirDrop, email, or any app that accepts audio.
Send it to your editor. Send it to your hosting platform. Send it to yourself.
Project & settings
How multiple projects coexist. Create, rename, duplicate, and delete; manage your ElevenLabs key in the iOS Keychain; pick the default TTS model; load a sample to onboard yourself.
Create, rename, duplicate, and delete projects. The list is searchable and sorted by most recently updated.
Three episodes in flight, one app, no folder chaos.
Script. Voices. Generate. Production. A progress bar tracks where you are; tabs unlock as the project advances; the current step is auto-selected based on status.
You always know what to do next. No audio engineering required.
Deep-copy a project including script, turns, speakers, voice profiles, and pronunciation rules. Audio is intentionally not copied — the duplicate is a fresh starting point.
Spin up a new variant of an episode without rebuilding the cast.
A pre-populated sample script is one tap away from the empty-state screen.
Onboard yourself in a minute, not a manual.
Enter and store an ElevenLabs API key in the iOS Keychain. Remove it any time. The app clearly displays whether it's running in API or demo mode.
Your key sits in the same vault as your bank passwords.
Test the saved key by fetching available voices. Pick the default TTS model from Multilingual v2, Turbo v2.5, Turbo v2, English v1, or v3.
You know the key works before you start a 40-minute run.
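Testing a key by listing voices is a single authenticated GET. The sketch below assumes the public ElevenLabs voices endpoint (`https://api.elevenlabs.io/v1/voices`) and its `xi-api-key` header, which should be checked against the current API reference. The helper names are hypothetical, and the `opener` parameter exists only so the check can be exercised without a network call.

```python
import urllib.error
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_key_check(api_key):
    """Build the request used to verify a key by listing available voices."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"xi-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


def key_is_valid(api_key, opener=urllib.request.urlopen):
    """Return True if the voices request succeeds, False if the API rejects it."""
    try:
        with opener(build_key_check(api_key), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False  # e.g. 401 for a bad key; network errors still propagate
```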
Interface
The parts you feel before you read them. Haptics that confirm what just happened, toast notifications that don't shout, and a three-tier database recovery so a bad row never takes the app down.
Light taps for selections, medium impacts for state changes, ticks for slider adjustments, notification haptics for generation milestones and errors.
The phone confirms what just happened without asking your eyes.
Ephemeral feedback at the top of the screen for assignments, duplications, regeneration completion, and errors. Four styles, matching icons, VoiceOver-accessible.
Confirmation when you need it. Silence when you don't.
If the local SwiftData store is corrupted on launch, the app attempts recovery in three tiers: normal open, delete-and-retry, and in-memory fallback. Users are notified of any data reset.
A bad row doesn't take the app down with it.
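The three tiers form a fallback chain. The sketch below shows the pattern with SQLite standing in for SwiftData, which is a substitution made purely for illustration; `open_store` and the returned tier labels are hypothetical.

```python
import os
import sqlite3


def _open(path):
    """Open the store and touch the schema so a corrupt file fails immediately."""
    connection = sqlite3.connect(path)
    try:
        connection.execute(
            "CREATE TABLE IF NOT EXISTS turns (id INTEGER PRIMARY KEY, text TEXT)"
        )
    except sqlite3.DatabaseError:
        connection.close()
        raise
    return connection


def open_store(path):
    """Return (connection, tier), where tier is "normal", "reset", or "in_memory"."""
    try:
        return _open(path), "normal"          # tier 1: normal open
    except sqlite3.DatabaseError:
        pass
    try:
        os.remove(path)                       # tier 2: delete and retry
        return _open(path), "reset"
    except (OSError, sqlite3.DatabaseError):
        return _open(":memory:"), "in_memory" # tier 3: keep the app usable
```

Any tier other than `"normal"` means data was lost, which is the point at which the user would be notified.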
What we don't do
The walkthrough takes you screen by screen — from pasting a script to exporting an M4A — without leaving this site.