Six steps · iPhone & iPad · iOS 26.4

From a paste of text to a finished episode.

The visible workflow has three stages: Script, Production, and Export. Behind that simple progress bar, flexVox handles speaker review, voice casting, audio generation, post-production, follow-along playback, export presets, and transcript output.

Step 1
Paste your script

On the Script Import screen, paste a dialogue script in any of the formats below. Expand the format guide if you want a quick reference, then tap `Review Script` to run the parser.

  • Recognized formats: `HOST: line`, `[Host] line`, `(Host) line`, standalone name lines

  • Audio tags: `[SFX: prompt (Ns)]` and `[Music: prompt (Ns)]`, where `N` is the duration in seconds

  • Structure tags: `[SCENE:]`, `[CHAPTER:]`, `[ACT:]`, `[PART:]`

  • Templates for Interview, Audio Drama, True Crime, Newscast, and Narration
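To make the recognized formats concrete, here is a minimal Python sketch of a line-by-line parser. It is not flexVox's parser: the `Event` type, `parse_script`, and the regular expressions are illustrative assumptions. Lines that match no format are kept as turns with no speaker, which is the kind of case the review step exists to catch.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    kind: str                        # "turn", "sfx", "music", "scene", "chapter", "act", "part"
    label: Optional[str] = None      # speaker name for turns; None means "needs review"
    text: str = ""
    seconds: Optional[float] = None  # duration for SFX/music tags

AUDIO_TAG = re.compile(r"^\[(SFX|Music):\s*(.+?)\s*\((\d+(?:\.\d+)?)s\)\]$", re.IGNORECASE)
STRUCTURE_TAG = re.compile(r"^\[(SCENE|CHAPTER|ACT|PART):\s*(.*?)\]$", re.IGNORECASE)
SPEAKER_FORMATS = [
    re.compile(r"^([A-Za-z][\w .'-]*):\s+(.+)$"),  # HOST: line
    re.compile(r"^\[([^\]:]+)\]\s+(.+)$"),         # [Host] line
    re.compile(r"^\(([^)]+)\)\s+(.+)$"),           # (Host) line
]
NAME_ONLY = re.compile(r"^[A-Z][A-Z0-9 .'-]*$")    # standalone name line, e.g. "NARRATOR"

def parse_script(text: str) -> list[Event]:
    events: list[Event] = []
    pending: Optional[str] = None  # speaker set by the last standalone name line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := AUDIO_TAG.match(line):
            events.append(Event(m.group(1).lower(), text=m.group(2), seconds=float(m.group(3))))
            continue
        if m := STRUCTURE_TAG.match(line):
            events.append(Event(m.group(1).lower(), text=m.group(2)))
            continue
        for fmt in SPEAKER_FORMATS:
            if m := fmt.match(line):
                events.append(Event("turn", label=m.group(1).strip(), text=m.group(2)))
                pending = None
                break
        else:
            if NAME_ONLY.match(line):
                pending = line  # following unlabeled lines belong to this speaker
            else:
                events.append(Event("turn", label=pending, text=line))
    return events
```

Ordering matters in this sketch: audio and structure tags are checked before the `[Host] line` form so that `[SFX: ...]` is never mistaken for a speaker.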

Step 2
Review speakers

Turns that need review are highlighted with confidence indicators. Tap a turn to confirm its speaker or reassign it. From the toolbar, batch-assign unreviewed turns or merge duplicate speakers.

  • Green, orange, or red confidence indicator for each speaker attribution

  • Collapsible scene sections and manual scene creation

  • Suggestions for near-matches (`Did you mean 'HOST'?`)

  • Insert or delete turns during review
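A minimal sketch of the two review aids, assuming each attribution carries a confidence score between 0 and 1. The thresholds and the helper names (`confidence_color`, `suggest_speaker`, `review_hint`) are hypothetical; the near-match suggestion uses Python's standard `difflib.get_close_matches`, which may not be how flexVox actually matches names.

```python
import difflib
from typing import Optional

def confidence_color(score: float) -> str:
    """Map an attribution confidence in [0, 1] to a review color (hypothetical cut-offs)."""
    if score >= 0.85:
        return "green"
    if score >= 0.5:
        return "orange"
    return "red"

def suggest_speaker(name: str, known: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Closest known speaker by difflib similarity, compared case-insensitively."""
    by_lower = {k.lower(): k for k in known}
    matches = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None

def review_hint(name: str, known: list[str]) -> Optional[str]:
    suggestion = suggest_speaker(name, known)
    if suggestion is None or suggestion == name:
        return None
    return f"Did you mean '{suggestion}'?"
```

With `known = ["HOST", "GUEST"]`, a typo like `HOSTT` produces `Did you mean 'HOST'?`, while an unrelated name like `NARRATOR` produces no hint and would be treated as a new speaker.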

Step 3
Assign voices

Tap `Auto-Cast All` for AI-assisted casting, or open a casting session to search, preview, and assign voices one speaker at a time. Expand each speaker's settings to fine-tune stability, similarity, style, and speed. `Quick Preview` streams a short sample so you can check a voice before committing to a full generation run.

  • Search and category filters across the ElevenLabs catalog

  • AI casting suggestions and Auto-Cast for unvoiced speakers

  • Inline preview on every voice row

  • `Find Similar` from any voice's detail view

  • Optional pronunciation rules per project
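Per-speaker settings travel with each generation request. The sketch below assumes the request shape of ElevenLabs' public text-to-speech endpoint (a `voice_settings` object with `stability`, `similarity_boost`, `style`, and `speed`); confirm field names and ranges against ElevenLabs' current API reference. The default values, the speed range, and the whole-word rule format in `apply_pronunciations` are hypothetical. It only builds the URL and JSON body; nothing is sent.

```python
import json
import re
from dataclasses import dataclass, asdict

@dataclass
class VoiceSettings:
    # Placeholder defaults for illustration, not ElevenLabs' or flexVox's defaults.
    stability: float = 0.5
    similarity_boost: float = 0.75
    style: float = 0.0
    speed: float = 1.0

def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def apply_pronunciations(text: str, rules: dict[str, str]) -> str:
    """Hypothetical per-project rules: swap whole words for phonetic spellings."""
    for word, spoken in rules.items():
        text = re.sub(rf"\b{re.escape(word)}\b", lambda _m, s=spoken: s, text)
    return text

def build_tts_request(voice_id: str, text: str, settings: VoiceSettings,
                      rules: dict[str, str]) -> tuple[str, str]:
    """Return (url, json_body) for one turn; no network call is made."""
    safe = VoiceSettings(
        stability=clamp(settings.stability, 0.0, 1.0),
        similarity_boost=clamp(settings.similarity_boost, 0.0, 1.0),
        style=clamp(settings.style, 0.0, 1.0),
        speed=clamp(settings.speed, 0.7, 1.2),  # assumed supported range
    )
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = {"text": apply_pronunciations(text, rules), "voice_settings": asdict(safe)}
    return url, json.dumps(body)
```

Clamping on the way out keeps an out-of-range slider value from turning into a rejected request halfway through a long run.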

Step 4
Generate audio

A progress ring shows percent complete, the current turn, and estimated time remaining. You can cancel at any time without losing audio that's already been generated. If something fails, `Resume` picks up where it stopped.

  • Speech, SFX, and music all generate in one run

  • Optional background music is generated in a second pass, timed to match the total dialogue duration

  • Dialogue mode batches multi-speaker turns for natural flow

  • Automatic quality checks flag corrupt or silent files
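Here's a rough Python sketch of the bookkeeping behind the progress ring and the silence check. The ETA simply extrapolates the average time per finished turn, and the -60 dBFS silence threshold is an assumed value, not a documented flexVox setting. `progress` and `quality_flag` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Progress:
    percent: float
    current_turn: int              # 1-based turn now being generated
    eta_seconds: Optional[float]   # None until at least one turn has finished

def progress(done: int, total: int, elapsed: float) -> Progress:
    percent = 100.0 * done / total if total else 100.0
    eta = (elapsed / done) * (total - done) if done else None
    return Progress(percent, min(done + 1, total), eta)

def quality_flag(samples: list[int], silence_dbfs: float = -60.0) -> Optional[str]:
    """Flag 16-bit PCM audio that has no decodable samples or is effectively silent."""
    if not samples:
        return "corrupt"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0 or 20 * math.log10(rms / 32768) < silence_dbfs:
        return "silent"
    return None
```

For example, 3 of 10 turns finished in 30 seconds gives 30%, turn 4 in progress, and an ETA of 70 seconds.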

Step 5
Edit in post-production

Play each turn individually. Swipe to regenerate the ones that need work, compare variants side by side, adjust per-turn pauses, and exclude any segments that shouldn't ship.

  • Single-turn regeneration without touching the rest

  • Variant picker with `Use` and `Keep Only Active`

  • Per-turn pause sliders, 0 to 10 seconds

  • Underlay mode and auto-ducking for music and SFX beds

  • Edit dialogue text and regenerate inline
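A minimal sketch of how variants, pauses, and exclusions could combine into a playback timeline, plus a simple on/off auto-duck for a music bed. The `Turn` fields, the 0.5 s default pause, and the gain values are assumptions; a real mixer would ramp the gain rather than switch it instantly.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    variants: list[float]        # duration in seconds of each generated take
    active: int = 0              # index of the take chosen with "Use"
    pause_after: float = 0.5     # hypothetical default pause
    excluded: bool = False

def build_timeline(turns: list[Turn]) -> list[tuple[float, float]]:
    """(start, end) in seconds for every non-excluded turn, pauses included."""
    t, spans = 0.0, []
    for turn in turns:
        if turn.excluded:
            continue
        start = t
        t += turn.variants[turn.active]
        spans.append((start, t))
        t += max(0.0, min(10.0, turn.pause_after))  # slider range: 0 to 10 s
    return spans

def music_gain(time: float, spans: list[tuple[float, float]],
               bed: float = 1.0, ducked: float = 0.25) -> float:
    """Simple auto-duck: lower the music bed while any dialogue turn is playing."""
    return ducked if any(start <= time < end for start, end in spans) else bed
```

An excluded turn simply never enters the timeline, which is why excluding is reversible: its audio and variants stay untouched.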

Step 6
Mix & export

All active, non-excluded turns are mixed into a single M4A file. Listen back with follow-along script highlighting, then export with a platform preset and share the audio, transcript, or caption files through the iOS share sheet.

  • M4A / AAC output with podcast, platform, broadcast, and custom loudness targets

  • Word-level highlighting when forced-alignment data exists

  • Transcript and caption export: SRT, VTT, JSON, plain text

If something fails

A dropped connection costs you a tap, not the run.

If generation is interrupted or cancelled, the app detects which turns already have audio and offers a one-tap resume that skips completed turns. Your project stays intact.
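The resume logic amounts to filtering out turns that already have audio. A minimal sketch, assuming one audio file per turn named after its ID in a hypothetical `Audio` folder; the `has_audio` predicate is pluggable so the same check can run against any storage.

```python
from pathlib import Path
from typing import Callable, Iterable

def audio_exists(turn_id: str, audio_dir: Path = Path("Audio")) -> bool:
    # Hypothetical layout: one non-empty file per turn, named after the turn's ID.
    f = audio_dir / f"{turn_id}.m4a"
    return f.is_file() and f.stat().st_size > 0

def turns_to_generate(turn_ids: Iterable[str],
                      has_audio: Callable[[str], bool] = audio_exists) -> list[str]:
    """Skip turns that already have audio; keep script order for the rest."""
    return [t for t in turn_ids if not has_audio(t)]

def resume(turn_ids: Iterable[str], generate: Callable[[str], None],
           has_audio: Callable[[str], bool] = audio_exists) -> None:
    for turn_id in turns_to_generate(turn_ids, has_audio):
        generate(turn_id)  # an interruption here leaves finished turns in place
```

Because completed work is detected from what exists rather than from an in-memory counter, the same check works after a crash, a cancel, or an app restart.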

If you change your mind

Re-parsing is destructive. The app says so first.

Editing the raw script and re-parsing replaces existing speaker assignments and deletes any generated audio. The app asks for confirmation before it does. Everything else (variants, pauses, exclusions) is reversible.

Try the workflow first

Demo mode runs every screen, end to end, with no API key.

When no ElevenLabs key is configured, flexVox uses a mock TTS service that returns silent WAV audio with realistic durations. You can paste a script, parse it, assign voices, generate, edit, mix, and export exactly as you would with a key. The only thing missing is audible speech.
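A mock TTS service only has to produce plausible timing. This sketch, using Python's standard `wave` module, estimates duration from word count at an assumed 150 words per minute (a hypothetical rate, not flexVox's) and writes that much 16-bit mono silence. `estimated_duration` and `mock_tts` are illustrative names.

```python
import io
import wave

WORDS_PER_MINUTE = 150  # hypothetical speaking rate used for estimates

def estimated_duration(text: str) -> float:
    """Seconds of speech for the text, with a short floor for empty turns."""
    words = len(text.split())
    return max(0.5, words * 60.0 / WORDS_PER_MINUTE)

def mock_tts(text: str, sample_rate: int = 22_050) -> bytes:
    """Return a mono 16-bit silent WAV as long as the text would take to speak."""
    frames = round(estimated_duration(text) * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * frames)
    return buf.getvalue()
```

A five-word line yields exactly 2 seconds of silence, so timelines, pauses, captions, and the final mix all get realistic lengths to work with.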

Ask about the beta
  • Step you can't skip

    Demo mode lets you walk the workflow end to end before connecting an ElevenLabs account.

  • Same persistence

    Projects, speakers, scenes, voice mappings, and pause settings save in SwiftData the same way they would with real audio.

  • Same export path

    A demo mix exports through the same path as real audio, so the pipeline you test is the pipeline you use.

Want the rest of the answers?

The FAQ covers script formats, voice limits, SFX and music duration, shows and series, AI writing, the Keychain story, and what flexVox does and doesn't collect.