Every feature here is documented in the current pre-release build and grouped by what it helps you do: script handling, voices, generation, post-production, export, shows, AI writing, settings, and Studio.
Script
How text gets into the app. Paste any of four common script formats, let the parser detect speakers and audio cues, and review only the lines it wasn't sure about.
Paste or type a script. The import screen ships with a collapsible format guide and a built-in sample script for first-time users.
Start with text. End with audio. Nothing in between is your problem.
The parser detects speakers written in colon (`HOST:`), bracket (`[Host]`), parenthesis (`(Host)`), and standalone-name formats, along with SFX and music cues and `[SCENE:]`, `[CHAPTER:]`, `[ACT:]`, and `[PART:]` markers.
It reads the format you already write in, not the other way around.
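To make the format detection concrete, here is a minimal Python sketch of line-by-line classification. The regular expressions, the `classify_line` name, and the return shape are assumptions for illustration only, not flexVox's actual parser. Standalone-name lines are left out because they need the following line for context.

```python
import re

# Hypothetical patterns for the formats named above; not the app's real rules.
STRUCTURE_TAG = re.compile(r"^\[(SCENE|CHAPTER|ACT|PART):\s*(.*?)\]$", re.IGNORECASE)
CUE_TAG = re.compile(r"^\[(SFX|Music):\s*(.*?)\]$", re.IGNORECASE)
SPEAKER_PATTERNS = [
    ("colon", re.compile(r"^([A-Za-z][\w .'-]*):\s*(.*)$")),
    ("bracket", re.compile(r"^\[([^\]:]+)\]\s*(.*)$")),
    ("parenthesis", re.compile(r"^\(([^)]+)\)\s*(.*)$")),
]


def classify_line(line):
    """Return (kind, label, text) for one script line."""
    line = line.strip()
    # Structure and cue tags are checked first so they are never read as speakers.
    m = STRUCTURE_TAG.match(line)
    if m:
        return ("marker", m.group(1).upper(), m.group(2))
    m = CUE_TAG.match(line)
    if m:
        return ("cue", m.group(1).upper(), m.group(2))
    for _name, pattern in SPEAKER_PATTERNS:
        m = pattern.match(line)
        if m:
            return ("speech", m.group(1).strip(), m.group(2))
    return ("unknown", "", line)
```

A pattern this loose will also treat a line like `Note: remember the intro` as a speaker named `Note`, which is exactly the kind of line a confidence indicator would need to flag for review.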
Every detected turn shows its speaker with a green / orange / red confidence indicator. Lines that need attention are highlighted; the rest you scroll past.
Review only the lines the parser wasn't sure about.
Add new speakers, merge duplicates, batch-assign unreviewed turns, and give each character a unique color badge for visual distinction.
Two slightly different spellings of `HOST` collapse into one in a single tap.
Use `[SCENE: Title]`, `[CHAPTER: Title]`, `[ACT: Title]`, or `[PART: Title]` tags to structure a script. Scenes appear as collapsible sections, can be added manually, and turns can move between scenes.
Your audio drama can keep its shape instead of becoming one long list.
Insert dialogue, SFX, or music turns at any point from review. Assign the new turn to a speaker and scene, or delete turns from the context menu.
A missing beat does not send you back to a blank import screen.
Edit the raw script after import. An audio-tag reference panel makes inserting SFX and music cues quick. Re-parsing replaces existing speaker assignments and generated audio, so the app shows a confirmation warning first.
Tweak the dialogue without leaving the project.
Voices
How characters get voices. Browse the ElevenLabs catalog, audition voices inline, map one to each speaker, and dial in stability, similarity, style, and speed per character.
Browse the ElevenLabs catalog with search, category and type filters, and paginated results. Inline play / stop preview on every row.
Audition voices the way you'd audition actors — by listening.
Map each speaker to a TTS voice. `Cast All Speakers` opens a casting session with speaker progress, visible assignments, auto-advance, and inline AI casting suggestions.
You can cast a whole script without losing your place.
Assign voices to every unvoiced speaker in one tap. flexVox analyzes each speaker's role, generates a search query from an AI archetype suggestion, and selects the best-match voice while preserving manual assignments.
Start with a plausible cast, then change only what needs changing.
Independent sliders for stability, similarity boost, style exaggeration, and speed, plus a speaker boost toggle. V3 models show a simplified panel.
Two characters using the same voice can still sound like different people.
Per-voice metadata, verified languages, category, and a `Find Similar` feature that analyzes the voice's preview audio and returns related options from the catalog.
Almost the right voice? Find the cousin.
Per-project rules for tricky names and technical terms. Alias replacements or phoneme overrides in IPA or CMU Arpabet. Synced to ElevenLabs before generation.
Your guest's name pronounced the same way every time.
Generation
How the script becomes audio. Speech, sound effects, and music all generate in one run — with progress tracking, resume-on-failure, and an offline demo mode for the whole pipeline.
Each speech turn is generated with the assigned voice profile and contextual parameters (previous / next text, request-ID history) for voice continuity. SFX and music turns hit dedicated endpoints.
The voice doesn't forget who it just was.
Batch multiple speech turns into a single multi-speaker call for natural conversational flow. The app respects a 2,000-character and 10-unique-voice limit per batch and splits larger scripts automatically.
Conversations sound like conversations, not stitched-together monologues.
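The automatic splitting can be pictured as a greedy pass over the turns. The sketch below is one plausible way to respect both limits, assuming each turn is a `(voice_id, text)` pair; the function name and the greedy strategy are illustrative assumptions, not the app's documented algorithm.

```python
MAX_CHARS = 2000   # per-batch character limit stated above
MAX_VOICES = 10    # per-batch unique-voice limit stated above


def split_into_batches(turns):
    """Greedily group (voice_id, text) turns into batches within both limits."""
    batches, current, chars, voices = [], [], 0, set()
    for voice_id, text in turns:
        too_long = chars + len(text) > MAX_CHARS
        too_many = len(voices | {voice_id}) > MAX_VOICES
        # Close the running batch before a turn that would break either limit.
        if current and (too_long or too_many):
            batches.append(current)
            current, chars, voices = [], 0, set()
        current.append((voice_id, text))
        chars += len(text)
        voices.add(voice_id)
    if current:
        batches.append(current)
    return batches
```

In this sketch a single turn longer than 2,000 characters still ends up alone in its own oversized batch; a real implementation would have to split or reject it.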
Write `[SFX: door slam (2s)]` directly in your script. flexVox generates the effect with a duration clamped between 0.5 and 30 seconds, with optional seamless-loop requests for ambient beds.
No separate sound library just to make the door slam happen.
Write `[Music: upbeat jazz intro (10s)]` and the app generates it. Duration clamps between 3 seconds and 10 minutes; an instrumental-only flag is available.
An intro cue without a stock-music tab.
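The two clamping rules for cue durations can be sketched in a few lines of Python. The ranges come from the descriptions above; the regular expression, the `parse_cue` name, and returning `None` when no duration is written are assumptions for illustration.

```python
import re

# Clamp ranges from the text above: SFX 0.5 to 30 s, music 3 s to 10 min.
LIMITS = {"SFX": (0.5, 30.0), "MUSIC": (3.0, 600.0)}
CUE = re.compile(
    r"^\[(SFX|Music):\s*(.+?)(?:\s*\((\d+(?:\.\d+)?)s\))?\]$", re.IGNORECASE
)


def parse_cue(tag):
    """Return (kind, prompt, seconds or None), or None if the tag is not a cue."""
    m = CUE.match(tag.strip())
    if not m:
        return None
    kind, prompt, raw = m.group(1).upper(), m.group(2), m.group(3)
    if raw is None:
        return (kind, prompt, None)   # assumption: no duration means "let the service decide"
    low, high = LIMITS[kind]
    return (kind, prompt, min(max(float(raw), low), high))
```

With these rules, `[SFX: thunder (45s)]` is clamped down to 30 seconds and `[Music: sting (1s)]` is raised to 3 seconds.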
Add a background music layer that spans the whole episode. Describe the mood and style; flexVox generates a track sized to the total dialogue duration and gives the project its own volume control.
A soundtrack that fits the episode instead of the other way around.
An animated circular progress ring shows percentage, current turn, estimated time remaining, and elapsed time. Cancel at any point — already-generated audio is preserved.
You can see how far along it is. You can also walk away.
If generation is interrupted or cancelled, the app detects which turns already have audio and offers a one-tap resume that skips completed turns.
A dropped connection costs you a tap, not the whole run.
Every generated file is checked for corruption, silence, and minimum duration. Files with detected issues are flagged with a warning badge in post-production.
The app notices the broken take before you do.
With no API key configured, a mock TTS service returns silent WAV audio with realistic durations. Every screen — import, review, voices, generate, edit, mix, export — works end to end.
Learn the entire workflow before spending a dollar.
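A mock service of this kind is simple to picture. The Python sketch below builds a silent WAV in memory with the standard `wave` module; the sample rate, the `silent_wav` and `estimated_seconds` names, and the fixed speaking rate used to guess a duration are all assumptions, not details of the app's demo mode.

```python
import io
import wave


def silent_wav(seconds, sample_rate=22050):
    """Build an in-memory mono 16-bit WAV that contains only silence."""
    frames = int(seconds * sample_rate)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)              # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)
    return buffer.getvalue()


def estimated_seconds(text, chars_per_second=15.0):
    """Hypothetical duration guess: a fixed speaking rate with a 0.5 s floor."""
    return max(0.5, len(text) / chars_per_second)
```

Calling `silent_wav(estimated_seconds(line))` for each turn would give every downstream screen a file of believable length to work with.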
Post-production
How a single bad take stops being a problem. Regenerate one line at a time, compare variants side by side, dial in pauses, exclude segments, and edit dialogue inline.
Play each segment individually. Simple and advanced modes are available, with an interactive mini timeline, current playback time, and `Play from Here` for continuous sequential playback.
Find the line that's off without scrubbing a 30-minute mix.
Swipe a turn or use its context menu to regenerate it. The new take is saved as an additional variant — your previous take is never overwritten.
Fix one line. Leave everything else exactly where it was.
Each turn can have multiple takes. Browse them, play them back to back, mark one as active, and delete the rest. `Keep Only Active` cleans up across the whole project.
Pick the read you want. The rest go quietly.
Two levels of control: a project-wide default and per-turn custom pauses dialed in with sliders from 0 to 10 seconds.
Comic timing without a waveform editor.
Exclude individual turns from the final mix without deleting them. Excluded turns appear with strikethrough text and are skipped during mixing — toggle them back any time.
Cut a beat without losing the option to put it back.
Edit a turn's dialogue text directly from post-production. After saving, optionally regenerate the audio immediately to match.
A typo in the script doesn't mean a trip back to import.
Mark music or SFX turns to play underneath dialogue instead of sequentially. Underlay turns live on a dedicated audio track with per-turn volume control.
Rain, crowds, and engine hum can stay present while people talk.
Underlay and background music automatically lower when dialogue plays, then rise during pauses. Duck depth, attack, and release controls shape the curve.
Music gets out of the way without you drawing automation.
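For a feel of how duck depth, attack, and release interact, here is a minimal Python sketch of a gain envelope for the music bed. It assumes time is divided into equal steps and that gain ramps linearly; the parameter names and the linear ramp are illustrative assumptions, since the app's actual curve is not described here.

```python
def duck_envelope(dialogue_active, depth_db=-12.0, attack_steps=2, release_steps=4):
    """Return one linear gain value per time step for the music bed.

    dialogue_active: list of booleans, True where dialogue is playing.
    """
    floor = 10 ** (depth_db / 20.0)        # duck depth converted from dB to linear gain
    down = (1.0 - floor) / attack_steps    # gain drop per step while dialogue plays
    up = (1.0 - floor) / release_steps     # gain rise per step during pauses
    gain, envelope = 1.0, []
    for active in dialogue_active:
        gain = max(floor, gain - down) if active else min(1.0, gain + up)
        envelope.append(gain)
    return envelope
```

A shorter attack gets the music out of the way faster, and a longer release lets it swell back gently once the speaker stops.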
Playback & export
How the episode leaves the app. Mix the finished audio, follow along with the script, export for specific platforms, and include transcript or caption files.
Mix dialogue, crossfade overlay, ducked underlay, and background music into a single M4A / AAC file with configurable pauses and peak normalization.
One file. Ready to share.
The playback screen shows the script with timestamps, speaker badges, active-turn highlighting, auto-scroll, tap-to-seek, and word-level highlighting when alignment data is available.
Find a mispronunciation by reading along, not scrubbing blindly.
When a project has completed audio, Script and Production tabs show a compact bar with project name, play / pause, progress, and current time. Tapping it opens Export.
The mix stays one tap away while you keep editing.
Stream the first ~2,000 characters of dialogue with assigned voices via the ElevenLabs dialogue API. A segment timeline shows colored bars per speaker.
Catch a miscast voice before you spend a full generation run.
Export from the playback screen or with Cmd+E. Choose Spotify, Apple Podcasts, YouTube, Broadcast, or Custom loudness presets, then share audio plus SRT, VTT, JSON, or plain-text transcripts.
One export sheet for the file and the words that go with it.
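Of the transcript formats, SRT is the easiest to picture: numbered blocks, each with a start and end timestamp. The Python sketch below shows the general shape, assuming each turn is a `(start_seconds, end_seconds, speaker, text)` tuple; the function names and the `SPEAKER: text` caption layout are assumptions rather than the app's exact output.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(turns):
    """Render (start, end, speaker, text) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(turns, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)
```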
Projects & settings
How multiple projects coexist. Create, rename, duplicate, template, and search projects while the Script / Production / Export workflow shows what is ready.
Create, rename, duplicate, and delete projects. The list is searchable and sorted by most recently updated.
Three episodes in flight, one app, no folder chaos.
Script. Production. Export. A three-step progress bar shows readiness while keeping every tab accessible. Generation progress follows you with a tap-to-view banner.
Guidance without locking you out of the work.
Deep-copy a project including script, turns, speakers, voice profiles, and pronunciation rules. Audio is intentionally not copied — the duplicate is a fresh starting point.
Spin up a new variant of an episode without rebuilding the cast.
A pre-populated sample script is one tap away from the empty-state screen.
Onboard yourself in a minute, not a manual.
Start from Interview, Audio Drama, True Crime, Newscast, or Narration templates. The first-launch empty state presents template cards before a blank page.
The first project can start with a format, not a cursor.
Enter and store an ElevenLabs API key in the iOS Keychain. Remove it any time. The app clearly displays whether it's running in API or demo mode.
Your key sits in the same vault as your bank passwords.
Test the saved key by fetching available voices. Pick the default TTS model from Multilingual v2, Turbo v2.5, Turbo v2, English v1, or v3.
You know the key works before you start a 40-minute run.
Shows & series
How ongoing productions keep their identity. Define a cast bible, show style, audio identity, and episode structure once, then reuse them across episodes.
Create ongoing productions with persistent cast, format, tone, narrator mode, episode numbering, and reusable audio identity.
Define the show once. Start each episode with the bones already in place.
Recurring and guest cast members can carry role, age, biography, personality, speaking style, and optional voice assignment into every new episode.
Characters stay consistent because their notes travel with them.
Define reusable show segments such as Intro, Main Topic, Listener Q&A, and Outro. Segments are reorderable and can include descriptions.
A recurring show can keep its rhythm without copy-paste setup.
Set a production's default format, tone, narrator mode, intro music, outro music, and transition sounds.
A show can sound like itself before the next script is pasted.
Convert any standalone project into a show from the project list. Speakers, voice assignments, and character profiles become the production cast.
When one episode becomes a series, the app moves with you.
Create new episodes from a production with the show's cast, voice assignments, character profiles, narrator mode, and season numbering already applied.
Episode three starts where episode two left the setup.
AI script writing
How a premise becomes production text. Generate scripts with OpenAI or Claude, use character profiles and reference material, or copy the generated prompt to another tool.
Generate scripts in-app with OpenAI or Claude. Configure format, tone, speaker count, scenes, chapters, expression tags, SFX, music, and provider before generation.
Go from premise to production-ready script without leaving the project.
Speaker profiles feed role, age, biography, personality, and speaking style into AI script generation for more distinct characters.
The generated dialogue has more to work from than a name.
Upload text reference materials such as research, outlines, and articles. flexVox sends them as context without asking the model to copy them verbatim.
Give the writer context, not a blank prompt.
Choose Full Cast, Single Narrator, or Narrator + Cast. The mode affects both AI script generation and voice mapping.
A documentary, monologue, and drama do not need the same cast logic.
Sound library
How reusable audio assets stay reusable. Import, tag, search, and assign sound effects or music without losing the original library item.
Import reusable M4A, MP3, WAV, and AIFF sound effects or music into a global library that persists across projects.
Your best sounds become assets, not one-off imports.
Tag sounds with comma-separated keywords, search by name or tag, and filter by category.
Find the rain bed before the scene dries out.
Assign library sounds to SFX and music turns in post-production. The file is copied into the project's audio assets while the library source remains reusable.
Reuse the cue without tying projects together.
Help & guidance
How the app teaches without taking over. Searchable help lives in Settings, while contextual tips appear where they are useful and stay dismissed once closed.
A searchable guide in Settings covers scripts, voices, generation, post-production, shows and series, and advanced topics.
The manual lives where the questions happen.
Dismissible tips appear at key workflow points, then stay dismissed once closed.
Guidance shows up once, then gets out of the way.
Settings
How service credentials and defaults are managed. Store API keys in Keychain, test every provider, choose TTS defaults, and manage subscription state from a split-pane settings sheet.
Settings uses a spacious NavigationSplitView with categories for Subscription, API Keys, Text-to-Speech, AI Writing, Display, Data & Sync, and About.
Serious controls do not have to feel buried.
ElevenLabs, OpenAI, and Claude keys live in one secure section with save, test, status, and remove controls for each provider.
Every credential has one obvious home.
Test ElevenLabs by fetching available voices; test OpenAI and Claude with a round-trip API call. Status appears inline.
You know a key works before your episode depends on it.
Choose the default ElevenLabs model and text normalization settings from a dedicated Text-to-Speech section.
Voice defaults are separate from key management.
Interface
The parts you feel before you read them. Haptics, toast feedback, keyboard shortcuts, command palette, empty states, skeleton loading, animation, and database recovery.
Light taps for selections, medium impacts for state changes, ticks for slider adjustments, and notification haptics for generation milestones and errors.
The phone confirms what just happened without asking your eyes.
Ephemeral feedback at the top of the screen for assignments, duplications, regeneration completion, and errors. Four styles, matching icons, VoiceOver-accessible.
Confirmation when you need it. Silence when you don't.
If the local SwiftData store is corrupted on launch, the app attempts recovery in three tiers: normal open, delete-and-retry, and in-memory fallback. Users are notified of any data reset.
A bad row doesn't take the app down with it.
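The three tiers amount to a fallback chain. The Python sketch below shows the control flow only, with the three storage operations passed in as hypothetical callables; it is not SwiftData code, and the `open_with_recovery` name and `(store, data_was_reset)` return shape are assumptions.

```python
def open_with_recovery(open_store, delete_store, open_in_memory):
    """Try the three tiers in order and return (store, data_was_reset)."""
    try:
        return open_store(), False        # tier 1: normal open
    except Exception:
        pass
    try:
        delete_store()                    # tier 2: delete the store, then retry
        return open_store(), True
    except Exception:
        pass
    return open_in_memory(), True         # tier 3: in-memory fallback
```

The second value tells the caller whether to show the data-reset notice.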
Press Cmd+K to search actions, projects, and shows. Results are grouped and keyboard navigable.
Power users can jump instead of tapping around.
Use Cmd+1/2/3 for workflow tabs, Cmd+G for Production, Cmd+E for Export, Space for playback, arrows for skipping, Cmd+R for regeneration, and M to mute.
The iPad keyboard gets treated like a first-class input device.
Empty project, show, and sound lists, failed searches, and unready tabs share a unified branded empty-state component.
Blank spaces explain themselves without turning into clutter.
List views use shimmer rows while content is being prepared.
Waiting states feel intentional, not broken.
List rows animate in with a staggered fade-and-slide entrance across the app.
Motion gives the interface a little clarity without slowing it down.
Studio
How the free app grows. The full workflow starts free; Studio unlocks scale, show management, AI writing, advanced mixing, export presets, and unlimited projects.
The free tier includes up to three projects, the complete script workflow, voice browsing and assignment, per-turn generation with your own API key, post-production basics, follow-along playback, Podcast export, and demo mode.
The main workflow is real before anyone pays.
Studio unlocks unlimited projects, Shows and Series, Auto-Cast, dialogue generation mode, background music, underlay and auto-ducking, export presets, AI writing, Sound Library, pronunciation dictionary, templates, and pacing reports.
The upgrade is for scale and polish, not for making the app usable.
Settings shows subscription status, purchase, restore, and transaction updates through StoreKit 2.
The subscription state is visible and recoverable.
Studio-only features display a small badge and open the upgrade sheet with the relevant feature highlighted.
Locked features explain what they are before asking for money.