Record takes in the browser or upload audio files. ElevenLabs prefers multiple longer takes; Inworld and Cartesia work best with a single 5–15 second clip.
Read the text in the image aloud (~10–15 seconds) and submit the recording via POST /v1/voices/pvc/{voice_id}/captcha to verify ownership and unlock training.
POST /v1/voices/pvc/{voice_id}/captcha
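As a rough illustration of the verification step, the sketch below builds (but does not send) a multipart POST request for the captcha endpoint using only the Python standard library. Only the path `/v1/voices/pvc/{voice_id}/captcha` comes from the text above. The base URL, the `Authorization` header scheme, the form field name `recording`, and the helper name `build_captcha_request` are assumptions made for illustration; check the provider's API reference for the real values.

```python
import urllib.request
import uuid

# Hypothetical base URL; replace with the provider's real API host.
BASE_URL = "https://api.example.com"


def build_captcha_request(voice_id, audio_bytes, api_key,
                          filename="captcha.wav", field_name="recording"):
    """Build a multipart/form-data POST carrying the captcha recording.

    The field name and auth header are assumptions, not confirmed API details.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{field_name}"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: audio/wav\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/voices/pvc/{voice_id}/captcha",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumed auth scheme; some providers use a custom key header.
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_captcha_request("abc123", b"RIFF", "my-api-key")
    print(req.get_method(), req.full_url)
    # To actually submit: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before any network call is made.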
Captured from your conversation with the assistant. The doctor will see this before your visit.