ElevenLabs
Overview
ElevenLabs is an AI voice synthesis platform that generates natural-sounding speech in many languages. With the SquadOS integration, your agents can convert text to audio, clone custom voices, dub video and audio files, and create and manage conversational AI agents with phone call support.
- Official website: https://elevenlabs.io/
- Composio documentation: docs.composio.dev/toolkits/elevenlabs
Authentication
This tool connects using an API key (API_KEY).
You will need the following fields:
| Field | Required | Description |
|---|---|---|
| api_key | Yes | Your ElevenLabs account API key, used to authenticate all requests to the API. |
How to get credentials
- Go to elevenlabs.io and log in or create an account.
- Click your avatar in the bottom-left corner and go to Profile.
- Scroll to the API key section and click Copy to copy the key.
- Use this value in the api_key field when connecting in SquadOS.
How to connect in SquadOS
- Go to Tools in the side menu (/admin/tools).
- Open the Available tab and search for ElevenLabs.
- Click the card to open the details modal and hit Connect.
- You’re taken to the secure connection page hosted by Composio, where you enter the API key obtained above.
- Once done, you’re sent back to SquadOS with the account connected and the tool available to your agents. (Connection-flow details in Organization Tools.)
Available actions
Text to speech
ELEVENLABS_TEXT_TO_SPEECH
Converts text to speech using a specified ElevenLabs voice and model, returning a downloadable audio file. The audio URL is nested at data.file.s3url in the response. Use ELEVENLABS_TEXT_TO_SPEECH_STREAM for real-time streaming instead.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text for speech conversion. Max 10,000 characters for most models. Flash/Turbo v2 models: up to 30,000. Flash/Turbo v2.5 models: up to 40,000. |
| voice_id | string | Yes | Identifier of the voice to use. Obtainable from the /v1/voices endpoint. |
| model_id | string | No | Identifier of the synthesis model. List available models via /v1/models; ensure can_do_text_to_speech is true. |
| output_format | string | No | Output audio format (e.g., mp3_44100_128, pcm_24000, ulaw_8000). Some formats require a specific subscription tier. |
| seed | integer | No | Integer seed for potentially deterministic audio generation. |
| voice_settings | object | No | Voice settings for controlling speech generation characteristics. |
| optimize_streaming_latency | integer | No | Latency optimization controls (0–4). Higher values reduce latency, potentially affecting quality. |
| pronunciation_dictionary_locators | array | No | List of up to 3 pronunciation dictionary locators, applied sequentially. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
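As a rough illustration of the character limits and response shape described above, the sketch below builds an argument dictionary for ELEVENLABS_TEXT_TO_SPEECH and reads the audio URL from data.file.s3url. The helper names (max_chars, build_tts_arguments, audio_url) and the substring matching on model IDs are assumptions made for this example and are not part of SquadOS or Composio; how the action is actually invoked depends on your agent setup.

```python
from typing import Any, Dict, Optional

# Limits taken from the `text` row of the parameter table.
DEFAULT_LIMIT = 10_000
FLASH_TURBO_V2_LIMIT = 30_000
FLASH_TURBO_V2_5_LIMIT = 40_000


def max_chars(model_id: Optional[str]) -> int:
    """Guess the character limit from a model ID (substring matching is an assumption)."""
    if model_id and ("flash" in model_id or "turbo" in model_id):
        if "v2_5" in model_id:
            return FLASH_TURBO_V2_5_LIMIT
        if "v2" in model_id:
            return FLASH_TURBO_V2_LIMIT
    return DEFAULT_LIMIT


def build_tts_arguments(
    text: str,
    voice_id: str,
    model_id: Optional[str] = None,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the required fields and return the action arguments."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    limit = max_chars(model_id)
    if len(text) > limit:
        raise ValueError(f"text has {len(text)} characters; limit is {limit}")
    arguments: Dict[str, Any] = {"text": text, "voice_id": voice_id}
    if model_id:
        arguments["model_id"] = model_id
    if output_format:
        arguments["output_format"] = output_format
    return arguments


def audio_url(response: Dict[str, Any]) -> str:
    """Return the audio URL nested at data.file.s3url, or raise on failure."""
    if not response.get("successful"):
        raise RuntimeError(response.get("error") or "action failed")
    return response["data"]["file"]["s3url"]
```

The same successful/error check applies to every action on this page, since they all share the same output envelope.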
Text to speech stream
ELEVENLABS_TEXT_TO_SPEECH_STREAM
Converts text to a spoken audio stream in real-time without saving a file or creating a history entry. Ideal for real-time responses. Use optimize_streaming_latency to balance latency versus quality.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to be converted into speech. Recommended to keep under 5,000 characters. |
| voice_id | string | Yes | Identifier of the voice to use. Retrieve available IDs from GET /v1/voices. |
| model_id | string | No | Identifier of the model. Verify can_do_text_to_speech is true for the chosen model. |
| output_format | string | No | Output audio format (e.g., mp3_44100_128, pcm_24000). Some formats require Creator or Pro tier. |
| seed | integer | No | Seed for deterministic generation. |
| optimize_streaming_latency | integer | No | Latency optimization (0–4). Value 4 disables the text normalizer for lowest latency. |
| pronunciation_dictionary_locators | array | No | List of up to 3 pronunciation dictionary locators. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
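Since the text parameter is best kept under 5,000 characters, longer passages can be split before streaming. The sketch below is one possible approach: it splits at sentence boundaries and falls back to a hard cut for oversized sentences. The function name split_for_streaming and the default limit of 4,500 (a margin below the recommendation) are arbitrary choices for this example.

```python
import re
from typing import List


def split_for_streaming(text: str, limit: int = 4500) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the text of a separate ELEVENLABS_TEXT_TO_SPEECH_STREAM call, in order.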
Speech to speech
ELEVENLABS_SPEECH_TO_SPEECH
Converts an input audio file to speech using a specified voice. If a model_id is provided, it must support speech-to-speech conversion.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| audio | object | Yes | The audio file to be converted. |
| voice_id | string | Yes | Identifier of the target voice. |
| model_id | string | No | Identifier of the model (must have can_do_voice_conversion equal to true). |
| output_format | string | No | Output audio format. |
| seed | integer | No | Seed for deterministic audio generation (0–4294967295). |
| voice_settings | string | No | JSON string defining voice settings such as stability and similarity_boost. |
| optimize_streaming_latency | integer | No | Latency optimization (0–4). |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Speech to speech streaming
ELEVENLABS_SPEECH_TO_SPEECH_STREAMING
Converts an input audio stream to a different voice output stream in real-time, using a specified speech-to-speech model.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| audio | object | Yes | The input audio file (e.g., .wav, .mp3) to be converted. |
| voice_id | string | Yes | Identifier of the voice to use. |
| model_id | string | No | Identifier of the speech-to-speech model (e.g., eleven_english_sts_v2). |
| output_format | string | No | Desired output audio stream format. |
| seed | integer | No | Seed for deterministic audio generation (0–4294967295). |
| optimize_streaming_latency | integer | No | Latency optimization (0–4). |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Add a voice
ELEVENLABS_ADD_VOICE
Adds a custom voice by uploading audio samples for voice cloning. Recommended: 1–2 minutes of clear audio without background noise. Supported formats: mp3, wav, ogg.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name for the new voice, used as its identifier in the platform. |
| files | array | Yes | List of audio files for voice cloning. At least one file is required. |
| description | string | No | Optional description detailing the voice’s characteristics or intended use cases. |
| labels | string | No | Optional stringified JSON object of key-value pairs for categorization (e.g., {"accent": "American"}). |
| remove_background_noise | boolean | No | If true, removes background noise from samples. Only use if samples contain noise; applying to clean audio can reduce quality. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
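Because labels must be passed as a stringified JSON object rather than a plain object, it can help to serialize it in one place. The sketch below shows one way to assemble the ELEVENLABS_ADD_VOICE arguments; the helper name build_add_voice_arguments is hypothetical, and the shape of each entry in files is left opaque here because this page does not specify how files are referenced.

```python
import json
from typing import Any, Dict, Mapping, Optional, Sequence


def build_add_voice_arguments(
    name: str,
    files: Sequence[Any],
    labels: Optional[Mapping[str, str]] = None,
    description: Optional[str] = None,
    remove_background_noise: bool = False,
) -> Dict[str, Any]:
    """Validate required fields and serialize labels to a JSON string."""
    if not name:
        raise ValueError("name is required")
    if not files:
        raise ValueError("at least one audio file is required")
    arguments: Dict[str, Any] = {"name": name, "files": list(files)}
    if description:
        arguments["description"] = description
    if labels:
        # The action expects a string such as '{"accent": "American"}'.
        arguments["labels"] = json.dumps(dict(labels))
    if remove_background_noise:
        # Only enable for noisy samples; it can reduce quality on clean audio.
        arguments["remove_background_noise"] = True
    return arguments
```

The same labels serialization applies to ELEVENLABS_EDIT_VOICE, where new labels overwrite the existing ones.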
Edit voice
ELEVENLABS_EDIT_VOICE
Updates the name, audio files, description, or labels for an existing voice model. Only voices you own (cloned voices) can be edited; premade/default voices cannot be edited. The name field is required for all edit operations.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name for the voice model. This field is required. |
| voice_id | string | Yes | Identifier of the voice to edit. Only voices owned by you can be edited. |
| files | array | No | Optional list of audio files to add to the voice model. Formats: mp3, wav, ogg. |
| description | string | No | New description for the voice model. |
| labels | string | No | JSON string of key-value pairs for categorization; new labels overwrite existing ones. |
| remove_background_noise | boolean | No | If true, removes background noise from samples. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Edit voice settings
ELEVENLABS_EDIT_VOICE_SETTINGS
Edits key voice settings (e.g., stability, similarity enhancement, style exaggeration, speaker boost) for an existing voice, affecting all future audio generated with that voice ID.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | string | Yes | Identifier of the voice whose settings are to be modified. |
| stability | number | No | Controls voice consistency and randomness between generations (0.0–1.0). Lower values introduce broader emotional range. |
| similarity_boost | number | No | Determines how closely the AI adheres to the original voice (0.0–1.0). |
| style | number | No | Adjusts style exaggeration and expressiveness (0.0–1.0). Available for V2+ models. |
| speed | number | No | Controls speech rate and pacing (typically 0.25–4.0). |
| use_speaker_boost | boolean | No | Boosts similarity to the original speaker. Not available for the Eleven v3 model. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
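Since these settings affect all future audio generated with the voice, a local range check before calling the action may be worthwhile. The sketch below validates only the three fields documented as 0.0–1.0 and passes speed and use_speaker_boost through unchecked; the helper name build_voice_settings_arguments is an assumption for this example.

```python
from typing import Any, Dict

# Fields documented above with a 0.0-1.0 range.
UNIT_RANGE_FIELDS = ("stability", "similarity_boost", "style")
# Fields passed through without a range check in this sketch.
OTHER_FIELDS = ("speed", "use_speaker_boost")


def build_voice_settings_arguments(voice_id: str, **settings: Any) -> Dict[str, Any]:
    """Return arguments for ELEVENLABS_EDIT_VOICE_SETTINGS after basic checks."""
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(settings) - set(UNIT_RANGE_FIELDS) - set(OTHER_FIELDS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    for field in UNIT_RANGE_FIELDS:
        value = settings.get(field)
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{field} must be between 0.0 and 1.0, got {value}")
    arguments: Dict[str, Any] = {"voice_id": voice_id}
    # Omit settings left as None so existing values are not overwritten by accident.
    arguments.update({k: v for k, v in settings.items() if v is not None})
    return arguments
```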
Delete voice by id
ELEVENLABS_DELETE_VOICE
Permanently and irreversibly deletes a specific custom voice using its voice_id. The authenticated user must have permission to delete it.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | string | Yes | The unique identifier of the voice to be deleted. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get voices list
ELEVENLABS_GET_VOICES
Retrieves a list of all available voices along with their detailed attributes and settings.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get voice
ELEVENLABS_GET_VOICE
Retrieves comprehensive details for a specific, existing voice by its voice_id, optionally including its settings.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | string | Yes | Identifier of the voice. Use GET /v1/voices to list available IDs. |
| with_settings | boolean | No | If true, the response will include detailed settings information for the voice. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get shared voices
ELEVENLABS_GET_SHARED_VOICES
Retrieves a paginated and filterable list of shared voices from the ElevenLabs Voice Library.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| search | string | No | A search term to filter voices by name or description. |
| language | string | No | Filters voices by language (ISO 639-1 code). |
| gender | string | No | Filters voices by gender. |
| accent | string | No | Filters voices by accent. |
| age | string | No | Filters voices by age group. |
| category | string | No | Filters voices by category. |
| featured | boolean | No | Filters for voices that are marked as featured. |
| use_cases | array | No | Filters voices by their intended use cases. |
| page | integer | No | Page number for pagination, starting from 0. |
| page_size | integer | No | Maximum number of shared voices per page (max 100). |
| sort | string | No | Sort options: created_date, usage_character_count_1y, trending, cloned_by_count. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
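The page and page_size parameters above suggest a simple page-by-page loop. The sketch below is a minimal version of that loop; fetch_page is a hypothetical callable standing in for however your agent executes ELEVENLABS_GET_SHARED_VOICES, and the voices and has_more keys in the returned data are assumptions about the response shape that should be checked against a real response.

```python
from typing import Any, Callable, Dict, Iterator


def iter_shared_voices(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 100,
    max_pages: int = 10,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield shared voices page by page, starting at page 0."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    for page in range(max_pages):
        arguments = {"page": page, "page_size": page_size, **filters}
        data = fetch_page(arguments)
        voices = data.get("voices", [])  # assumed key
        yield from voices
        # Stop when the page is empty or no further pages are reported.
        if not voices or not data.get("has_more"):  # assumed key
            return
```

The max_pages cap is a safety limit so a misread has_more flag cannot cause an unbounded loop.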
Generate a random voice
ELEVENLABS_GENERATE_A_RANDOM_VOICE
Generates a unique, random ElevenLabs text-to-speech voice based on input text and specified voice characteristics.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to be synthesized. Length must be between 100 and 1,000 characters. |
| gender | string | Yes | Gender of the generated voice: female or male. |
| age | string | Yes | Age category of the generated voice: young, middle_aged, or old. |
| accent | string | Yes | Accent of the generated voice: american, british, african, australian, or indian. |
| accent_strength | number | Yes | Controls the strength of the accent (0.3–2.0). |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
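All five parameters are required and each has a fixed set of allowed values or a range, so the constraints in the table can be checked locally before the call. This is a minimal sketch; the helper name build_random_voice_arguments is hypothetical.

```python
from typing import Any, Dict

# Allowed values copied from the parameter table.
GENDERS = {"female", "male"}
AGES = {"young", "middle_aged", "old"}
ACCENTS = {"american", "british", "african", "australian", "indian"}


def build_random_voice_arguments(
    text: str, gender: str, age: str, accent: str, accent_strength: float
) -> Dict[str, Any]:
    """Check each documented constraint and return the action arguments."""
    if not 100 <= len(text) <= 1000:
        raise ValueError(f"text must be 100-1000 characters, got {len(text)}")
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    if age not in AGES:
        raise ValueError(f"age must be one of {sorted(AGES)}")
    if accent not in ACCENTS:
        raise ValueError(f"accent must be one of {sorted(ACCENTS)}")
    if not 0.3 <= accent_strength <= 2.0:
        raise ValueError("accent_strength must be between 0.3 and 2.0")
    return {
        "text": text,
        "gender": gender,
        "age": age,
        "accent": accent,
        "accent_strength": accent_strength,
    }
```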
Dub a video or an audio file
ELEVENLABS_DUB_A_VIDEO_OR_AN_AUDIO_FILE
Dubs a video or audio file into a target language. Requires target_lang and either file or source_url. If mode is manual, csv_file is also required.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| target_lang | string | Yes | Required target language code for dubbing (e.g., en, es, pt). |
| file | object | No | Video or audio file to dub. Required if source_url is not provided. |
| source_url | string | No | URL of the video or audio file to dub. Required if file is not provided. |
| source_lang | string | No | Language of the original audio. Use auto for automatic detection or provide a language code. |
| name | string | No | Name for the dubbing project. |
| mode | string | No | Dubbing mode: automatic for AI-driven dubbing or manual using a .csv file. |
| num_speakers | integer | No | Number of speakers in the audio. Use 0 for automatic detection. |
| watermark | boolean | No | Include a watermark in the dubbed audio. |
| highest_resolution | boolean | No | Process dubbing at highest possible resolution; may increase processing time. |
| dubbing_studio | boolean | No | Enable Dubbing Studio features for advanced editing capabilities. |
| start_time | integer | No | Start time in seconds for the audio portion to dub. |
| end_time | integer | No | End time in seconds for the audio portion to dub. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
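The conditional requirements for dubbing (a source via file or source_url, plus csv_file in manual mode) are easy to get wrong, so a small pre-flight check may help. The following is a sketch under the rules stated above; the helper name build_dubbing_arguments is hypothetical, and the file and csv_file values are treated as opaque objects.

```python
from typing import Any, Dict, Optional


def build_dubbing_arguments(
    target_lang: str,
    file: Optional[Any] = None,
    source_url: Optional[str] = None,
    mode: Optional[str] = None,
    csv_file: Optional[Any] = None,
    **optional: Any,
) -> Dict[str, Any]:
    """Enforce the conditional requirements and return the action arguments."""
    if not target_lang:
        raise ValueError("target_lang is required")
    if file is None and source_url is None:
        raise ValueError("provide either file or source_url")
    if mode not in (None, "automatic", "manual"):
        raise ValueError("mode must be 'automatic' or 'manual'")
    if mode == "manual" and csv_file is None:
        raise ValueError("csv_file is required when mode is 'manual'")
    arguments: Dict[str, Any] = {"target_lang": target_lang}
    for key, value in (
        ("file", file),
        ("source_url", source_url),
        ("mode", mode),
        ("csv_file", csv_file),
    ):
        if value is not None:
            arguments[key] = value
    # Remaining options (name, num_speakers, watermark, ...) pass through unchanged.
    arguments.update(optional)
    return arguments
```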
Get generated items
ELEVENLABS_GET_GENERATED_ITEMS
Retrieves metadata for a list of generated audio items from history, supporting pagination and optional filtering by voice ID.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | string | No | Filters history items to only include those generated with the specified voice ID. |
| page_size | integer | No | Maximum number of history items to return per page (1–1000). |
| start_after_history_item_id | string | No | The ID of the history item to start fetching results after, for pagination. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
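Unlike the page-number scheme used for shared voices, history is paginated with a cursor: each request passes the ID of the last item already seen as start_after_history_item_id. The sketch below illustrates that loop. fetch_page is a hypothetical callable wrapping ELEVENLABS_GET_GENERATED_ITEMS, and the history, has_more, and history_item_id keys are assumptions about the response shape.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_history(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 100,
    voice_id: Optional[str] = None,
    max_pages: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield history items, following the cursor from page to page."""
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    cursor: Optional[str] = None
    for _ in range(max_pages):
        arguments: Dict[str, Any] = {"page_size": page_size}
        if voice_id:
            arguments["voice_id"] = voice_id
        if cursor:
            arguments["start_after_history_item_id"] = cursor
        data = fetch_page(arguments)
        items = data.get("history", [])  # assumed key
        yield from items
        if not items or not data.get("has_more"):  # assumed key
            return
        # The last item seen becomes the cursor for the next request.
        cursor = items[-1]["history_item_id"]  # assumed key
```

The collected IDs can then be passed as history_item_ids to ELEVENLABS_DOWNLOAD_HISTORY_ITEMS.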
Download history items
ELEVENLABS_DOWNLOAD_HISTORY_ITEMS
Downloads audio clips from history by ID(s), returning a single file or a ZIP archive, with an optional output format (e.g., wav).
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| history_item_ids | array | Yes | A list of unique string identifiers for the history items to be downloaded. |
| output_format | string | No | Optional output audio format. Accepts wav to convert to WAV. If omitted, returns in original synthesized format. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get models
ELEVENLABS_GET_MODELS
Retrieves a detailed list of all available ElevenLabs text-to-speech (TTS) models and their capabilities.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Create Conversational AI Agent
ELEVENLABS_CREATE_CONVERSATIONAL_AGENT
Creates a new ElevenLabs Conversational AI agent with specified configuration. After creating the agent, you can chain other tools to attach phone numbers or configure additional settings.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| conversation_config | object | Yes | Configuration object defining the agent’s conversational behavior, including prompt, LLM model, language, and first message. |
| name | string | No | Human-readable name for the agent. |
| tags | array | No | List of tags to organize and categorize the agent. |
| workflow | object | No | Workflow configuration defining conditional logic and tool execution modes. |
| platform_settings | string | No | Platform-specific settings including evaluation criteria and widget configuration. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
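To make the conversation_config parameter more concrete, the sketch below assembles a small configuration from the four items the table mentions (prompt, LLM model, language, first message). The nesting used here (an agent object containing prompt, first_message, and language, with llm inside prompt) is an assumption and is not confirmed by this page; verify it against the Composio toolkit documentation before relying on it. The helper name build_agent_arguments is hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def build_agent_arguments(
    name: str,
    prompt: str,
    first_message: str,
    language: str = "en",
    llm: Optional[str] = None,
    tags: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Assemble arguments for ELEVENLABS_CREATE_CONVERSATIONAL_AGENT (assumed schema)."""
    if not prompt:
        raise ValueError("prompt is required")
    agent: Dict[str, Any] = {
        "prompt": {"prompt": prompt},  # assumed nesting
        "first_message": first_message,
        "language": language,
    }
    if llm:
        agent["prompt"]["llm"] = llm  # assumed field name
    arguments: Dict[str, Any] = {
        "name": name,
        "conversation_config": {"agent": agent},
    }
    if tags:
        arguments["tags"] = list(tags)
    return arguments
```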
Update Conversational AI Agent
ELEVENLABS_UPDATE_CONVAI_AGENT
Updates an existing ElevenLabs Conversational AI agent’s settings, such as name, conversation configuration, workflow, or platform settings.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Yes | The ID of the agent to update. |
| name | string | No | A name to make the agent easier to find. |
| tags | array | No | Tags to help classify and filter the agent. |
| conversation_config | object | No | Conversation configuration for the agent. |
| workflow | object | No | Workflow configuration. |
| platform_settings | object | No | Platform settings for the agent. |
| version_description | string | No | Description for this version when publishing changes (only for versioned agents). |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Simulate Conversational AI Agent Conversation
ELEVENLABS_SIMULATE_CONVAI_AGENTS_SIMULATE_CONVERSATION
Runs a simulated conversation between an agent and an AI user. Returns a full transcript with analysis including success metrics and a conversation summary.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Yes | The ID of the agent to simulate. |
| simulation_specification | object | Yes | A specification used to simulate a conversation between an agent and an AI user. |
| new_turns_limit | integer | No | Maximum number of new turns to generate in the conversation simulation. |
| extra_evaluation_criteria | array | No | A list of additional evaluation criteria to test. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Outbound call
ELEVENLABS_OUTBOUND_CALL
Places an outbound call via SIP trunk. Requires an API key with Conversational AI permissions enabled and a valid SIP trunk phone number configured for outbound calls.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Yes | Agent ID to place the call with. |
| to_number | string | Yes | Destination phone number in E.164 format. |
| agent_phone_number_id | string | Yes | ID of the phone number to originate the call from (must support outbound calls). |
| conversation_initiation_client_data | object | No | Personalization and override payload for initiating the conversation. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
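Phone numbers must be in E.164 format: a plus sign followed by the country code and subscriber number, at most 15 digits in total, with no spaces or punctuation. The sketch below checks that format before assembling the call arguments; the helper names is_e164 and build_outbound_call_arguments are hypothetical, and the check covers only the format, not whether the number is reachable.

```python
import re
from typing import Any, Dict, Optional

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string matches the E.164 format."""
    return bool(E164_PATTERN.fullmatch(number))


def build_outbound_call_arguments(
    agent_id: str,
    agent_phone_number_id: str,
    to_number: str,
    client_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return arguments for ELEVENLABS_OUTBOUND_CALL after a format check."""
    if not agent_id or not agent_phone_number_id:
        raise ValueError("agent_id and agent_phone_number_id are required")
    if not is_e164(to_number):
        raise ValueError(f"to_number is not in E.164 format: {to_number!r}")
    arguments: Dict[str, Any] = {
        "agent_id": agent_id,
        "agent_phone_number_id": agent_phone_number_id,
        "to_number": to_number,
    }
    if client_data:
        arguments["conversation_initiation_client_data"] = client_data
    return arguments
```

The same is_e164 check applies to to_number and from_number in ELEVENLABS_REGISTER_CALL_CONVAI_TWILIO.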
Register Twilio Call for ConvAI Agent
ELEVENLABS_REGISTER_CALL_CONVAI_TWILIO
Registers a Twilio call and returns TwiML to connect the call to an ElevenLabs Conversational AI agent. Use when integrating ElevenLabs agents with your own Twilio infrastructure for inbound or outbound calls.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Yes | The unique identifier of the Conversational AI agent to use for this call. |
| to_number | string | Yes | The phone number the call is directed to in E.164 format. |
| from_number | string | Yes | The phone number the call is originating from in E.164 format. |
| direction | string | No | Direction of the call: inbound for incoming calls or outbound for outgoing calls. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get user info
ELEVENLABS_GET_USER_INFO
Retrieves detailed information about the authenticated ElevenLabs user’s account, including subscription, usage, API key, and status.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get user subscription info
ELEVENLABS_GET_USER_SUBSCRIPTION_INFO
Retrieves detailed subscription information for the currently authenticated ElevenLabs user.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Get voice settings
ELEVENLABS_GET_VOICE_SETTINGS
Retrieves the stability, similarity, style, and speaker boost settings for a specific, existing ElevenLabs voice using its voice_id.
Input parameters
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | string | Yes | Identifier of the voice for which to retrieve settings. |
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |
Generate Music Composition Plan
ELEVENLABS_CREATE_MUSIC_PLAN
Generates a music composition plan from a text prompt using the ElevenLabs Music API. Creates a structured plan with defined styles, sections, and durations that can be used as input for actual music generation or as a template for variations.
Output
| Name | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Data from the action execution. |
| error | string | No | Error if any occurred during the execution of the action. |
| successful | boolean | Yes | Whether or not the action execution was successful. |