elevenlabs-python

confusion over the API and model ID options

Status: Open. dskill opened this issue 1 year ago • 4 comments

I'm trying to use the elevenlabs Python library with stream(), and it works fine with eleven_monolingual_v1 but fails with eleven_monolingual_v2. However, I can't find anything in the documentation that clarifies which models are available in streaming mode.

dskill, Oct 13 '23

Same issue here. Did you solve it?

mushanwei, Oct 22 '23

Nope :( I'd love to know whether streaming supports v2.

dskill, Oct 22 '23

Dudes, for v2 the ID is eleven_multilingual_v2. It's quite frustrating that this information is NOWHERE to be found; I literally had to guess.

lhyphendixon, Dec 19 '23

It's incredibly frustrating and disorganized. You'd think that with all the money they're making, they could afford to assign someone to fix their docs and professionally maintain this package. I get that it's a new company and all, but c'mon guys, it's been like 2 years. I swear I'm not trying to be difficult; it just looks really bad, and it feels bad for me as a developer.

Anyway, if you're looking for the models, I'd recommend just using this endpoint: https://elevenlabs.io/docs/api-reference/get-models and then copy-pasting the result into a new JSON file or something.
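The "Get models" page linked above documents a REST call, so the copy-paste step can be scripted. This is a minimal sketch using only the Python standard library (not the elevenlabs package). It assumes the endpoint is `https://api.elevenlabs.io/v1/models` and that the API key is sent in the `xi-api-key` header; check the linked reference in case either has changed. The helper names (`build_models_request`, `fetch_models`, `save_models`, `load_models`) are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: the REST endpoint behind the "Get models" docs page.
MODELS_URL = "https://api.elevenlabs.io/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for the model list, authenticated via the xi-api-key header (assumed)."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"xi-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_models(api_key: str) -> list:
    """Send the request and decode the JSON array of model objects (needs network access)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return json.load(resp)


def save_models(models: list, path: Path) -> None:
    """Cache the model list in a local JSON file so IDs can be looked up offline."""
    path.write_text(json.dumps(models, indent=2), encoding="utf-8")


def load_models(path: Path) -> list:
    """Read a previously cached model list back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


# Example usage (needs a real key; ELEVENLABS_API_KEY is a hypothetical variable name):
#   import os
#   models = fetch_models(os.environ["ELEVENLABS_API_KEY"])
#   save_models(models, Path("elevenlabs_models.json"))
#   print([m["model_id"] for m in models])
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and headers before anything is sent.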

Also, there is no 'eleven_monolingual_v2'. Here are all the models as of now:

[
    {
        "model_id": "eleven_multilingual_v2",
        "name": "Eleven Multilingual v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our state of the art multilingual speech synthesis model, able to generate life-like speech in 29 languages.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000
    },
    {
        "model_id": "eleven_multilingual_v1",
        "name": "Eleven Multilingual v1",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Generate lifelike speech in multiple languages and create content that resonates with a broader audience.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000
    },
    {
        "model_id": "eleven_monolingual_v1",
        "name": "Eleven English v1",
        "can_be_finetuned": true,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Use our standard English language model to generate speech in a variety of voices, styles and moods.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    },
    {
        "model_id": "eleven_turbo_v2",
        "name": "Eleven Turbo v2",
        "can_be_finetuned": false,
        "can_do_text_to_speech": true,
        "can_do_voice_conversion": false,
        "can_use_style": false,
        "can_use_speaker_boost": false,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our cutting-edge turbo model is ideally suited for tasks demanding extremely low latency.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    },
    {
        "model_id": "eleven_multilingual_sts_v2",
        "name": "Eleven Multilingual v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": false,
        "can_do_voice_conversion": true,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our cutting-edge, multilingual speech-to-speech model is designed for situations that demand unparalleled control over both the content and the prosody of the generated speech across various languages.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000
    },
    {
        "model_id": "eleven_english_sts_v2",
        "name": "Eleven English v2",
        "can_be_finetuned": true,
        "can_do_text_to_speech": false,
        "can_do_voice_conversion": true,
        "can_use_style": true,
        "can_use_speaker_boost": true,
        "serves_pro_voices": false,
        "token_cost_factor": 1,
        "description": "Our state-of-the-art speech to speech model suitable for scenarios where you need maximum control over the content and prosody of your generations.",
        "requires_alpha_access": false,
        "max_characters_request_free_user": 2500,
        "max_characters_request_subscribed_user": 5000,
        "languages": [
            {
                "language_id": "en",
                "name": "English"
            }
        ]
    }
]
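Given a list like the one above, a typo such as `eleven_monolingual_v2` can be caught before any request is made. This is a minimal sketch, not part of the elevenlabs library: `MODELS` is a snapshot of just the `model_id` and `can_do_text_to_speech` fields copied from the list above (the live list may change), and `tts_model_ids` / `check_tts_model_id` are hypothetical helpers. Close matches come from Python's `difflib.get_close_matches`.

```python
import difflib

# Snapshot of two fields from the model list above; refresh it from the API as needed.
MODELS = [
    {"model_id": "eleven_multilingual_v2", "can_do_text_to_speech": True},
    {"model_id": "eleven_multilingual_v1", "can_do_text_to_speech": True},
    {"model_id": "eleven_monolingual_v1", "can_do_text_to_speech": True},
    {"model_id": "eleven_turbo_v2", "can_do_text_to_speech": True},
    {"model_id": "eleven_multilingual_sts_v2", "can_do_text_to_speech": False},
    {"model_id": "eleven_english_sts_v2", "can_do_text_to_speech": False},
]


def tts_model_ids(models):
    """Return the IDs of models flagged as able to do text-to-speech, in list order."""
    return [m["model_id"] for m in models if m.get("can_do_text_to_speech")]


def check_tts_model_id(model_id, models):
    """Return model_id if it is a known TTS model; otherwise raise ValueError listing close matches."""
    valid = tts_model_ids(models)
    if model_id in valid:
        return model_id
    hints = difflib.get_close_matches(model_id, valid, n=3, cutoff=0.6)
    raise ValueError(
        f"unknown text-to-speech model_id {model_id!r}; did you mean one of {hints}?"
    )
```

Filtering on `can_do_text_to_speech` matters because the two `*_sts_v2` entries are voice-conversion models and would not be valid choices for text-to-speech, even though their names look similar.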

Joshua-Shepherd, Mar 16 '24