
[WIP] Add create_audiobook function

Open Josh-XT opened this issue 1 year ago • 0 comments

Added create_audiobook function.

  • Added an API endpoint: POST /v1/audio/book
    • Accepts a file upload.
    • Optionally accepts voice and language (a 2-letter code) in the body. If they are not defined, the default voice and en are used.
    • The narrator's voice is taken from voice.
    • If language is defined, the input is translated to that language, and both the text and audio output are in that language.
  • Still need to add 100x male and female voices for random selection. These will be synthesized so that they're not real people's voices.
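
The optional-parameter behavior described above could be sketched roughly as follows. This is only an illustration, not the code in the PR: the helper name `build_book_form`, the placeholder voice name `"default"`, and the exact field names are assumptions.

```python
DEFAULT_VOICE = "default"   # assumption: placeholder for the server's default voice
DEFAULT_LANGUAGE = "en"     # from the issue: en is used when language is not defined


def build_book_form(voice=None, language=None):
    """Return (form fields, translate flag) for POST /v1/audio/book.

    Hypothetical helper: it only mirrors the defaults described in the issue.
    """
    if language is not None:
        language = language.strip().lower()
        # The issue says language is a 2-letter code.
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a 2-letter code, e.g. 'en'")
    fields = {
        "voice": voice or DEFAULT_VOICE,
        "language": language or DEFAULT_LANGUAGE,
    }
    # Per the issue, translation happens only when language is defined.
    translate = language is not None
    return fields, translate
```

The returned fields would be sent alongside the uploaded file as form data by whatever HTTP client is in use.
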
graph TD
    A[Start] --> B[Chunk book content]
    B --> C{Paragraph > 2000 tokens?}
    C -->|No| D[Add paragraph to chunk]
    C -->|Yes| E[Split paragraph into sentences]
    E --> F[Group sentences up to 2000 tokens]
    F --> D
    D --> G[Process each chunk]
    G --> H{Extract characters, dialogue, and narration}
    H --> I[Merge similar characters]
    I --> J{Translation requested?}
    J -->|Yes| K[Translate content]
    J -->|No| L[Assign voices to characters]
    K --> L
    L --> M[Generate audio for each content item]
    M --> N[Combine audio segments]
    N --> O[Export final audiobook]
    O --> P[Save text output]
    P --> Q[End]

    subgraph "Chunking process"
    B
    C
    D
    E
    F
    end

    subgraph "For each chunk"
    G
    H
    I
    end

    subgraph "For each content item"
    M
    end
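
The "Chunking process" subgraph above might look something like the sketch below. It is not the PR's implementation: token counting is approximated by whitespace word count, the sentence splitter is deliberately naive, and packing several short paragraphs into one chunk is an assumption about what "Add paragraph to chunk" means. All function names are hypothetical.

```python
import re

MAX_TOKENS = 2000  # limit taken from the flowchart


def count_tokens(text):
    # Assumption: whitespace word count stands in for the real tokenizer.
    return len(text.split())


def split_sentences(paragraph):
    # Naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def pack(pieces, max_tokens, joiner):
    """Greedily group pieces so each group stays within max_tokens.

    A single piece longer than max_tokens is kept whole in its own group.
    """
    groups, current, size = [], [], 0
    for piece in pieces:
        n = count_tokens(piece)
        if current and size + n > max_tokens:
            groups.append(joiner.join(current))
            current, size = [], 0
        current.append(piece)
        size += n
    if current:
        groups.append(joiner.join(current))
    return groups


def chunk_book(text, max_tokens=MAX_TOKENS):
    pieces = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if count_tokens(paragraph) > max_tokens:
            # Oversized paragraph: split into sentences, regroup up to the limit.
            pieces.extend(pack(split_sentences(paragraph), max_tokens, " "))
        else:
            pieces.append(paragraph)
    # Pack paragraphs and sentence groups into chunks.
    return pack(pieces, max_tokens, "\n\n")
```

For example, `chunk_book(book_text)` returns a list of strings, each of which would then go through the "For each chunk" steps.
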

Josh-XT · Oct 05 '24 13:10