
Add missing pipelines to API

semack opened this issue 1 year ago · 10 comments

Hi guys,

First of all, thank you for the amazing job you do.

I didn't find an API endpoint for Text-To-Speech. I think a workflow can be used for this, but are there any plans to implement it in the API?

Kind regards, /Andriy

semack avatar Sep 11 '23 06:09 semack

Thank you for the issue.

The plan moving forward was to run pipelines through workflows rather than calling them directly when using the API.
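
For example, a pipeline like texttospeech can be wrapped in a workflow and called through the generic /workflow endpoint; a minimal configuration sketch:

    # Enable the pipeline
    texttospeech:

    # Wrap it in a workflow
    workflow:
      tts:
        tasks:
          - action: texttospeech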

davidmezzetti avatar Sep 11 '23 16:09 davidmezzetti

Upon further review, there are only a few pipelines that aren't in the API, and it makes sense to add the routers. I've been pushing things more toward workflows, but it doesn't hurt to have pipeline endpoints, especially in the case of an LLM pipeline.

davidmezzetti avatar Sep 25 '23 12:09 davidmezzetti

Another thing I've faced: in my setup, txtai is hosted in a separate remote environment with a powerful GPU, and my custom software needs to use it remotely through the API. Some pipelines like Textractor and Transcription take a file name as an argument. Textractor works well with remote sources, but Transcription doesn't. Could this be fixed?

semack avatar Sep 26 '23 08:09 semack

The pipelines are focused on a single task by design. That's where workflows come in. There are workflow steps for reading from URLs and cloud object storage.
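
For example, a minimal sketch pairing a retrieve task, which downloads remote URLs to local files, with the transcription pipeline:

    workflow:
      stt:
        tasks:
          - task: retrieve
          - action: transcription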

davidmezzetti avatar Sep 26 '23 11:09 davidmezzetti

Hi David,

Thank you for pointing me in the right direction. The retrieve task helped me, and transcription works well now. I'm having another problem with a workflow while trying to get TTS working in a Docker container.

docker-compose file

    version: '3.4'

    services:
      txtai-api:
        build:
          context: .
          dockerfile: txtai-api.Dockerfile
        ports:
          - 8000:8000
        volumes:
          - ./app.yml:/app/app.yaml:ro
          - ./.cache:/models
        environment:
          - CONFIG=/app/app.yaml
          - TRANSFORMERS_CACHE=/models
        # command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  device_ids: ['0']
                  capabilities: [gpu]

txtai-api.Dockerfile

    # Set base image
    ARG BASE_IMAGE=neuml/txtai-gpu:latest
    FROM $BASE_IMAGE

    # Start server and listen on all interfaces
    ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]

app.yml

    # Index file path
    path: /tmp/index

    # Allow indexing of documents
    writable: True

    # Embeddings index
    embeddings:
      path: sentence-transformers/nli-mpnet-base-v2

    # Extractive QA
    extractor:
      path: distilbert-base-cased-distilled-squad

    # Zero-shot labeling
    labels:

    # Similarity
    similarity:

    # Text segmentation
    segmentation:
      sentences: true

    # Text summarization
    summary:

    # Text extraction
    textractor:
      join: true
      lines: false
      minlength: 100
      paragraphs: true
      sentences: false

    # Transcribe audio to text
    transcription:

    # Text To Speech
    texttospeech:

    # Translate text between languages
    translation:

    # Workflow definitions
    workflow:
      sumfrench:
        tasks:
          - action: textractor
            task: url
          - action: summary
          - action: translation
            args: ["fr"]
      sumspanish:
        tasks:
          - action: textractor
            task: url
          - action: summary
          - action: translation
            args: ["es"]
      tts:
        tasks:
          - action: texttospeech
      stt:
        tasks:
          - task: retrieve
          - action: transcription

Here is my call in C#; sorry it's not Python, but I'm including it to show the context.

    public async Task<TextToSpeechResponse> Handle(TextToSpeechCommand request, CancellationToken cancellationToken)
    {
        // Call the remote txtai tts workflow through the API client
        var wf = new Workflow(_settings.BaseUrl);

        var elements = new List<string> { request.Text };

        var data = await wf.WorkflowActionAsync("tts", elements);

        var result = new TextToSpeechResponse
        {
            Binary = (byte[])data.FirstOrDefault()
        };

        return result;
    }
Logs from the container

root@debian-AI:/opt/docker/txtai# docker compose up
[+] Running 2/1
 ✔ Network txtai_default        Created  0.1s
 ✔ Container txtai-txtai-api-1  Created  0.0s
Attaching to txtai-txtai-api-1
txtai-txtai-api-1 | [nltk_data] Downloading package averaged_perceptron_tagger to
txtai-txtai-api-1 | [nltk_data]     /root/nltk_data...
txtai-txtai-api-1 | [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
txtai-txtai-api-1 | [nltk_data] Downloading package cmudict to /root/nltk_data...
txtai-txtai-api-1 | [nltk_data]   Unzipping corpora/cmudict.zip.
txtai-txtai-api-1 | INFO:     Started server process [1]
txtai-txtai-api-1 | INFO:     Waiting for application startup.
txtai-txtai-api-1 | No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
txtai-txtai-api-1 | Using a pipeline without specifying a model name and revision in production is not recommended.
txtai-txtai-api-1 | No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
txtai-txtai-api-1 | Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading (…)lve/main/config.yaml: 100%|██████████| 1.10k/1.10k [00:00<00:00, 540kB/s]
Downloading model.onnx: 100%|██████████| 133M/133M [00:02<00:00, 48.3MB/s]
txtai-txtai-api-1 | No model was supplied, defaulted to facebook/wav2vec2-base-960h and revision 55bb623 (https://huggingface.co/facebook/wav2vec2-base-960h).
txtai-txtai-api-1 | Using a pipeline without specifying a model name and revision in production is not recommended.
txtai-txtai-api-1 | Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
txtai-txtai-api-1 | You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
txtai-txtai-api-1 | INFO:     Application startup complete.
txtai-txtai-api-1 | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
txtai-txtai-api-1 | INFO:     10.20.255.4:54510 - "POST /workflow HTTP/1.1" 500 Internal Server Error
txtai-txtai-api-1 | ERROR:    Exception in ASGI application
txtai-txtai-api-1 | Traceback (most recent call last):
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 230, in jsonable_encoder
txtai-txtai-api-1 |     data = dict(obj)
txtai-txtai-api-1 | TypeError: cannot convert dictionary update sequence element #0 to a sequence
txtai-txtai-api-1 |
txtai-txtai-api-1 | During handling of the above exception, another exception occurred:
txtai-txtai-api-1 |
txtai-txtai-api-1 | Traceback (most recent call last):
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 235, in jsonable_encoder
txtai-txtai-api-1 |     data = vars(obj)
txtai-txtai-api-1 | TypeError: vars() argument must have __dict__ attribute
txtai-txtai-api-1 |
txtai-txtai-api-1 | The above exception was the direct cause of the following exception:
txtai-txtai-api-1 |
txtai-txtai-api-1 | Traceback (most recent call last):
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
txtai-txtai-api-1 |     result = await app(  # type: ignore[func-returns-value]
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
txtai-txtai-api-1 |     return await self.app(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 292, in __call__
txtai-txtai-api-1 |     await super().__call__(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 122, in __call__
txtai-txtai-api-1 |     await self.middleware_stack(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 184, in __call__
txtai-txtai-api-1 |     raise exc
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 162, in __call__
txtai-txtai-api-1 |     await self.app(scope, receive, _send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
txtai-txtai-api-1 |     raise exc
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
txtai-txtai-api-1 |     await self.app(scope, receive, sender)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
txtai-txtai-api-1 |     raise e
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
txtai-txtai-api-1 |     await self.app(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 718, in __call__
txtai-txtai-api-1 |     await route.handle(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 276, in handle
txtai-txtai-api-1 |     await self.app(scope, receive, send)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 66, in app
txtai-txtai-api-1 |     response = await func(request)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 291, in app
txtai-txtai-api-1 |     content = await serialize_response(
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 179, in serialize_response
txtai-txtai-api-1 |     return jsonable_encoder(response_content)
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 209, in jsonable_encoder
txtai-txtai-api-1 |     jsonable_encoder(
txtai-txtai-api-1 |   File "/usr/local/lib/python3.8/dist-packages/fastapi/encoders.py", line 238, in jsonable_encoder
txtai-txtai-api-1 |     raise ValueError(errors) from e
txtai-txtai-api-1 | ValueError: [TypeError('cannot convert dictionary update sequence element #0 to a sequence'), TypeError('vars() argument must have __dict__ attribute')]

Could you help me figure out the problem, please? I feel like something is missing.

Thank you in advance, Andriy

semack avatar Oct 10 '23 14:10 semack

When I use curl, I get the same error.

curl -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'

I figured out that the problem is in how the server fills the response when using tts.

semack avatar Oct 11 '23 11:10 semack

I'll have to look at this more closely, but it seems like it might be an issue with returning binary data as JSON.

davidmezzetti avatar Oct 11 '23 14:10 davidmezzetti

Yes, I have the same suspicion.

semack avatar Oct 11 '23 16:10 semack

Well, instead of binary, I should say NumPy arrays, which is what is returned.

You can add your own custom pipeline that converts the waveforms to Python floats, which are JSON serializable.

class Converter:
    # Convert NumPy waveform arrays to nested Python lists, which are JSON serializable
    def __call__(self, inputs):
        return [x.tolist() for x in inputs]
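
Assuming that class is saved in a module available on the server's Python path (say, a hypothetical converter.py), it should be possible to register it in app.yml by its full class path and chain it after texttospeech; an untested sketch:

    # Register the custom pipeline by class path
    converter.Converter:

    workflow:
      tts:
        tasks:
          - action: texttospeech
          - action: converter.Converter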

Or perhaps something that writes it to a WAV file and then base64 encodes that data, like what's in this notebook: https://github.com/neuml/txtai/blob/master/examples/40_Text_to_Speech_Generation.ipynb
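
A rough sketch of that approach; it assumes the soundfile library is installed, and the WavEncoder name and default sample rate are illustrative (the rate needs to match the TTS model's output):

    import base64
    import io

    import soundfile as sf

    class WavEncoder:
        # Converts raw waveforms into base64-encoded WAV data that survives JSON serialization
        def __init__(self, rate=22050):
            self.rate = rate

        def __call__(self, inputs):
            outputs = []
            for waveform in inputs:
                # Write the waveform to an in-memory WAV file
                buffer = io.BytesIO()
                sf.write(buffer, waveform, self.rate, format="WAV")

                # Base64 encode the WAV bytes for the JSON response
                outputs.append(base64.b64encode(buffer.getvalue()).decode("utf-8"))

            return outputs

The client would then base64 decode each returned element back into a playable WAV file.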

Ultimately, I think options to write WAV output and base64 encode it would be good additions to the TTS pipeline.

davidmezzetti avatar Oct 11 '23 17:10 davidmezzetti

Ultimately, I think options to write WAV output and base64 encode it would be good additions to the TTS pipeline.

That could be the best solution, IMHO. It could also be a Task, I guess. Thanks.

semack avatar Oct 12 '23 06:10 semack