
Unable to run llamaspeak Tutorial

bryanhughes opened this issue 1 month ago

I reinstalled the Riva Server and am running it following these steps:

https://github.com/dusty-nv/jetson-containers/tree/master/packages/audio/riva-client

It sort of works: the Streaming ASR test mostly works (sometimes I have to run it twice), but the Streaming TTS example only works if I run it immediately after the Streaming ASR example. The 'Loopback' example works intermittently and sometimes crashes.

Here is an example where TTS audio played for `## Hello`, then I said more things; the ASR keeps working, but the TTS fails.

>> hello
## Hello. 
>> you're
>> yeah you're work
>> hey you're working
>> yeah you're working
## Yeah, you're working. 
Traceback (most recent call last):
  File "/opt/riva/python-clients/scripts/loopback.py", line 124, in <module>
    main()
  File "/opt/riva/python-clients/scripts/loopback.py", line 111, in main
    for tts_response in tts_responses:
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 542, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 968, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Error: Triton model failed during inference. Error message: Streaming timed out"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-05-09T20:16:16.575930953+00:00", grpc_status:2, grpc_message:"Error: Triton model failed during inference. Error message: Streaming timed out"}"
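Not a fix, but while debugging the intermittent `Streaming timed out`, a generic retry wrapper around the streaming call can keep the loopback session alive instead of crashing on the first transient error. This is a stdlib-only sketch; the callable passed in is hypothetical and stands in for the Riva TTS request, it is not Riva API code:

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on any exception (e.g. a transient grpc.RpcError).

    Re-raises the last error once the attempt budget is exhausted.
    """
    last = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # real code should catch grpc.RpcError specifically
            last = exc
            if i < attempts - 1:
                time.sleep(delay)
    raise last
```

In `loopback.py`, the `for tts_response in tts_responses:` loop around line 111 could be wrapped this way so one timed-out stream doesn't kill the whole session.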

Riva is running:

$ bash riva_start.sh 
Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
Use this container terminal to run applications:
root@0ee61d47e6f2:/opt/riva# riva_streaming_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
I0509 20:01:51.194780   246 grpc.h:94] Using Insecure Server Credentials
Loading eval dataset...
filename: /opt/riva/wav/en-US_sample.wav
Done loading 1 files
what
what
what is
what is
what is
what is now tilde
what is natural
what is natural
what is natural
what is natural language
what is natural language
what is natural language
what is natural language processing
what is natural language processing
what is natural language processing
what is natural language processing
what is natural language processing
what is tural language processing
what is language processing
What is natural language processing? 
-----------------------------------------------------------
File: /opt/riva/wav/en-US_sample.wav

Final transcripts: 
0 : What is natural language processing? 

Timestamps: 
Word                                    Start (ms)      End (ms)        Confidence      

What                                    920             960             2.0793e-01      
is                                      1200            1240            5.4014e-01      
natural                                 1720            2080            1.5321e-01      
language                                2240            2600            8.5110e-01      
processing?                             2720            3200            1.0000e+00      


Audio processed: 4.0000e+00 sec.
-----------------------------------------------------------

Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 7.7584e-01 sec.
Total audio processed: 4.1520e+00 sec.
Throughput: 5.3516e+00 RTFX
root@0ee61d47e6f2:/opt/riva# riva_tts_client --voice_name=English-US.Female-1 \
                --text="Hello, this is a speech synthesizer." \
                --audio_file=/opt/riva/wav/output.wav
I0509 20:02:45.375542   348 grpc.h:94] Using Insecure Server Credentials
root@0ee61d47e6f2:/opt/riva#
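The TTS client exits silently, so it's hard to tell whether `/opt/riva/wav/output.wav` actually contains audio. A quick stdlib check of the WAV header can confirm that (sketch only; the path is taken from the command above and must be run inside the container):

```python
import wave

def wav_summary(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return w.getnchannels(), rate, frames / rate

# e.g. wav_summary("/opt/riva/wav/output.wav")
```

A zero or near-zero duration would indicate the synthesis produced no audio even though the client returned cleanly.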

When I try to run the actual llamaspeak tutorial, this is what I get:

$ sudo ./jetson-containers run --env HUGGINGFACE_TOKEN=XXXXXXXXXXX \
  $(autotag nano_llm) \
  python3 -m nano_llm.agents.web_chat --api=mlc \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --asr=whisper --tts=piper
bash: autotag: command not found
localuser:root being added to access control list
/mnt/nvme/git/jetson-containers/run.sh: line 52: /tmp/nv_jetson_model: Permission denied
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /mnt/nvme/git/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/video2 --device /dev/video3 --env HUGGINGFACE_TOKEN=XXXXXXXXXXX python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=whisper --tts=piper
Unable to find image 'python3:latest' locally
docker: Error response from daemon: pull access denied for python3, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
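For what it's worth, the `pull access denied for python3` error follows directly from the `autotag: command not found` line above: when the command inside `$(...)` is missing (here likely because `sudo` resets `PATH`), the substitution expands to an empty string, the arguments shift left, and `docker run` treats `python3` as the image name. A minimal demonstration of that shell behavior (the `missing_autotag` name is hypothetical):

```shell
#!/usr/bin/env bash
# When a command inside $(...) is not found, the substitution expands to
# nothing and the remaining words shift left by one position.
args=(run $(missing_autotag nano_llm 2>/dev/null) python3 -m nano_llm.agents.web_chat)
echo "${args[0]} ${args[1]}"   # prints: run python3
```

Invoking the script by its path in the repo (e.g. `$(./autotag nano_llm)` from the jetson-containers root), or running without `sudo` so the user's `PATH` is preserved, should put a real image tag in that position; this is a guess based on the repo layout, not a verified fix.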

bryanhughes avatar May 09 '24 20:05 bryanhughes