llama-gpt
replies / slow
How do I configure it to show replies in real time instead of waiting for the end of the generation?
Thanks
Currently the replies are already streamed one word at a time. I wonder if the first word's taking a lot of time for you? In that case, consider running the 7B model (if you aren't already) to see increased performance.
@mayankchhabra I think this is a bug. I am having the same issue
For me, it really does wait for the generation to finish before showing any text.
@AndreiSva and @WEBELSYS can you please share which model you're trying, and the specs of your hardware (OS, CPU, RAM)?
For the first few days it was streaming word by word, but now it waits until the end. I've tried the 7B and 13B models on a Ryzen 5800X3D with 32 GB DDR4.
I am running Linux on a Ryzen 7 3700X with 32 GB of RAM.
Everything is also running extremely slowly. Here's a screen recording of what a simple generation looks like: Screencast from 2023-08-18 10-31-58.webm
It took almost 3 minutes.
About the same or worse, one word at a time, running 70B on an EPYC 7502P with 128 GB RAM, Ubuntu 22.04.
Same for me. I'm using an old laptop (i7-4800MQ, 8 GB of RAM, SSD) and it's very, very slow with the 7B model. I know the laptop is not powerful, but it should be able to give a simple reply... or not? Thank you.
Speed is fine on my platform, but there are no streaming tokens. Using a Ryzen Threadripper 2950X and 32 GB of RAM on Fedora.
I waited 10 minutes, but nothing happened. I never saw any output, but my CPU spiked to 100%.
If the connection is direct, the response streams one word at a time, but when using an nginx reverse proxy, it waits until the whole response is generated.
By the way, after enabling CUDA acceleration, the generation speed improved significantly.
That's a great observation @Aincvy. For anyone facing this issue, can you please confirm whether you're using LlamaGPT behind a reverse proxy, like nginx? If so, it would be great if you could paste your proxy config; you might need to make some adjustments to get streaming to work.
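For anyone experimenting in the meantime, the adjustments that usually matter for token-by-token streaming through nginx are disabling response buffering and proxying with HTTP/1.1. A minimal sketch of the proxied location, assuming the LlamaGPT UI is on localhost:3000 (not an officially documented config; adjust to your setup):

```nginx
location / {
    proxy_pass http://localhost:3000;

    # Use HTTP/1.1 upstream and clear the Connection header so
    # chunked/streamed responses are forwarded as they arrive.
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Don't buffer or cache the response; flush each token to the client.
    proxy_buffering off;
    proxy_cache off;
}
```

Alternatively, if the upstream application sets the `X-Accel-Buffering: no` response header, nginx disables buffering for that response without any proxy config change.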
This is indeed it! Connecting directly to IP:3000 is much faster, and streaming works.
@WEBELSYS @mayankchhabra can you share the config you used? I am using a basic proxy pass and it is showing the issues stated above:
```nginx
server {
    listen 80;
    server_name chat.randomprivateurl.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/chat.randomprivateurl.com/fullchain.pem>
    ssl_certificate_key /etc/letsencrypt/live/chat.randomprivateurl.com/privkey.p>
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
```
I'm using nginx too. Even with the "server-sent events" tweak in my nginx config, it still does not work: https://stackoverflow.com/questions/13672743/eventsource-server-sent-events-through-nginx
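For what it's worth, here is a sketch of how the server block posted above might look with the streaming directives applied. The domain and upstream port are taken from that config; the certificate paths follow Certbot's standard naming because the original paste was truncated, so double-check them, and note this has not been confirmed by the maintainers:

```nginx
server {
    listen 80;
    server_name chat.randomprivateurl.com;

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/chat.randomprivateurl.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.randomprivateurl.com/privkey.pem;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Pass the response through as it is generated instead of buffering it.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_cache off;

        # Allow long generations to finish without the proxy timing out.
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
```

After editing, run `nginx -t` and reload nginx for the change to take effect. If streaming works on a direct connection to port 3000 but still not through the proxy, another layer in front (for example Cloudflare or a second proxy) may also be buffering the response.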