Alexandre Strube
As @tarangill said, you have `~/.cache/huggingface/hub` where the models end up. I will close this issue as it's pretty old and I think you found your models by now :-)
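If it helps anyone else, a quick way to see what is already on disk (assuming the default cache location; the example repo name is just illustrative):

```bash
# List the locally cached Hugging Face models; each subdirectory
# corresponds to one downloaded repo (e.g. models--lmsys--vicuna-7b-v1.5).
ls ~/.cache/huggingface/hub
```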
> --share

It doesn't work for the gradio_web_server.py:

```
2023-06-29 12:03:00 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=None, share=True, controller_url='http://localhost:21001', concurrency_count=10, model_list_mode='reload', moderate=False, add_chatgpt=False, add_claude=False, add_palm=False, gradio_auth_path=None)
2023-06-29 12:03:00...
```
It's a problem with Gradio. We have to report it on their repository.
Exactly. That's a problem with Gradio, and it should be reported there.
As the OP moved on, I will close this one. If anyone feels like this is not a good solution, please reopen.
@Halflifefa this has to do with the model you are using. The model "spills" over from one GPU to the next when the first one's memory is full. If you use a LLaMa2-70,...
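As a rough sketch of how I'd spread a 70B model over several GPUs (the model path and memory cap are placeholders, and `--max-gpu-memory` assumes a FastChat version that supports it):

```bash
# Sketch only, not a drop-in command.
# With --num-gpus > 1, weights that do not fit on the first GPU
# "spill" onto the next one; --max-gpu-memory caps each card.
python3 -m fastchat.serve.model_worker \
    --model-path meta-llama/Llama-2-70b-chat-hf \
    --num-gpus 4 \
    --max-gpu-memory 20GiB
```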
How do you run the controller?
So, this works with `fastchat.serve.gradio_web_server_multi` (provided you restart the server), but it does not with `fastchat.serve.gradio_web_server` - which makes the model selection tab on the web_server moot.
Ok, this now works: `--model-list-mode=reload`
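For anyone landing here later, a minimal sketch of how I start the web server with that flag (controller URL and port are just the defaults from the logs above; adjust to your setup):

```bash
# Reload the model list from the controller on each page load,
# so newly registered workers show up without restarting the server.
python3 -m fastchat.serve.gradio_web_server \
    --controller-url http://localhost:21001 \
    --model-list-mode reload
```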
Same for me:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 $FASTCHAT/fastchat/serve/model_worker.py \
    --controller $FASTCHAT_CONTROLLER:$FASTCHAT_CONTROLLER_PORT \
    --port 31029 \
    --worker http://$(hostname):31029 \
    --num-gpus 8 \
    --model-path models/Mixtral-8x22B-v0.1
```

vLLM also works multi-GPU just fine....
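For completeness, the rough vLLM equivalent for the same model (flag names may differ between FastChat versions, so check `python3 -m fastchat.serve.vllm_worker --help` first):

```bash
# Rough sketch of the vLLM-backed worker for the same model.
# --num-gpus is assumed to map to vLLM's tensor parallelism here;
# verify against your FastChat version.
python3 -m fastchat.serve.vllm_worker \
    --model-path models/Mixtral-8x22B-v0.1 \
    --num-gpus 8
```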