Wei-Lin Chiang
@smarie Thanks for your detailed explanation; I understand your point now. What I suggested is that we implement a stand-alone random number generator that behaves the same as rand()...
@carandraug `exist ("OCTAVE_VERSION", "builtin")` will not work in Matlab, since double-quoted strings are invalid there. I think it should be changed to `exist ('OCTAVE_VERSION', 'builtin')`? Thanks.
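A minimal sketch of how the single-quoted check could be used to branch between the two interpreters (the `is_octave` variable name is my own, not from the thread):

```
% Detect whether we are running under Octave or Matlab.
% Single quotes are valid in both; double-quoted strings are Octave-only.
is_octave = exist ('OCTAVE_VERSION', 'builtin') ~= 0;

if is_octave
  % Octave-specific code path
else
  % Matlab-specific code path
end
```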
if you upgrade to master, does this still happen?
> a throughput of 0.2 tok/sec

@tacacs1101-debug this doesn't seem correct. Can you provide the commands to reproduce?
This command does not use vLLM, so it will be slow. ``` python3 -m fastchat.serve.cli --model-path WizardLM/WizardLM-70B-V1.0 --num-gpus 4 ``` You have to use the vLLM worker for better tensor parallelism...
Thanks for reporting the issue. @merrymercy do you think we should pin `python >= 3.9`? https://github.com/lm-sys/FastChat/blob/ec9a07ed22110e9686b51fd6ee9bf635b7ce54f8/pyproject.toml#L10
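If we do decide to pin it, a minimal sketch of what the change in `pyproject.toml` could look like (whether `3.9` is the right bound is exactly the open question above):

```toml
[project]
# Reject installation on interpreters older than 3.9.
requires-python = ">=3.9"
```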
@chentao169 I'd highly recommend checking out our leaderboard here, which includes each model's Elo, MT-bench, and MMLU scores: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard It's also on https://chat.lmsys.org
Closing this for now. Feel free to reopen if you still hit the issue.
Hey @ericzhou571 thanks a lot for adding the Falcon support! I'm trying to run the command below to test falcon-instruct: ``` python3 -m fastchat.serve.cli --model-path tiiuae/falcon-40b-instruct --num-gpus 2 --debug ```...
Sorry, my bad. I didn't correctly apply the change for `falcon_generate_stream`. I've fixed it and it's working now. Great work again @ericzhou571! One thing may still need to be fixed...