Hao Zhang
@qZhang88: Yes, your suggestion is reasonable, and you are welcome to do so. Contributions are welcome.
I'll take a look and try this PR later this week.
@sfc-gh-aqiao please watch this thread -- it affects chunked-prefill performance.
Yes, @sgsdxzy is right. Please re-open the issue if you still see it.
Closing, as this is not related to the development of this repo. Please try to find the appropriate hyperparameters for your own dataset; some HPO is needed!
@lan2720 We currently do not support APIs, because that would put too much stress on our server.
It is not very difficult to allow the model to output embeddings. Maybe improve [this part of the code](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/model_worker.py#L151) and expose a FastAPI endpoint that returns the embeddings? Contributions are welcome.
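As a rough sketch of the embeddings path (not the actual FastChat worker code; the function name and shapes here are illustrative), the worker could mean-pool the model's last hidden states over non-padding tokens to get one vector per input. NumPy stands in for torch tensors:

```python
import numpy as np

def mean_pool_embeddings(hidden_states, attention_mask):
    """Mean-pool last-layer hidden states over non-padding tokens.

    hidden_states:  (batch, seq_len, dim) array of hidden states
    attention_mask: (batch, seq_len) array of 0/1 token validity flags
    returns:        (batch, dim) sentence embeddings
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)   # sum over real tokens only
    counts = mask.sum(axis=1)                     # number of real tokens
    return summed / np.maximum(counts, 1)         # avoid division by zero

# Example: batch of 2 sequences, seq_len 3, hidden dim 2
h = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
              [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]])
m = np.array([[1, 1, 0],   # last token of the first sequence is padding
              [1, 1, 1]])
emb = mean_pool_embeddings(h, m)
print(emb)  # [[2. 3.] [4. 4.]]
```

A FastAPI endpoint would then just wrap a call like this around the model's forward pass and return the vectors as JSON.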
Contributions are welcome. Feel free to submit a PR and ping me for review
Supported in #663. Closing. Please try it and let us know your feedback. A comparison report between the Vicuna embeddings and Sentence-Transformers embeddings would be appreciated.
@rjiang-ptm Since we have agreed (over email) to proceed with Option 3, maybe we can start the implementation (with the MBZUAI team)?