Nicolas Patry

978 comments by Nicolas Patry

This model seems to be sharing its gate_proj; however, the modeling code doesn't reflect that: https://huggingface.co/baichuan-inc/baichuan-7B/blob/main/modeling_baichuan.py Not sure if it's intentional.
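To verify that kind of weight sharing, here's a minimal sketch (the checkpoint filename and the `data_ptr`-based check are my assumptions, not something from the original thread):

```python
import torch

# Load the raw checkpoint (hypothetical local filename).
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Two parameters that are truly tied point at the same underlying storage,
# so identical data pointers reveal the sharing.
seen = {}
for name, tensor in state_dict.items():
    ptr = tensor.data_ptr()
    if ptr in seen:
        print(f"{name} shares storage with {seen[ptr]}")
    else:
        seen[ptr] = name
```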

Hey @Atry, thanks for the contribution. Do you mind sharing a bit more about the problem this is trying to solve?

Hey, do you know about https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.LoraModel.merge_and_unload? Basically, you could

```python
model = model.merge_and_unload()
model.save_pretrained("mynewmergedmodel")
```

which will "write" the PEFT weights directly into the model, making it a regular transformer...
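For context, a fuller end-to-end sketch of that workflow (the model id and adapter path here are hypothetical placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the trained LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("base-model-id")      # hypothetical id
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")   # hypothetical path

# Fold the adapter weights into the base weights and drop the PEFT wrappers.
model = model.merge_and_unload()
model.save_pretrained("mynewmergedmodel")
```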

Not in latency (depends on the benchmark/hardware, but it is basically on par). PagedAttention seems to be nicer with respect to VRAM usage, meaning it's better when you're low on...
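To give a sense of the VRAM at stake, a back-of-the-envelope KV-cache calculation (the config numbers are illustrative assumptions, roughly 7B-class):

```python
# Illustrative config: all numbers are assumptions, not from the thread.
layers, kv_heads, head_dim = 32, 32, 128
seq_len, batch, dtype_bytes = 2048, 8, 2  # fp16

# One K and one V tensor per layer, per token, per sequence in the batch.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes
print(f"KV cache: {kv_cache_bytes / 2**30:.1f} GiB")  # 8.0 GiB
```

Pre-allocating that worst case per request wastes most of it on short sequences; paging hands out the cache in small blocks on demand, which is where the VRAM win comes from.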

> In this scenario, do you think it makes sense to shard over 2 GPUs a model that can fit in a single GPU, paying the sharding latency price chasing...

Which model is it? The tool is trying to convert the training parameters, which are not convertible. We will just need to skip them.
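For illustration, a minimal sketch of skipping non-convertible entries when converting a checkpoint to safetensors (the filenames are placeholders, and the real conversion tool does more than this):

```python
import torch
from safetensors.torch import save_file

# Hypothetical input checkpoint containing both weights and training state.
ckpt = torch.load("pytorch_model.bin", map_location="cpu")

# Keep only plain tensors; optimizer/training state isn't convertible.
tensors = {k: v.contiguous() for k, v in ckpt.items() if isinstance(v, torch.Tensor)}
skipped = [k for k in ckpt if k not in tensors]
print("skipped non-tensor entries:", skipped)

save_file(tensors, "model.safetensors")
```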

Indeed, there's a training file here: https://github.com/huggingface/text-generation-inference/pull/485

Do you mind opening an issue directly in https://github.com/huggingface/chat-ui, since that's where the issue seems to be? We don't really know what's going on, but it seems that...