Philipp Schmid comments

Results: 136 comments of Philipp Schmid

(Chat)Completion objects cannot generate diverse outputs

`easyllm` uses the `huggingface_hub` library. I talked to @Wauplin. At the moment, it is not possible to deactivate the cache when using the `InferenceClient`. A workaround would be if...

(Chat)Completion objects cannot generate diverse outputs

Another workaround could be to add a `seed` argument when sending the multiple requests; this should lead to non-cached outputs. @KoutchemeCharles could you try this? You would...
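
A minimal sketch of the seed idea from this comment, assuming the backend treats requests that differ only in `seed` as distinct for caching. The helper name `sample_with_distinct_seeds` and the `generate` callable are hypothetical and not part of `easyllm`; `generate` stands in for any function that accepts a `seed` keyword.

```python
import random
from typing import Callable, List


def sample_with_distinct_seeds(
    generate: Callable[..., str],
    prompt: str,
    n: int,
    base_seed: int = 0,
) -> List[str]:
    """Call `generate` n times, each time with a different `seed` keyword.

    `generate` is a hypothetical stand-in for the real request function.
    """
    rng = random.Random(base_seed)
    # random.sample draws without replacement, so all n seeds are distinct.
    seeds = rng.sample(range(2**31), n)
    return [generate(prompt, seed=seed) for seed in seeds]
```

With `huggingface_hub`, `generate` could plausibly be `functools.partial(client.text_generation, do_sample=True)` for an `InferenceClient` instance `client`, since `text_generation` accepts a `seed` parameter. Whether this actually bypasses the cache is the open question the comment asks to be tested.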

(Chat)Completion objects cannot generate diverse outputs

Can you open a PR with that change?

Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?

This seems to be a hardware and environment issue unrelated to the code. I used CUDA 11.8.

Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?

Does the example without code changes work?

Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?

What change did you make?

Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?

Did you make changes to the flash attention patch? The example only works with Falcon since it has a custom patch to use flash attention.

Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention?

Yes! 👍🏻 I plan to update all my posts and remove those patches once there is an official release.

CPU offload when not using offload deepspeed config file

Can you share the code you use? Do you only want to do inference? What hardware do you have available?

gcc/cuda used for training

Sure.

```bash
>> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
```

and ```bash...