Stas Bekman
@arthur-morgan-712, you have a problem with your CUDA environment:

```
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
 #include
```

Properly install the CUDA environment, including all dev header...
Logan, did Deepspeed add a feature to allow users to configure a list of param names not to be sharded? If I'm not mistaken, based on Tim's OP, this is...
a gentle ping here, as the m4 group needs to have all official opt models to support fast tokenizers. Thank you, @ArthurZucker!
we can also re-do the tiny tokenizers if they don't conform with the needs of the CI.
Thank you very much for taking care of this, Arthur!
The fix offered by @mrelg is still needed to make this project work. Here is the patch version of the same as https://github.com/vaibhavk97/GoBooDo/issues/60#issuecomment-918563558

```diff
diff --git a/GoBooDo.py b/GoBooDo.py
index 0971a7d..2dab8f3...
```
ok, the Deepspeed CI is running pt-1.8 - how do we solve that then? I have passed this change to the Deepspeed team; let's see what they say. Edit: they...
Just read the LoRA paper, and your implementation combined with weight quantization is very neat, @deniskamazur. Thank you! A few comments: 1. If and when this is integrated into transformers...
> If this is not possible i hope it will be easy to detect that a model is the 8-bit variant, so we can avoid executing half() on the model....
> * I'd like to discuss if we actually need LoRa adapters in the possible implementation. As I see it, they are not necessarily a part of the 8bit model....
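For readers following this thread, the adapter math being debated is small enough to sketch. This is a hypothetical toy version of the LoRA update (plain NumPy, no relation to the actual bitsandbytes/transformers code): the frozen weight `W` stays untouched, and a trainable low-rank product `B @ A`, scaled by `alpha / r`, is added on top.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=1.0):
    """Toy LoRA forward pass: frozen weight W (which could be stored
    8-bit and dequantized on the fly) plus a low-rank update B @ A
    scaled by alpha / r."""
    r = A.shape[0]  # adapter rank
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init
x = rng.standard_normal((3, d_in))

# With B zero-initialized the adapter is a no-op, so the model starts
# out identical to the plain (possibly quantized) linear layer.
assert np.allclose(lora_linear(x, W, A, B), x @ W.T)
```

The zero-init of `B` is why the adapters can be treated as optional: with them absent (or zeroed), the 8-bit model behaves exactly like the base model, which supports the point that they need not be part of the 8-bit model itself.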