Stas Bekman
I also wonder whether the policy should be arch-specific or model-specific - what if someone wants to do 8-bit only for FFN or only for Embedding? If model-specific then the...
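(As a hedged illustration only - none of these names exist in `transformers`, they are placeholders - a model-specific policy could be expressed as a per-module mapping, e.g.:)

```python
# Hypothetical sketch: a per-module quantization policy letting a user opt
# into int8 for just the FFN or just the Embedding. Illustrative names only.
quantization_policy = {
    "default": "fp16",            # everything not listed stays in fp16
    "modules": {
        "mlp": "int8",            # quantize only the FFN blocks
        "embed_tokens": "fp16",   # keep embeddings in higher precision
    },
}

def dtype_for_module(name: str, policy: dict) -> str:
    """Resolve the target dtype for a given module name from the policy."""
    for pattern, dtype in policy["modules"].items():
        if pattern in name:
            return dtype
    return policy["default"]
```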
> What I would add is what kind of int8 data type is used. Did you mean to say something different here, Tim? Unless I misunderstood, int8 is already a...
Sounds good, Tim. So I trust you will come up with the different names then. We just need to think about how to make it easily extendable in the future to...
> I've [implemented](https://github.com/deniskamazur/transformers/tree/gpt-j-8bit) the «hardcoded» version of this issue. Awesome news, @deniskamazur! I won't have time at the moment to support this process very closely, but I trust there will...
Hi Denis, it has been a long time... Perhaps there has been a misunderstanding - we have been waiting for you to complete the PR, so nothing has happened...
I suppose the advantage of loading in int8 is that with fp16 you need 2x the memory upfront, but since we now have sharded checkpoints this can be overcome by sharding...
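(To make the 2x point concrete, a back-of-the-envelope sketch - GPT-J-6B's roughly 6B parameters are used purely as an example:)

```python
# Rough weight-memory arithmetic, illustrative numbers only.
n_params = 6_050_000_000  # ~GPT-J-6B

fp32_gb = n_params * 4 / 2**30   # ~22.5 GB
fp16_gb = n_params * 2 / 2**30   # ~11.3 GB - loaded first if converting to int8 afterwards
int8_gb = n_params * 1 / 2**30   # ~5.6 GB  - the final footprint

print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB, int8: {int8_gb:.1f} GB")
```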
Thank you for the detailed breakdown, @lhoestq > I'm curious, what would you expect to happen in this situation? 1. The simplest solution is to add a flag to...
Yes, so that you always have the cached entry for any dataset, but the "payload" doesn't have to be physically in the cache if it's already on the local filesystem...
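(A minimal sketch of the idea - purely hypothetical names and layout, not the actual `datasets` cache format - where the entry's metadata always lives in the cache but the data files are only referenced in place:)

```python
import json, os

# Hypothetical cache entry: metadata is written into the cache directory,
# while the payload stays wherever it already is on the local filesystem.
cache_entry = {
    "dataset_name": "my_local_dataset",
    "fingerprint": "abc123",
    "data_files": ["/data/corpus/train.jsonl"],  # payload referenced, not copied
    "in_cache": False,
}

cache_dir = os.path.expanduser("~/.cache/example")
os.makedirs(cache_dir, exist_ok=True)
with open(os.path.join(cache_dir, "entry.json"), "w") as f:
    json.dump(cache_entry, f, indent=2)
```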
Your outline spec is very sound and clear, @lhoestq - thank you! @thomasw21, indeed that would be a wonderful extra feature. In Megatron-Deepspeed we manually drained the dataloader for the...
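(For context, the manual draining mentioned above boils down to fast-forwarding the data iterator past the batches already consumed before the interruption; a simplified sketch, not the actual Megatron-Deepspeed code:)

```python
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        return idx

# On resume, skip the batches that were already seen so training continues
# from the next unseen sample.
consumed_batches = 42  # restored from the training checkpoint

loader = DataLoader(ToyDataset(), batch_size=8, shuffle=False)
it = iter(loader)
for _ in range(consumed_batches):
    next(it)  # discard already-consumed batches

next_batch = next(it)  # first batch of the resumed run
print(next_batch)
```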
I totally agree, @iliaschalkidis! In general, pretty much any DeepSpeed-specific question should go to https://github.com/microsoft/DeepSpeed - please feel free to tag me if it's related to `transformers` though, since most...