Stas Bekman

Results 664 comments of Stas Bekman

Sounds like a potential problem with pt-nightly? It works just fine on pt-1.11 - this is adapted to use the files from the repo directly:
```
torchrun --nproc_per_node=2 examples/pytorch/text-classification/run_glue.py \
    --task_name...
```

pt-nightly works just fine; I get a very nice learning curve:
```
[INFO|trainer.py:1428] 2022-05-18 17:56:02,223 >> ***** Running training *****
[INFO|trainer.py:1429] 2022-05-18 17:56:02,224 >> Num examples = 3668
[INFO|trainer.py:1430] 2022-05-18...
```

The main deepspeed config difference is:
```
- "type": "WarmupDecayLR",
+ "type": "WarmupLR",
```
but it shouldn't cause an issue with the pre-trained weights. I wonder why you see a...
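For context, the two schedulers differ only in whether the learning rate decays after warmup. Below is a minimal sketch of the scheduler section of a DeepSpeed config, written as a Python dict with placeholder values (the numbers are not taken from the report above):

```
# hypothetical scheduler fragment of a DeepSpeed config, shown as a Python dict
ds_config = {
    "scheduler": {
        # "type": "WarmupDecayLR",  # warms up, then decays the LR
        "type": "WarmupLR",         # warms up, then keeps the LR constant
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,    # placeholder value
            "warmup_num_steps": 500,  # placeholder value
            # "total_num_steps": 10000,  # only needed by WarmupDecayLR
        },
    },
}
```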

Great to hear you found the cause. In general when you use deepspeed ZeRO stage-3 and you see a shape that's of size 0, it's because the weights are sharded...
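As an illustration of that sharding, here is a minimal sketch, assuming DeepSpeed ZeRO stage-3 (the helper name is made up for this example): between steps a parameter's local tensor is an empty placeholder, and it can be temporarily materialized with `deepspeed.zero.GatheredParameters`:

```
import deepspeed

def inspect_full_weights(model):
    # under ZeRO-3 a parameter's local storage is partitioned across ranks,
    # so outside of forward/backward p.shape typically reports torch.Size([0])
    for name, p in model.named_parameters():
        # read-only gather of the full parameter on all ranks
        with deepspeed.zero.GatheredParameters(p, modifier_rank=None):
            print(name, tuple(p.shape))  # now the real, full shape
```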

Please give me a full setup that I can reproduce your issue with and I will try to come up with a solution. And also if you write your own...

Thank you, @pacman100. Please try this PR: https://github.com/huggingface/transformers/pull/17373

Are you sure there is a leak? How can I see it? You shared the script but not the output which you believe should indicate a leak. I modified your...

Unless of course you're referring to the memory growth during the first try. Is that what you're referring to? And since your dataset is small it's hard to see the growth...
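One way to make such growth visible (a minimal sketch, not the exact script exchanged in the thread) is to print the process RSS with psutil around the suspect code and watch whether it keeps climbing after each pass:

```
import gc
import os
import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    # resident set size of the current process, in MiB
    return proc.memory_info().rss / 2**20

print(f"start: {rss_mb():.0f}MB")
for i in range(5):
    # ... run the code suspected of leaking here ...
    gc.collect()
    print(f"pass {i}: {rss_mb():.0f}MB")
# steady growth across passes, even after gc.collect(), points to a leak
```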

I was able to reproduce the leak with:
```
import psutil
import os
import gc
from datasets import load_from_disk
import time

DATASET_PATH = "/hf/m4-master/data/cm4/cm4-10000-v0.1"

dataset = load_from_disk(DATASET_PATH)

# truncate to...
```

This issue has nothing to do with `PIL`'s decoder. I removed it and the problem is still there. I then traced this leak to this single call: `pa_table.to_pydict()` here: https://github.com/huggingface/datasets/blob/08a7b389cdd6fb49264a72aa8ccfc49a233494b6/src/datasets/formatting/formatting.py#L138-L140...
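To check that call in isolation, here is a standalone sketch (a synthetic pyarrow table, not the original cm4 dataset) that exercises `to_pydict()` repeatedly while watching RSS:

```
import gc
import os
import psutil
import pyarrow as pa

proc = psutil.Process(os.getpid())

# synthetic stand-in for the Arrow table that datasets reads from disk
pa_table = pa.table({"text": ["some text"] * 100_000,
                     "idx": list(range(100_000))})

for i in range(10):
    batch = pa_table.to_pydict()  # the call the leak was traced to above
    del batch
    gc.collect()
    print(f"pass {i}: RSS={proc.memory_info().rss / 2**20:.0f}MB")
```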