Stas Bekman

Results 664 comments of Stas Bekman

Sounds like a potential problem with pt-nightly? It works just fine on pt-1.11 - this is adapted to use the files from the repo directly:
```
torchrun --nproc_per_node=2 examples/pytorch/text-classification/run_glue.py \
    --task_name...
```

pt-nightly works just fine; I get a very nice learning curve:
```
[INFO|trainer.py:1428] 2022-05-18 17:56:02,223 >> ***** Running training *****
[INFO|trainer.py:1429] 2022-05-18 17:56:02,224 >> Num examples = 3668
[INFO|trainer.py:1430] 2022-05-18...
```

The main deepspeed config difference is:
```
- "type": "WarmupDecayLR",
+ "type": "WarmupLR",
```
but it shouldn't cause an issue with the pre-trained weights. I wonder why you see a...
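For context, the two schedulers differ only in whether the learning rate decays after warmup. Below is a minimal sketch of the scheduler section of a DeepSpeed config, written as a Python dict with placeholder values (the numbers are not taken from the report above):

```
# hypothetical scheduler fragment of a DeepSpeed config, shown as a Python dict
ds_config = {
    "scheduler": {
        # "type": "WarmupDecayLR",  # warms up, then decays the LR
        "type": "WarmupLR",         # warms up, then keeps the LR constant
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,    # placeholder value
            "warmup_num_steps": 500,  # placeholder value
            # "total_num_steps": 10000,  # only needed by WarmupDecayLR
        },
    },
}
```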

Great to hear you found the cause. In general when you use deepspeed ZeRO stage-3 and you see a shape that's of size 0, it's because the weights are sharded...
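As an illustration of that sharding, here is a minimal sketch, assuming DeepSpeed ZeRO stage-3 (the helper name is made up for this example): between steps a parameter's local tensor is an empty placeholder, and it can be temporarily materialized with `deepspeed.zero.GatheredParameters`:

```
import deepspeed

def inspect_full_weights(model):
    # under ZeRO-3 a parameter's local storage is partitioned across ranks,
    # so outside of forward/backward p.shape typically reports torch.Size([0])
    for name, p in model.named_parameters():
        # read-only gather of the full parameter on all ranks
        with deepspeed.zero.GatheredParameters(p, modifier_rank=None):
            print(name, tuple(p.shape))  # now the real, full shape
```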

Please give me a full setup that I can reproduce your issue with and I will try to come up with a solution. And also if you write your own...

Thank you, @pacman100. Please try this PR: https://github.com/huggingface/transformers/pull/17373

Are you sure there is a leak? How can I see it? You shared the script but not the output which you believe should indicate a leak. I modified your...

Unless of course you're referring to the memory growth during the first try. Is that what you're referring to? And since your dataset is small it's hard to see the growth...
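One way to make such growth visible (a minimal sketch, not the exact script exchanged in the thread) is to print the process RSS with psutil around the suspect code and watch whether it keeps climbing after each pass:

```
import gc
import os
import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    # resident set size of the current process, in MiB
    return proc.memory_info().rss / 2**20

print(f"start: {rss_mb():.0f}MB")
for i in range(5):
    # ... run the code suspected of leaking here ...
    gc.collect()
    print(f"pass {i}: {rss_mb():.0f}MB")
# steady growth across passes, even after gc.collect(), points to a leak
```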

I was able to reproduce the leak with:
```
import psutil
import os
import gc
from datasets import load_from_disk
import time

DATASET_PATH = "/hf/m4-master/data/cm4/cm4-10000-v0.1"

dataset = load_from_disk(DATASET_PATH)

# truncate to...
```

This issue has nothing to do with `PIL`'s decoder. I removed it and the problem is still there. I then traced this leak to this single call: `pa_table.to_pydict()` here: https://github.com/huggingface/datasets/blob/08a7b389cdd6fb49264a72aa8ccfc49a233494b6/src/datasets/formatting/formatting.py#L138-L140...
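To check that call in isolation, here is a standalone sketch (a synthetic pyarrow table, not the original cm4 dataset) that exercises `to_pydict()` repeatedly while watching RSS:

```
import gc
import os
import psutil
import pyarrow as pa

proc = psutil.Process(os.getpid())

# synthetic stand-in for the Arrow table that datasets reads from disk
pa_table = pa.table({"text": ["some text"] * 100_000,
                     "idx": list(range(100_000))})

for i in range(10):
    batch = pa_table.to_pydict()  # the call the leak was traced to above
    del batch
    gc.collect()
    print(f"pass {i}: RSS={proc.memory_info().rss / 2**20:.0f}MB")
```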