Stas Bekman comments

664 comments


Stas Bekman

parallelize writing of layer checkpoint files across data parallel instances

As I'm not part of the DeepSpeed team, my vote won't count, but your benchmarks are super-impressive and I'd say definitely go for it. I will let @tjruwase chime...
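
One way to split the writing of per-layer checkpoint files across data-parallel replicas is a round-robin assignment: each replica saves a disjoint subset, and together they cover every layer. This is a hypothetical sketch, not the scheme used in the change being discussed. `layer_files_for_rank` and the file names are made up, and it assumes every data-parallel replica holds identical copies of the layer weights, so it does not matter which replica writes a given file.

```python
def layer_files_for_rank(layer_files, dp_rank, dp_world_size):
    """Return the subset of layer files that data-parallel rank `dp_rank` writes.

    Round-robin assignment: rank r gets files r, r + W, r + 2W, ... where W is
    the data-parallel world size, so the subsets are disjoint and cover all files.
    """
    if dp_world_size < 1:
        raise ValueError("dp_world_size must be >= 1")
    if not 0 <= dp_rank < dp_world_size:
        raise ValueError(f"dp_rank must be in [0, {dp_world_size})")
    return layer_files[dp_rank::dp_world_size]


# Hypothetical file names; a real checkpoint uses the framework's own naming.
files = [f"layer_{i:02d}-model_states.pt" for i in range(5)]
for rank in range(2):
    print(rank, layer_files_for_rank(files, rank, 2))
```

With 5 layers and 2 replicas, rank 0 writes layers 0, 2 and 4, and rank 1 writes layers 1 and 3. Round-robin keeps per-rank file counts within one of each other; contiguous chunks would be an equally valid split.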

ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset

I was just pointed here by @mariosasko; meanwhile, I found a workaround using `encode_example`, like so:

```
from datasets import load_from_disk, Dataset
DATASET_PATH = "/hf/m4-master/data/cm4/cm4-10000-v0.1"
ds1 = load_from_disk(DATASET_PATH)
ds2 =...
```

ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset

Hmm, interesting. If I create the dataset on the fly:

```
from datasets import load_from_disk, Dataset
DATASET_PATH = "/hf/m4-master/data/cm4/cm4-10000-v0.1"
ds1 = load_from_disk(DATASET_PATH)
ds2 = Dataset.from_dict(mapping={k: [v]*2 for k, v in...
```

Support subprocesses with the `dump` command

That would be very useful, thank you, @benfred! And controlling how many generations of descendants to include might be useful as well, e.g. currently I have a need to...
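
Limiting how many generations of descendants to include amounts to a depth-limited breadth-first walk of the process tree. This is a hypothetical sketch, not py-spy's code: `descendants` and its `children` input (a mapping from parent PID to child PIDs) are assumptions, and in practice that mapping would have to be gathered from the OS, e.g. with psutil.

```python
from collections import deque


def descendants(children, root, max_depth=None):
    """Return PIDs below `root` in breadth-first order.

    `children` maps a PID to the list of its direct child PIDs.
    max_depth=None means unlimited; max_depth=1 returns only direct
    children, 2 adds grandchildren, and so on.
    """
    result = []
    queue = deque([(root, 0)])  # (pid, generation of that pid below root)
    while queue:
        pid, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the requested generation
        for child in children.get(pid, []):
            result.append(child)
            queue.append((child, depth + 1))
    return result


# Hypothetical tree: 1 -> (2, 3), 2 -> 4, 4 -> 5
tree = {1: [2, 3], 2: [4], 4: [5]}
print(descendants(tree, 1))               # all generations
print(descendants(tree, 1, max_depth=1))  # direct children only
```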

feature: on demand `sudo`-like behavior w/o `sudo` for hanging processes

Btw, this whole `sudo` requirement seems to be a 5.x Linux kernel thing. On one HPC system I had no problem attaching w/o `sudo`, but then discovered it was running a 4.x kernel!

feature: on demand `sudo`-like behavior w/o `sudo` for hanging processes

Thank you for the ptrace setting insight, @Jongy! Now I understand why `sudo` was needed and that it had nothing to do with the kernel version! The problem with...
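
Assuming the ptrace setting in question is the Yama `kernel.yama.ptrace_scope` sysctl (the usual reason a tool can attach to a process of the same user only with `sudo`), a quick check of its value could look like this. The helper names are hypothetical; the value meanings follow the Linux kernel's Yama documentation.

```python
from pathlib import Path

# Yama ptrace_scope values, as described in the kernel's Yama documentation.
PTRACE_SCOPE_MEANINGS = {
    0: "classic: may attach to any process of the same uid (if it is dumpable)",
    1: "restricted: only descendants, or processes that opted in via PR_SET_PTRACER",
    2: "admin-only: attaching requires CAP_SYS_PTRACE (e.g. via sudo)",
    3: "disabled: no PTRACE_ATTACH at all; the value cannot be changed once set",
}


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return ptrace_scope as an int, or None if the file is absent
    (Yama not built in, or not running on Linux)."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


def explain_ptrace_scope(value):
    """Describe a ptrace_scope value in plain words."""
    if value is None:
        return "no Yama setting found, so Yama adds no ptrace restrictions"
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown ptrace_scope value: {value}")


print(explain_ptrace_scope(read_ptrace_scope()))
```

With scope 1 (Ubuntu's default, for example), attaching to an unrelated process by PID needs `sudo` regardless of kernel version; `sudo sysctl -w kernel.yama.ptrace_scope=0` relaxes this until the next reboot, at a security cost.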

[BUG] DeBERTa has bad performance when using ZERO Stage-3 with continuous warnings "A module has unknown inputs or outputs type"

You redacted the type from the warning: was it `nn.Parameter`? If so, it has been fixed here: https://github.com/microsoft/DeepSpeed/pull/2642. The fix will work for any `torch.Tensor` subclass. If it's another...
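
Whether a tensor subclass such as `nn.Parameter` is detected depends on how the check is written: an exact-type test misses subclasses, while `isinstance` accepts them. The walker below is a hypothetical, simplified illustration (`find_tensors` is not a DeepSpeed function), assuming PyTorch is installed; it only shows why accepting any `torch.Tensor` subclass also covers `nn.Parameter`.

```python
import torch
import torch.nn as nn


def find_tensors(obj, exact_type_check=False):
    """Recursively collect tensors from nested tuples, lists and dicts.

    exact_type_check=True mimics a `type(obj) is torch.Tensor` test, which
    misses subclasses such as nn.Parameter; False uses isinstance, which
    accepts any torch.Tensor subclass.
    """
    if isinstance(obj, (tuple, list)):
        return [t for item in obj for t in find_tensors(item, exact_type_check)]
    if isinstance(obj, dict):
        return [t for value in obj.values() for t in find_tensors(value, exact_type_check)]
    if exact_type_check:
        matched = type(obj) is torch.Tensor
    else:
        matched = isinstance(obj, torch.Tensor)
    return [obj] if matched else []


outputs = (nn.Parameter(torch.zeros(2)), torch.ones(3), {"mask": torch.ones(1)})
print(len(find_tensors(outputs, exact_type_check=True)))  # 2: the Parameter is missed
print(len(find_tensors(outputs)))                         # 3: all tensors found
```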

[BUG] DeBERTa has bad performance when using ZERO Stage-3 with continuous warnings "A module has unknown inputs or outputs type"

Fantastic. I'm glad you presented a concrete case. Actually, maybe a new Issue would be better, since mine is abstract and mentions several unrelated issues in one, so...

[BUG] DeBERTa has bad performance when using ZERO Stage-3 with continuous warnings "A module has unknown inputs or outputs type"

It's very helpful to see the code; thank you, @yakazimir. This definitely has nothing to do with PL, and it's a pure DS end-user model issue. OK, so...

[BUG] DeBERTa has bad performance when using ZERO Stage-3 with continuous warnings "A module has unknown inputs or outputs type"

Your code snippet is perfect, @yakazimir. I just can't see that custom object, but I assume that its tensors didn't have `requires_grad=False`. If they did, then there is no problem...
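
To see whether a custom output object actually carries tensors that need gradients (the case where it matters that ZeRO-3 cannot see them), one can inspect its attributes. A minimal sketch, assuming PyTorch; `tensors_requiring_grad` and `ModelOutput` are hypothetical names, and only top-level attributes are inspected.

```python
import torch


def tensors_requiring_grad(obj):
    """Names of obj's top-level tensor attributes that have requires_grad=True.

    Only looks at vars(obj), so it needs an object with a __dict__ and does
    not descend into nested containers.
    """
    return [
        name
        for name, value in vars(obj).items()
        if isinstance(value, torch.Tensor) and value.requires_grad
    ]


class ModelOutput:  # hypothetical custom output object
    def __init__(self, logits, mask):
        self.logits = logits
        self.mask = mask


out = ModelOutput(torch.randn(2, 3, requires_grad=True), torch.ones(2, 3))
print(tensors_requiring_grad(out))  # ['logits']
```

An empty list means every embedded tensor has `requires_grad=False`, which is the no-problem case described above.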