Stas Bekman
update: If I run the same job with a local executor it works fine; with slurm it hangs on the first sample, so it must be some pickle-related issue....
Yes, of course
```
import functools
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

class ClassifierFilter(BaseFilter):
    name = "Classifier Filter"

    def __init__(
        self,
        exclusion_writer: DiskWriter = None,
    ):
        super().__init__(exclusion_writer)

    @functools.cached_property...
```
Is there a plan for another way of passing the jobs instead of pickle? The hanging happens because of `functools.cached_property`, so it seems I can't use it. I came up...
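For anyone hitting the same thing, here is a minimal sketch of one generic way to avoid `functools.cached_property` on an object that gets pickled: a plain property backed by a private attribute that is reset in `__getstate__`. This is not necessarily the workaround referred to above, and I can't confirm it resolves the slurm hang. `LazyResourceFilter`, `_load_model` and the returned dict are hypothetical stand-ins for the real filter and model load.

```python
import pickle


class LazyResourceFilter:
    """Hypothetical stand-in for a filter that owns a heavy, lazily built resource."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self._model = None  # built on first access, never shipped inside a pickle

    @property
    def model(self):
        # Plain property instead of functools.cached_property:
        # the cache lives in an attribute we control explicitly.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def _load_model(self):
        # Placeholder (assumption) for an expensive load, e.g. a transformers model.
        return {"loaded_from": self.model_name}

    def __getstate__(self):
        # Drop the cached resource so only lightweight config is pickled;
        # the receiving process rebuilds it lazily on first use.
        state = self.__dict__.copy()
        state["_model"] = None
        return state


f = LazyResourceFilter("some/model")
_ = f.model  # populate the cache in the submitting process
clone = pickle.loads(pickle.dumps(f))  # round-trip, as a job serializer would do
```

The point of the design is that the pickled payload carries only the configuration, so each worker builds the resource itself on first access.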
ok, so now we know setting `device_id` leads to hanging in `2.6.0
> Wondering if you have any tips & tricks for working with performance profiling tools such as `nsys`? I don't have experience with `nsys`. > Or recommendations for systematically optimizing...
Closing due to inactivity. Please feel free to re-open if needed.
@SunMarc, please kindly sync with @S1ro1 - we are waiting for him to complete the redesign of the parallelism in HF Accelerate. https://github.com/huggingface/accelerate/pull/3673
> you can probably just integrate it with DeepSpeed I'm not sure what you mean, Matej. It's already in DeepSpeed. Unless you mean in the DeepSpeed plugin of HF Accelerate?...
I'm not quite sure why you're tagging me here as I am not part of this project and I have no idea what code you're talking about. If it's a...