Ali Sabet
@itamarst no worries. Do you have a dockerfile I can use to reproduce?
@ShaheedHaque @itamarst yes! In particular, Celery multiprocessing doesn't play well with CUDA. Know any fixes?
@adampl interesting, can you share a link so I can read up further on that? Do I disable CUDA in the parent by running torch code only if `os.getpid() != os.getppid()`?
> We're deploying celery on both windows and linux nodes and our code turned out to be not fork-safe. On windows, multiprocessing in python only supports spawn, so it worked...
I'm experiencing the same issue.
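If the quoted observation is right that things work on Windows because `spawn` is the only option there, one generic workaround (sketched here with plain `multiprocessing`, not Celery-specific) is to force the `spawn` start method on Linux too, so workers get a fresh interpreter instead of inheriting forked parent state such as an initialized CUDA context:

```python
import multiprocessing as mp


def square(x):
    # Worker function; with 'spawn' it must live at module top level
    # so child processes can re-import it.
    return x * x


if __name__ == "__main__":
    # Use a 'spawn' context (the Windows default) instead of the Linux
    # default 'fork', avoiding fork-safety issues with CUDA state.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        results = pool.map(square, [1, 2, 3])
    print(results)  # [1, 4, 9]
```

Using `mp.get_context("spawn")` rather than the global `mp.set_start_method(...)` keeps the choice local, so it doesn't fight with whatever start method the surrounding framework has already configured.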
Hey @nate-wandb, sure! Here's the [workspace](https://wandb.ai/asabet/huggingface?workspace=user-asabet), and [example run](https://wandb.ai/asabet/huggingface/runs/n4kn17fi?workspace=user-asabet). Logging to wandb is handled with HF [Trainer](https://docs.wandb.ai/guides/integrations/huggingface).
@nate-wandb any luck?
@nate-wandb please help! 🙏 😭
Sorry for delay, will send tomorrow 🙏.
Hey @raj-swype I got the model to train, but the weights aren't fully saved during checkpointing. According to the hf [deepspeed docs](https://huggingface.co/transformers/v4.7.0/main_classes/deepspeed.html), the model state is supposed to be saved...