Jacob Danovitch

Results: 32 comments by Jacob Danovitch

I'm having a similar problem when trying to run a flow that calls the same subflow multiple times, which itself calls tasks. The outer flow generates and loops over a...
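
For concreteness, a minimal sketch of the shape of that setup, assuming a Prefect-style `@flow`/`@task` API; all the names here (`outer_flow`, `subflow`, `process_item`) are made up for illustration:

```python
from prefect import flow, task

@task
def process_item(item: int) -> int:
    # Placeholder work done by the task.
    return item * 2

@flow
def subflow(item: int) -> int:
    # The subflow that is called repeatedly and itself calls a task.
    return process_item(item)

@flow
def outer_flow(n: int = 3) -> list:
    # The outer flow generates a collection and loops over it,
    # invoking the same subflow once per element.
    return [subflow(i) for i in range(n)]

if __name__ == "__main__":
    outer_flow()
```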

> I did not carefully examine the difference between the existing trainer and the DeepSpeed one, but it looks like they are almost the same? Yes, they are very...

See my comments in the issue thread for more detail. The slowdown seems to be related to gradient accumulation. The next steps are (1) seeing if the slowdown is reproducible...
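
For context on where such a slowdown could enter, this is the generic shape of a gradient-accumulation loop in PyTorch; it is illustrative only, not the trainer's actual code, and `train_epoch` is a made-up helper:

```python
import torch

def train_epoch(model, loader, optimizer, accumulation_steps: int = 4):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        # Scale so the accumulated gradient matches a large-batch average.
        (loss / accumulation_steps).backward()
        # Only step the optimizer every `accumulation_steps` batches.
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```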

@dirkgr I think this is ready to take a look at. Some notes thus far: * DeepSpeed is heavily config-based and it's hard to avoid, so rather than fighting it,...
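
To illustrate that config-driven style: instead of building the optimizer and scheduler in code, you hand `deepspeed.initialize` a config dict (or JSON file). The values below are illustrative defaults, not the PR's actual settings:

```python
import deepspeed
import torch

model = torch.nn.Linear(10, 2)
ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": False},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# Run under the deepspeed launcher, e.g. `deepspeed train.py`.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```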

Thanks for looking it over! I'll start linting everything and getting the tests up and running (we can probably re-use the existing Trainer tests, yeah). As for the code duplication,...

> These are special `nn.Module`s that work particularly well with DeepSpeed? More or less, as far as I understand they're heavily optimized CUDA kernels that help for things like long...
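
As one example of those optimized ops being switched on through config rather than code, here is an illustrative `sparse_attention` fragment; the key names are recalled from the DeepSpeed docs and may differ across versions:

```python
# Illustrative only: enables DeepSpeed's block-sparse attention kernels,
# which target long sequences. Key names may vary by DeepSpeed version.
ds_config = {
    "train_batch_size": 8,
    "sparse_attention": {
        "mode": "fixed",         # fixed block-sparse layout
        "block": 16,             # block size of the sparsity pattern
        "num_local_blocks": 4,   # local attention window, in blocks
        "num_global_blocks": 1,  # blocks that attend globally
    },
}
```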

Still working on deduplicating code (and linting). I was able to get a lot reduced (almost the entire constructor) by lying to `super().__init__()` and passing `distributed=False` so that it...
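
A minimal sketch of that trick, with a hypothetical base `Trainer`; the real constructor in the PR has a much larger signature:

```python
class Trainer:
    def __init__(self, model, distributed: bool = False):
        self.model = model
        if distributed:
            # The base class would set up torch.distributed / DDP here.
            raise NotImplementedError

class DeepspeedTrainer(Trainer):
    def __init__(self, model, ds_config: dict):
        # Tell the parent we are *not* distributed so it skips its own
        # torch.distributed setup; DeepSpeed manages distribution itself.
        super().__init__(model, distributed=False)
        self.ds_config = ds_config
```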

Got all the typechecks out of the way, phew. I've also managed to cut out a lot of duplicated code, I think! The remainder is almost entirely checkpointing-related. For loading/saving,...
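
For reference, the DeepSpeed engine owns its own checkpoint entry points, which is why the save/load paths are hard to share with the base trainer. A sketch, reusing the `model_engine` from the earlier snippet (directory and tag names are made up):

```python
save_dir = "checkpoints"

# Saving: every rank participates, since optimizer/engine state is sharded.
model_engine.save_checkpoint(save_dir, tag="epoch_1")

# Loading: returns the resolved path plus any client state saved alongside.
load_path, client_state = model_engine.load_checkpoint(save_dir, tag="epoch_1")
```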

> Is there a way to detect whether we are in a deepspeed context? If so, I'd be OK with some sort of `if not in_deepspeed:`. Otherwise, let's just duplicate...

Sounds good. I think DeepSpeed might set some environment variables itself, similarly to torch, so I'll poke around to see if we can use one of those. If not, we...
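
A hypothetical version of that check: `torchrun`/`torch.distributed.launch` export `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`, and the idea is to find an analogous variable that only the deepspeed launcher sets. `SOME_DEEPSPEED_VAR` below is a placeholder, not a real variable name:

```python
import os

def in_deepspeed() -> bool:
    # Placeholder: replace with whichever variable the deepspeed
    # launcher actually exports (still to be confirmed).
    return "SOME_DEEPSPEED_VAR" in os.environ
```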