Jacob Danovitch comments


Deepspeed integration

Took a holiday break from this while our cluster was down for maintenance. Turns out the checkpointing/barrier issue might be more complicated than I thought, but not...

Deepspeed integration

Ah, I think I see the real issue here. It's not the logging itself hanging.

1. (All ranks) My trainer tells my checkpointer to save **if** it's the master process...
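To make the failure mode concrete, here is a toy reproduction that uses only Python's standard library, not DeepSpeed itself. DeepSpeed's getting-started docs note that every process has to call `save_checkpoint`, not just rank 0, because the call synchronizes across ranks. In this sketch, `threading.Barrier` stands in for that synchronization, and `save_checkpoint`, `run`, and `WORLD_SIZE` are hypothetical names made up for illustration. The timeout exists only so the demo finishes instead of hanging forever.

```python
import threading

WORLD_SIZE = 4  # hypothetical number of ranks, simulated with threads


def save_checkpoint(barrier, timeout=1.0):
    """Stand-in for a collective save: returns only once every rank arrives."""
    try:
        barrier.wait(timeout=timeout)
        return "saved"
    except threading.BrokenBarrierError:
        # In a real job there is no timeout here; the process would just hang.
        return "hung"


def run(guard_on_master):
    """Run one 'save' step on every simulated rank and collect the outcomes."""
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}

    def worker(rank):
        if guard_on_master and rank != 0:
            # Non-master ranks never reach the collective call.
            results[rank] = "skipped"
            return
        results[rank] = save_checkpoint(barrier)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(guard_on_master=True))   # rank 0 waits alone until the timeout
    print(run(guard_on_master=False))  # every rank arrives, so the save completes
```

With the master-only guard, rank 0 is the only one to reach the barrier and waits on ranks that never show up. Dropping the guard around the collective call (and keeping it only around truly rank-0-only work like logging) lets every rank pass through.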

> Why isn't the checkpointing thing a problem outside of AllenNLP? This should be an issue with DeepSpeed all the time, right? Their typical training loop is something like ([source](https://www.deepspeed.ai/getting-started/#model-checkpointing)):...

Deepspeed integration

Yeah that should work perfectly, I'll give it a try.

Get edge attributes for random walk

> I think this would be way easier if we would just return the edge indices in `random_walk` :( I guess your approach works, but will be indeed inefficient.

Yeah,...
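For anyone landing here later, a minimal sketch of one way to recover edge ids from a walk after the fact, when only the visited node sequence is returned. This is not necessarily the approach discussed above, and it is plain Python over lists rather than tensors. `walk_edge_ids` is a hypothetical helper, and `edge_index` is assumed to be a COO-style pair of source and destination lists. Once you have the edge ids, indexing into an edge attribute array is straightforward.

```python
def walk_edge_ids(edge_index, walk):
    """Map each consecutive node pair in `walk` to an edge id.

    edge_index: (src, dst) lists in COO layout; the edge id is the position.
    walk: list of node ids visited by a single random walk.
    Returns one edge id per step, or -1 where no matching edge exists.
    """
    src, dst = edge_index
    lookup = {}
    for eid, (u, v) in enumerate(zip(src, dst)):
        # With parallel edges, keep the first id seen (an arbitrary choice).
        lookup.setdefault((u, v), eid)
    return [lookup.get((u, v), -1) for u, v in zip(walk, walk[1:])]


if __name__ == "__main__":
    # Directed edges: 0->1 (id 0), 1->2 (id 1), 2->0 (id 2), 0->2 (id 3)
    edge_index = ([0, 1, 2, 0], [1, 2, 0, 2])
    edge_attr = ["a", "b", "c", "d"]  # one hypothetical attribute per edge
    ids = walk_edge_ids(edge_index, [0, 1, 2, 0])
    print(ids, [edge_attr[i] for i in ids])
```

The dictionary is built once per graph, so each walk step is a constant-time lookup, but building it still costs a pass over every edge, and it cannot tell parallel edges apart. Returning the edge indices from the sampler directly would avoid both problems.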

Get edge attributes for random walk

> If you are interested, we can add support for this in [`pyg-lib`](https://pyg-lib.readthedocs.io/en/latest/modules/sampler.html#pyg_lib.sampler.random_walk), which should be straightforward to add. It also supports nightly builds so it should be ready to...

aks connectedk8s connect should allow connecting to existing arc kubernetes resources and use own key-pair

Reintroduce AzureMLCluster