Ellen Zhong
Hi Heejong, Thanks for asking! The current top of tree has GPU parallelization (commit 3ba2439db6fef20922dd3c60c2a7ab1508475d76) and mixed precision training. Feel free to give it a shot -- I've been meaning...
Great to hear! I added some assertion messages for the assert that you ran into (commit f1de270a565592adc88602dfee313ed861afebb5). It's checking that your image size is a multiple of 8. Mixed precision...
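For anyone hitting the same error, the check amounts to something like the following sketch (illustrative only, not the exact code from commit f1de270; `D` here stands for the image box size):

```python
# Sketch of the constraint (placeholder, not the actual cryodrgn source):
# the image box size must be divisible by 8 for the mixed precision code path.
D = 128  # example image size; a value like 130 would trigger the assertion
assert D % 8 == 0, (
    f"Image size {D} must be a multiple of 8 when mixed precision training is enabled"
)
```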
Wow 17x! Great! I haven't noticed any accuracy degradation when using mixed precision training (admittedly with limited benchmarking), so I usually leave it on by default as well. For smaller...
Use of `apex.amp` is a historical relic from the days before PyTorch natively supported AMP (version 1.6+, iirc). I kept it in after adding `torch.cuda.amp` support to maintain backwards compatibility...
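For reference, a minimal sketch of the native `torch.cuda.amp` pattern (the model, optimizer, and data below are placeholders, not cryoDRGN's actual training loop):

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer/data -- stand-ins for a real training loop
model = torch.nn.Linear(128, 128).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

x = torch.randn(8, 128, device="cuda")
y = torch.randn(8, 128, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with autocast():  # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()                # adjusts the scale factor
```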
Yes, the cleanest way is to set the environment variable `CUDA_VISIBLE_DEVICES`:

```
CUDA_VISIBLE_DEVICES=0 cryodrgn train_vae ...
```

I believe it is the recommended way to select the desired GPU for...
Thank you for reporting! @vineetbansal can you take a look?
Thanks for the heads up. I can prioritize this feature.
Just as an additional data point -- for a 1.4M particle dataset (D=128) I'm trying out, the training time goes from 43min -> 5:50hr per epoch if I load the...
I added a new script `cryodrgn preprocess` which preprocesses images before training and significantly reduces the memory requirement of `cryodrgn train_vae`. This is now available in the top of tree...
@vineetbansal, we should think about how to implement chunked data loading instead of the current options of either 1) loading the whole dataset into memory or 2) accessing each image...
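One possible direction is sketched below: a map-style dataset that lazily reads fixed-size chunks of the particle stack from disk and caches the most recent chunk. The per-chunk `.npy` file layout and class name are assumptions for illustration, not cryoDRGN's actual data loader.

```python
import numpy as np
from torch.utils.data import Dataset

class ChunkedImageDataset(Dataset):
    """Sketch of chunked loading: images are stored as one .npy file per chunk
    (e.g. chunk_000.npy, chunk_001.npy, ...) and only the chunk containing the
    requested image is held in memory at a time. The file layout is an assumption."""

    def __init__(self, chunk_paths, chunk_size):
        self.chunk_paths = chunk_paths  # list of per-chunk .npy files
        self.chunk_size = chunk_size    # images per chunk (last chunk may be shorter)
        self._cached_idx = None         # index of the currently loaded chunk
        self._cached_chunk = None
        # total image count = sum of chunk lengths (read headers only via mmap)
        self._lengths = [np.load(p, mmap_mode="r").shape[0] for p in chunk_paths]

    def __len__(self):
        return sum(self._lengths)

    def _load_chunk(self, chunk_idx):
        # Load a chunk from disk only when it differs from the cached one
        if chunk_idx != self._cached_idx:
            self._cached_chunk = np.load(self.chunk_paths[chunk_idx])
            self._cached_idx = chunk_idx
        return self._cached_chunk

    def __getitem__(self, i):
        chunk_idx, offset = divmod(i, self.chunk_size)
        chunk = self._load_chunk(chunk_idx)
        return chunk[offset].astype(np.float32)
```

With fully shuffled access this single-chunk cache would thrash, so in practice one would shuffle at the chunk level (or use a sampler that visits one chunk at a time), which is part of what makes the design worth discussing.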