Gustaf Ahdritz

Results: 170 comments of Gustaf Ahdritz

Actually on second thought it's not very weird that really long validation proteins should fail---chunking isn't enabled by default during validation, so you'll get much worse memory performance than during...

Did you actually mean V100s, or was that a typo? V100s don't have bfloat16 support.

Hm I haven't seen this one before. [This thread](https://github.com/LuxCoreRender/LuxCore/issues/490) indicates that it might have something to do with your CUDA version?

Could you verify the git commit hash of your installation + that the git diff shows no modifications to the model code?

Could you try disabling the custom CUDA kernels on the bad system? Specifically, make sure that all occurrences of `use_memory_efficient_kernel` and `use_flash` are set to `False` in the config.
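For reference, a minimal sketch of what "disable every occurrence" could look like, assuming the config is a plain nested dict (the actual OpenFold config object and its structure may differ; the layout below is illustrative only):

```python
def disable_custom_kernels(config):
    """Recursively set every kernel flag in a nested dict config to False, in place."""
    for key, value in config.items():
        if key in ("use_memory_efficient_kernel", "use_flash"):
            config[key] = False
        elif isinstance(value, dict):
            disable_custom_kernels(value)
    return config

# Hypothetical config layout, just to show the traversal:
cfg = {
    "model": {
        "evoformer": {"use_memory_efficient_kernel": True, "use_flash": True},
        "structure_module": {"use_flash": True},
    }
}
disable_custom_kernels(cfg)
```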

Ah my bad I never added it to the config. You'll have to disable `use_memory_efficient_kernel` manually in `openfold/model/evoformer.py`. There should only be one occurrence of it there; change the setting...

Weird. I'm not really sure what to make of this. Do you think it's worth trying to install OF from scratch on the 1080 system as a sanity check?

Thanks for the diagnostics. Weird that it's crashing on a stock PyTorch matmul and not the custom kernel immediately thereafter... Equally weird is that this is happening halfway through the...

This all happens automatically. The inference-time "chunking" you described is controlled by the `chunk_size` parameter of the config. Activation checkpointing is controlled by `blocks_per_ckpt`.
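To give a rough idea of what `chunk_size` controls, here's a toy sketch of chunked evaluation (the real implementation is tensor-based and more general; the function below is purely illustrative):

```python
def chunk_map(fn, xs, chunk_size=None):
    """Apply fn to xs in slices along the leading dimension.

    With chunk_size=None, fn sees the whole input at once. With a small
    chunk_size, only one slice's activations are alive at a time, which
    lowers peak memory at inference at some cost in speed.
    """
    if chunk_size is None:
        return fn(xs)
    out = []
    for i in range(0, len(xs), chunk_size):
        out.extend(fn(xs[i:i + chunk_size]))
    return out

# Same result either way; only peak memory differs:
double = lambda chunk: [x * 2 for x in chunk]
chunk_map(double, [1, 2, 3, 4, 5], chunk_size=2)  # → [2, 4, 6, 8, 10]
```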

Chunking is disabled during training because it doesn't save any memory when grad is enabled---activations for each chunk are still stored for the backwards pass. Furthermore, activation checkpoints can't be...
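A toy model of the argument above, counting "live activations" instead of real memory: with grad enabled, every chunk's activations must be retained for the backward pass, so peak storage matches the unchunked forward; with grad disabled, each chunk can be freed before the next one runs (illustration only, not OpenFold code):

```python
def peak_activations(xs, chunk_size, grad_enabled):
    """Simulate peak activation count for a chunked forward pass."""
    saved = []  # activations retained for backward
    peak = 0
    for i in range(0, len(xs), chunk_size):
        acts = [x * 2 for x in xs[i:i + chunk_size]]
        if grad_enabled:
            saved.extend(acts)           # must keep every chunk for backward
            peak = max(peak, len(saved))
        else:
            peak = max(peak, len(acts))  # chunk can be freed immediately
    return peak

xs = list(range(8))
peak_activations(xs, chunk_size=2, grad_enabled=False)  # → 2: scales with chunk size
peak_activations(xs, chunk_size=2, grad_enabled=True)   # → 8: no saving under grad
```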