openfold
Does Training support Chunking?
Hello, I would like to observe the dynamics of the network during training. I noticed that chunking is turned off during training, and there is a related assert in the code. Is chunking not supported during training?
Chunking is disabled during training because it doesn't save any memory when grad is enabled---activations for each chunk are still stored for the backwards pass. Furthermore, activation checkpoints can't be inserted between chunk computations because torch/DeepSpeed checkpoints don't like it much when the same model parameters are used in two different checkpoints.
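To make the memory point concrete, here is a minimal, hypothetical sketch (not OpenFold's actual chunking code) of a row-wise computation applied in chunks with PyTorch. When gradients are enabled, every chunk's intermediate activations are retained for the backward pass, so peak memory is roughly the same as computing everything at once; only under `torch.no_grad()` does chunking actually reduce memory.

```python
import torch

def chunked_op(x: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Apply an attention-like op to `x` in chunks along dim 0.

    Hypothetical example: the op stands in for an attention /
    triangle-update block in the real model.
    """
    outs = []
    for start in range(0, x.shape[0], chunk_size):
        chunk = x[start:start + chunk_size]
        # With grad enabled, the intermediates of this op are kept alive
        # for backward, so chunking does not lower peak memory.
        outs.append(torch.softmax(chunk @ chunk.transpose(-1, -2), dim=-1) @ chunk)
    return torch.cat(outs, dim=0)

x = torch.randn(64, 32, requires_grad=True)

# Training-style call: activations for every chunk are stored.
y = chunked_op(x, chunk_size=16)
y.sum().backward()

# Inference-style call: intermediates are freed chunk by chunk.
with torch.no_grad():
    y_eval = chunked_op(x, chunk_size=16)
```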
Is it true that during training, regardless of whether the input size changes, the model always goes through the same computation path? Thanks in advance.
During training, input proteins are randomly cropped to a certain fixed length. Proteins shorter than the crop size are padded.
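As an illustration only (the names and crop strategy here are assumptions, not OpenFold's exact data pipeline), cropping or padding per-residue features to a fixed length might look like this:

```python
import torch

def crop_or_pad(feats: torch.Tensor, crop_size: int) -> torch.Tensor:
    """Crop/pad per-residue features of shape [n_res, c] to [crop_size, c].

    Hypothetical sketch: longer proteins get a random contiguous crop,
    shorter ones are zero-padded on the right.
    """
    n_res = feats.shape[0]
    if n_res >= crop_size:
        start = torch.randint(0, n_res - crop_size + 1, (1,)).item()
        return feats[start:start + crop_size]
    pad = torch.zeros(crop_size - n_res, feats.shape[1], dtype=feats.dtype)
    return torch.cat([feats, pad], dim=0)

protein = torch.randn(180, 21)               # per-residue features
fixed = crop_or_pad(protein, crop_size=256)  # -> shape [256, 21]
```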
So can I assume that during training, all training examples follow the same computational path?
There are a number of stochastic things that happen---e.g. the number of recycling iterations varies randomly. But within each recycling iteration, I believe that's the case, yes.
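For illustration, here is a hedged sketch of how a randomly sampled number of recycling iterations might drive the forward pass; the sampling scheme, cap, and names are assumptions, not the exact OpenFold logic.

```python
import torch

MAX_RECYCLES = 3  # assumed cap on extra recycling iterations

def dummy_model(batch: torch.Tensor, prev: torch.Tensor | None) -> torch.Tensor:
    """Stand-in for the folding model: mixes the input with the recycled state."""
    return batch if prev is None else 0.5 * (batch + prev)

def forward_with_recycling(batch: torch.Tensor) -> torch.Tensor:
    # Sample the number of recycling passes per training example.
    n_iters = int(torch.randint(0, MAX_RECYCLES + 1, (1,))) + 1
    prev = None
    for _ in range(n_iters):
        # Every iteration follows the same computation path; only the
        # recycled embedding `prev` differs between iterations.
        prev = dummy_model(batch, prev)
    return prev

out = forward_with_recycling(torch.randn(8, 16))
```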
I am wondering if your code supports batch training, i.e., backpropagating and updating the model based on gradients computed over multiple proteins?
We do support batch training. See the batch_size option in the config. The peak memory usage of the model is high enough that it's often impractical to increase it past 1, but for different training data or smaller crops than those used during model finetuning it might make sense.
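For example, raising the batch size might look like the snippet below; the exact location of the data-loader batch size in the config is an assumption and may differ between OpenFold versions, so check `openfold/config.py` in your checkout.

```python
from openfold.config import model_config

# "initial_training" is one of the config presets; the config path for the
# data-loader batch size is assumed here and should be verified.
config = model_config("initial_training", train=True)
config.data.data_module.data_loaders.batch_size = 2  # default is 1
```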