Gustaf Ahdritz

During training, input proteins are randomly cropped to a fixed length. Proteins shorter than the crop size are padded up to it.
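
Roughly, the crop/pad step looks like this (a minimal, self-contained sketch, not the exact OpenFold implementation; `crop_size` stands in for the configured crop length):

```python
import torch
import torch.nn.functional as F

def crop_or_pad(seq_feats: torch.Tensor, crop_size: int) -> torch.Tensor:
    """Randomly crop along the residue dimension, or zero-pad up to crop_size."""
    n_res = seq_feats.shape[0]
    if n_res >= crop_size:
        # Pick a random contiguous window of crop_size residues
        start = int(torch.randint(0, n_res - crop_size + 1, (1,)).item())
        return seq_feats[start : start + crop_size]
    # Shorter proteins are padded with zeros up to the crop size
    return F.pad(seq_feats, (0, 0, 0, crop_size - n_res))
```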

There are a number of stochastic things that happen---e.g. the number of recycling iterations varies randomly. But within each recycling iteration, I believe that's the case, yes.
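
For concreteness, the recycling count is drawn fresh each training step, along these lines (illustrative sketch; the exact sampling scheme in the code may differ):

```python
import torch

max_recycles = 3  # configured cap on recycling iterations

# Sample how many recycling iterations this training step gets
num_recycles = int(torch.randint(0, max_recycles + 1, (1,)).item())

# The forward pass then runs num_recycles + 1 times, feeding previous outputs
# back in; gradients flow only through the final iteration.
```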

We do support batch training. See the `batch_size` option in the config. The peak memory usage of the model is high enough that it's often impractical to increase it past...
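
For reference, something like the following (treat it as a sketch; the exact config path and helper signature may differ between versions):

```python
from openfold.config import model_config

config = model_config("model_1", train=True)

# Bump the DataLoader batch size; peak memory grows quickly with this value
config.data.data_module.data_loaders.batch_size = 2
```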

What do you mean by "the version you committed after December"? Are you referring to a specific commit? BTW: I just spotted a mistake in the `training_step` workaround and fixed...

I'll look into this. That the loss hits NaN and then stays that way is fairly common, but I'm very surprised to hear you didn't encounter the same issue using...

Kind of seems like you might be bottlenecked by data processing. Maybe try increasing the number of DataLoader workers, or enabling pinned (page-locked) memory for the DataLoader workers?
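
In plain PyTorch terms, the knobs I mean are these (values are just examples, and the `TensorDataset` is a stand-in for the real training dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 10))  # placeholder dataset

loader = DataLoader(
    dataset,
    batch_size=1,
    num_workers=8,    # more workers -> feature processing runs in parallel
    pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
)
```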

Could you send the breakdown of the loss? Go to the definition of `AlphaFoldLoss` in `openfold/utils/loss.py` and print out the component parts of the cumulative loss.

Could you just post it here?

Sorry if I was unclear, but I meant printing out the values of each of the constituent losses in `openfold/utils/loss.py` (e.g. FAPE loss, distogram loss, etc.). I want to see...
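
Something like this toy is what I mean (a self-contained illustration; the real terms live in `AlphaFoldLoss` in `openfold/utils/loss.py`, and the names and weights below are just examples):

```python
import torch

# Stand-ins for the individual loss terms computed inside AlphaFoldLoss
loss_terms = {
    "fape": torch.rand(()),
    "distogram": torch.rand(()),
    "masked_msa": torch.rand(()),
}
weights = {"fape": 1.0, "distogram": 0.3, "masked_msa": 2.0}  # example weights

cum_loss = torch.tensor(0.0)
for name, loss in loss_terms.items():
    # Print each constituent loss before it is folded into the total
    print(f"{name}: {loss.item():.4f} (weight={weights[name]})")
    cum_loss = cum_loss + weights[name] * loss

print(f"cumulative loss: {cum_loss.item():.4f}")
```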

1e-9 is just the default---it should be overridden by the config file when you enable `--precision 16`. I've been focusing on bfloat16 training and implementing Multimer for the past couple...
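
To see why the default has to be overridden for fp16 (the 1e-4 below is just an illustrative replacement value, not necessarily what the config uses):

```python
import torch

print(torch.tensor(1e-9, dtype=torch.float16))  # underflows to 0., so eps no longer guards against division by zero
print(torch.tensor(1e-4, dtype=torch.float16))  # representable in half precision
```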