Gustaf Ahdritz

Results: 170 comments of Gustaf Ahdritz

Are you using PyTorch Lightning's built-in timer? If so, it's a running average of total time elapsed between iterations, including time spent loading data. Usually for me, the time starts...

I'm sorry to hear that you're getting NaNs. Two thoughts: 1. The model should be written in such a way that it just skips over examples that yield NaN loss,...
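The skip-on-NaN idea above can be sketched in a few lines. This is a minimal illustration, not the model's actual training loop; the `train_step` helper and the toy "loss" values are hypothetical, and a real PyTorch loop would also skip the backward pass for the rejected batch.

```python
import math

def train_step(loss_fn, batch):
    """Hypothetical helper: compute the loss and reject non-finite results."""
    loss = loss_fn(batch)
    if not math.isfinite(loss):
        return None  # skip this example instead of poisoning the gradients
    return loss

# Toy "losses": the middle batch yields NaN and is dropped.
batches = [0.9, float("nan"), 0.7]
kept = [l for l in (train_step(lambda b: b, x) for x in batches) if l is not None]
# kept == [0.9, 0.7]
```

The point is only that a NaN loss is detected and discarded before it can propagate into the optimizer state.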

Hm. Peculiar. The `num_workers` parameter is handed straight to a native PyTorch DataLoader, so I can't say off the top of my head why that might be happening. I'll look...

Sorry for the delay. Despite testing with a number of values of `batch_size` and `num_workers`, I am unable to reproduce the behavior you described. How are you changing the batch...

With enough workers, the speedup is about linear in the size of the batches. `scripts/download_all_data.sh` downloads the AlphaFold training set. The validation set is from CAMEO. What exactly happens when...

I've figured out what's happening. Like you said, at some point in the model, activations become NaN. In general, for certain inputs, this seems unavoidable; it's a fundamental limitation of fp16...
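To make the fp16 limitation concrete: half precision tops out at 65504 and has coarse spacing even at moderate magnitudes, so large intermediate activations overflow. This stdlib-only sketch round-trips floats through IEEE 754 half precision via `struct`'s `"e"` format (on real fp16 hardware an overflow silently becomes `inf`, which then turns into NaN downstream, rather than raising as `struct` does here):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16(65504.0))  # largest finite fp16 value survives intact
print(to_fp16(1000.1))   # rounds to 1000.0: fp16 spacing near 1000 is 0.5
try:
    to_fp16(1e9)         # far beyond the fp16 range
except (OverflowError, struct.error):
    print("overflow")    # hardware fp16 would give inf here instead
```

Once any activation hits `inf`, operations like `inf - inf` or `0 * inf` produce NaN, which is why the failure shows up mid-model for certain inputs.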

I still haven't been able to reproduce the batch/worker issue.

I usually get around 500-700% utilization. I've done small overfitting experiments, and I bottomed out near zero loss.

That's issue #197. I'll be fixing it soon.

The `chain_data_cache.json` needs to be generated for the training set. Could you elaborate on `chain_data_cache` being too small?