shenggan
This is most likely because we only specified `compute_70,code=sm_70` and `compute_80,code=sm_80` when we compiled the CUDA module. Could you please provide the hardware information of your machine (GPU)?
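For reference, here is a minimal sketch of how additional `-gencode` targets could be added in a standard PyTorch `CUDAExtension` build so the kernels also cover other architectures (the module and source names below are placeholders, not FastFold's actual `setup.py`):

```python
# Sketch: extending the nvcc architecture flags of a PyTorch CUDA extension.
# Names and sources are illustrative placeholders, not FastFold's real build script.
from setuptools import setup
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

nvcc_arch_flags = [
    "-gencode", "arch=compute_70,code=sm_70",  # V100
    "-gencode", "arch=compute_75,code=sm_75",  # T4 / RTX 20xx (added)
    "-gencode", "arch=compute_80,code=sm_80",  # A100
    "-gencode", "arch=compute_86,code=sm_86",  # RTX 30xx (added)
]

setup(
    name="fastfold_cuda_kernels",  # placeholder name
    ext_modules=[
        CUDAExtension(
            name="fastfold_cuda_kernels",
            sources=["kernel.cu"],  # placeholder source list
            extra_compile_args={"cxx": ["-O3"], "nvcc": ["-O3"] + nvcc_arch_flags},
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

You can check which compute capability your GPU reports with `torch.cuda.get_device_capability(0)`.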
Did you run `python setup.py install` in the FastFold folder? Alternatively, you can attach the installation log.
FastFold needs to compile CUDA extensions for its high-performance kernels. From the log, it appears that your torch build requires CUDA 11.5, so you need a matching version of the CUDA environment...
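As a quick sanity check, a sketch like the following compares the CUDA version torch was built with against the toolkit that the extension build will pick up (it assumes a typical toolkit layout under `CUDA_HOME`):

```python
# Check that the local CUDA toolkit matches the CUDA version torch was built with.
import subprocess

import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch version:        ", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)   # e.g. "11.5"
print("CUDA_HOME:            ", CUDA_HOME)            # toolkit used to compile the extension

if CUDA_HOME is not None:
    # nvcc --version reports the toolkit release that will compile the kernels
    print(subprocess.run([f"{CUDA_HOME}/bin/nvcc", "--version"],
                         capture_output=True, text=True).stdout)
```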
I suppose it can be solved by downgrading GCC from 9.4.0 to 8.x.
According to our experimental results, the effect of DAP is significant. We hope you can provide more details so that we can reproduce this, such as the specific experimental setup and the final experimental...
I think this result is reasonable. Although DAP partitions most of the activations, in practice the theoretical linear reduction cannot be obtained, because the model...
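As a rough illustration of why the reduction is sub-linear (the byte counts below are made-up placeholders, not measured FastFold numbers): parameters and other unpartitioned buffers are replicated on every rank, so only the activation part of the footprint shrinks with the DAP degree.

```python
# Back-of-the-envelope memory model: only activations are partitioned by DAP.
# The GB figures are illustrative placeholders, not measured values.
def peak_memory_gb(dap_degree: int,
                   params_gb: float = 4.0,         # replicated on every rank
                   activations_gb: float = 20.0):  # partitioned across DAP ranks
    return params_gb + activations_gb / dap_degree

for n in (1, 2, 4, 8):
    print(f"DAP={n}: ~{peak_memory_gb(n):.1f} GB per GPU")
# DAP=1: ~24.0 GB, DAP=4: ~9.0 GB -> about a 2.7x (not 4x) reduction in this toy model.
```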
DAP and activation checkpointing are orthogonal techniques and can be used together; further memory reduction can be obtained by using DAP on top of checkpointing.
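A minimal sketch of combining the two (assuming an already-initialized DAP process group and a hypothetical `EvoformerBlock`-style module; this is not FastFold's actual training loop): checkpointing discards the block's intermediate activations and recomputes them in the backward pass, while DAP additionally shards the activations that remain.

```python
# Sketch: activation checkpointing on top of a DAP-parallel block.
# The block type and its (m, z) tensor shapes are hypothetical placeholders.
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(torch.nn.Module):
    def __init__(self, blocks):
        super().__init__()
        self.blocks = torch.nn.ModuleList(blocks)

    def forward(self, m, z):
        for block in self.blocks:
            # Recompute this block's intermediates in backward instead of storing them;
            # DAP (set up elsewhere) still shards whatever activations are kept.
            m, z = checkpoint(block, m, z)
        return m, z
```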
What is the length of the amino acid sequence used in your test? As we mentioned in our paper, we recommend using DAP for distributed inference only when the sequence length...
Possible reasons why DAP does not work well on short sequences: 1) DAP can only reduce the memory needed for intermediate activations, and when the sequence length is not long...
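To give a sense of scale (hidden size and dtype below are illustrative assumptions, and real Evoformer layers keep more tensors than just the pair representation): activation size grows roughly quadratically with sequence length, so for short sequences the activations are small to begin with, while the per-layer communication cost of DAP is still paid.

```python
# Toy estimate of how pair-representation activation size grows with sequence length.
def pair_activation_mb(seq_len: int, c_z: int = 128, bytes_per_elem: int = 2):
    return seq_len * seq_len * c_z * bytes_per_elem / 2**20

for L in (256, 1024, 4096):
    print(f"L={L}: ~{pair_activation_mb(L):.0f} MB per pair tensor")
# L=256: ~16 MB, L=1024: ~256 MB, L=4096: ~4096 MB -- only the long-sequence case
# is large enough for partitioning to clearly outweigh the communication overhead.
```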
You can refer, for example, to this part of the code: https://github.com/hpcaitech/FastFold/blob/main/fastfold/distributed/comm.py#L56-L58. Scalability is presented as the reason in the paper, as it is the more fundamental reason for not using DAP.
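For context, the linked lines sit in the DAP communication layer; a stripped-down sketch of the kind of collective used there (an all-gather along the sharded dimension via `torch.distributed`; a simplification, not FastFold's actual `comm.py` code) looks like this:

```python
# Simplified sketch of a DAP-style gather along the partitioned dimension.
# Assumes torch.distributed is already initialized; not the actual FastFold implementation.
import torch
import torch.distributed as dist

def gather_along_dim(tensor: torch.Tensor, dim: int = 1) -> torch.Tensor:
    world_size = dist.get_world_size()
    if world_size == 1:
        return tensor
    buffers = [torch.empty_like(tensor) for _ in range(world_size)]
    dist.all_gather(buffers, tensor)    # collect every rank's shard
    return torch.cat(buffers, dim=dim)  # reassemble the full activation
```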