Error about mutation_zeroshot.py
When I run the command python mutation_zeroshot.py -c SaProt/config/ClinVar/saprot.yaml, an error occurs. The output is listed below:
data/Users/zengbing1/software/anaconda3/envs/esm/lib/python3.11/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: Metric SpearmanCorrcoef will save all targets and predictions in the buffer. For large datasets, this may lead to large memory footprint.
warnings.warn(*args, **kwargs) # noqa: B028
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
0%| | 0/2525 [00:00<?, ?it/s]NP_891847.1.csv
0%| | 0/2525 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data3/soft/User/zengbing/method_data3/PLM/SaProt/scripts/mutation_zeroshot.py", line 71, in test_epoch_end has been removed in v2.0.0. SaprotFoldseekMutationModel implements this method. You can use the on_test_epoch_end hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in https://github.com/Lightning-AI/lightning/pull/16520.
Can you help me? Many thanks.
Hi!
This error is most likely caused by an incompatible pytorch-lightning version: as the message says, test_epoch_end was removed in v2.0.0, and the code still implements it. You can fix it by installing an older release:
pip install pytorch-lightning==1.8.3
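If it helps to confirm which version is actually active in the environment before and after the reinstall, here is a minimal sketch using only the standard library. The helper names `supports_test_epoch_end` and `installed_lightning_version` are made up for illustration and are not part of SaProt or Lightning; the only fact relied on is the one in the error message, that `test_epoch_end` was removed in v2.0.0.

```python
from importlib import metadata
from typing import Optional


def supports_test_epoch_end(version: str) -> bool:
    """Return True if this pytorch-lightning version predates 2.0.0.

    The error message states that test_epoch_end was removed in v2.0.0,
    so any 0.x or 1.x release is treated as compatible here.
    """
    major = int(version.split(".")[0])
    return major < 2


def installed_lightning_version(package: str = "pytorch-lightning") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_lightning_version()
    if version is None:
        print("pytorch-lightning is not installed in this environment")
    elif supports_test_epoch_end(version):
        print(f"pytorch-lightning {version}: test_epoch_end is still available")
    else:
        print(f"pytorch-lightning {version}: test_epoch_end was removed, downgrade or migrate")
```

Running this inside the same conda environment as the script should show whether the downgrade took effect.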
Thank you. Could you share the ClinVar data format expected by "python mutation_zeroshot.py -c SaProt/config/ClinVar/saprot.yaml"? I downloaded the ClinVar data from the ProteinGym website, but got this error:
return int(self._get("length"))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
The ClinVar data I used looks like this:
,protein,protein_sequence,mutant,mutated_sequence,DMS_bin_score
85343,NP_998885.1,MPRGSRSAASRPASRPAAPSAHPPAHPPPSAAAPAPAPSGQPGLMAQMATTAAGVAVGSAVGHVMGSALTGAFSGGSSEPSQPAVQQAPTPAAPQPLQMGPCAYEIRQFLDCSTTQSDLSLCEGFSEALKQCKYYHGLSSLP,Y135H,MPRGSRSAASRPASRPAAPSAHPPAHPPPSAAAPAPAPSGQPGLMAQMATTAAGVAVGSAVGHVMGSALTGAFSGGSSEPSQPAVQQAPTPAAPQPLQMGPCAYEIRQFLDCSTTQSDLSLCEGFSEALKQCKYHHGLSSLP,Benign
OK, thanks. I noticed the data provided is in LMDB format. Could you please share the details of the original source files and their formats that were used to generate it?
The original files are .pdb files. We extracted the relevant information and the corresponding labels from them to build a single LMDB for data loading.
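For anyone trying to build a compatible LMDB themselves, the sketch below shows the general shape only. The one thing known from the traceback above is that the loader reads a "length" key and converts it with int(), so a database without that key fails exactly as reported. Everything else is an assumption for illustration: the integer-string keys, the JSON values, the field names "protein", "mutant" and "label", and the helper names `build_records` and `write_lmdb`. The real SaProt entries also carry information extracted from the .pdb files, which a sequence-only CSV does not contain.

```python
import csv
import io
import json


def build_records(csv_text: str) -> dict:
    """Turn CSV rows into LMDB-style key/value strings (assumed layout).

    Keys "0", "1", ... hold one JSON entry each; the extra "length" key
    holds the number of entries, which the loader parses with int().
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    records = {}
    for index, row in enumerate(rows):
        entry = {
            "protein": row["protein"],      # hypothetical field name
            "mutant": row["mutant"],        # hypothetical field name
            "label": row["DMS_bin_score"],  # hypothetical field name
        }
        records[str(index)] = json.dumps(entry)
    records["length"] = str(len(rows))
    return records


def write_lmdb(records: dict, path: str, map_size: int = 1 << 30) -> None:
    """Write the records to an LMDB directory (requires the lmdb package)."""
    import lmdb  # third-party: pip install lmdb

    env = lmdb.open(path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in records.items():
            txn.put(key.encode("utf-8"), value.encode("utf-8"))
    env.close()
```

A call such as write_lmdb(build_records(open("clinvar.csv").read()), "clinvar_lmdb") would produce a database that at least has the "length" key; the file names here are placeholders, and whether the rest of the pipeline accepts the entries depends on the actual fields the model reads.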