equidock_public
About preprocess_raw_data.py
When I run the command as follows:
python preprocess_raw_data.py -n_jobs 60 -data dips -graph_nodes residues -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC -pocket_cutoff 8 -data_fraction 1.0
it generates six files in the directory /extendplus/jiashan/equidock_public/src/cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0:
label_test.pkl ligand_graph_test.bin receptor_graph_test.bin
label_val.pkl ligand_graph_val.bin receptor_graph_val.bin
However, the three training files could not be generated; the run reports the following error:
Processing ./cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0/label_frac_1.0_train.pkl
Num of pairs in train = 39901
Killed
Could you help me solve this problem? Thanks!
Generating the full DIPS training data takes a lot of time and you have to check if you have enough resources for it. Can you try generating just a fraction of it first, e.g., -data_fraction 0.1 ?
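For reference, that suggestion amounts to rerunning the command from the question with only the fraction flag lowered (all other flags unchanged):

```shell
python preprocess_raw_data.py -n_jobs 60 -data dips -graph_nodes residues \
    -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC \
    -pocket_cutoff 8 -data_fraction 0.1
```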
Thank you for your reply! I can successfully run the command by modifying parameters! Thank you very much for help!
I ran it with 160 GB RAM for five hours and it still failed with the same error. It really needs huge resources. Noting this here in case it is useful for others.
For the record: I used 25 CPUs and 400 GB RAM, and the preprocessing took 15 hours.
I had the same problem. The main cause is insufficient memory: pre-processing the DIPS training data really does require a large amount of RAM, and I could not complete it in one pass even on a server with 256 GB of memory.
One workaround is to batch the work. /DIPS/data/DIPS/interim/pairs-pruned/pairs-postprocessed-train.txt stores all the PDB pairs waiting to be pre-processed, so you can split that txt file into several parts and run the preprocessing on each part separately. Afterwards you just merge the generated files together. I split the training data into two parts and finished the pre-processing successfully on the 256 GB server.
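The split-and-merge workflow above could be sketched roughly as follows. This is a minimal sketch, not code from the repository: the part-file naming, the assumption that the label `.pkl` files hold plain Python lists, and the helper names are mine. The `.bin` graph files would additionally need merging via DGL's `load_graphs`/`save_graphs`, which is not shown here.

```python
import pickle
from pathlib import Path

def split_pairs_file(pairs_path, n_parts):
    """Split the pairs list (one PDB pair per line) into n_parts chunks,
    written as <stem>_part<i>.txt next to the original file."""
    pairs_path = Path(pairs_path)
    lines = pairs_path.read_text().splitlines()
    chunk = (len(lines) + n_parts - 1) // n_parts  # ceiling division
    out_paths = []
    for i in range(n_parts):
        part = lines[i * chunk:(i + 1) * chunk]
        out = pairs_path.with_name(f"{pairs_path.stem}_part{i}.txt")
        out.write_text("\n".join(part) + "\n")
        out_paths.append(out)
    return out_paths

def merge_label_files(label_paths, merged_path):
    """Concatenate the per-part label pickles (assumed to be lists)
    into a single merged pickle; returns the merged length."""
    merged = []
    for p in label_paths:
        with open(p, "rb") as f:
            merged.extend(pickle.load(f))
    with open(merged_path, "wb") as f:
        pickle.dump(merged, f)
    return len(merged)
```

You would then point the preprocessing script at each part file in turn (swapping the part file in for pairs-postprocessed-train.txt, which is repository-specific and not shown), and finally merge the per-part outputs.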