Pocket2Mol

Skipping in training

Open lajictw opened this issue 1 year ago • 4 comments

Hi! After unzipping the dataset, I ran train.py directly, but it seems that all the data is skipped. I'm not sure if I'm overlooking something. Thank you for your help! A screenshot of my data folder is attached below. [image]

lajictw avatar Nov 08 '23 19:11 lajictw

Hi! If an exception occurs during data processing, the affected entry is skipped. If all the data is skipped, your environment may not be configured correctly. To resolve this, verify your environment setup, delete the 'crossdocked_pocket10.lmdb' file, and rerun 'train.py'. If the issue persists, catch the exception during data processing to see what goes wrong.
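For anyone debugging this, here is a minimal sketch of what "catch the exception" could look like. It is not the repository's actual processing code: `process_all`, `process_entry`, and `demo_process` are hypothetical names used only for illustration. The idea is to print the full traceback before skipping an entry, so the real cause is visible instead of a bare "skipping" message.

```python
import traceback


def process_all(entries, process_entry):
    """Process each entry; on failure, print the traceback and record the skip.

    `process_entry` is a hypothetical stand-in for whatever parses one
    protein-ligand pair in the real data pipeline.
    """
    results, skipped = [], []
    for i, entry in enumerate(entries):
        try:
            results.append(process_entry(entry))
        except Exception as exc:
            # Show why this entry failed instead of skipping it silently.
            traceback.print_exc()
            skipped.append((i, entry, repr(exc)))
    return results, skipped


def demo_process(entry):
    """Toy processor that fails on negative inputs (illustration only)."""
    if entry < 0:
        raise ValueError(f"negative entry: {entry}")
    return entry * 2


results, skipped = process_all([1, -2, 3], demo_process)
print(f"processed {len(results)}, skipped {len(skipped)}")
```

If every entry fails with the same exception, the first traceback is usually enough to tell whether the problem is the environment or the data paths.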

pengxingang avatar Nov 10 '23 07:11 pengxingang

Hi! I encountered the same problem when executing sample.py on the testing data. The issue I found was that one of the utils, protein_ligand.py, uses deprecated NumPy type aliases such as numpy.long. As a result, protein_ligand.py throws an exception when it tries to process the data, and the program skips most of it. Replacing numpy.long with numpy.longlong, numpy.bool with numpy.bool_, and numpy.int with numpy.int_ solved the problem for me. I assume using an older version of NumPy will work as well. Check the NumPy release notes for more.
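As a rough illustration of the replacement described above (the array names are made up and not taken from protein_ligand.py), the explicit types below are available across NumPy versions, whereas the bare aliases were deprecated in NumPy 1.20 and removed in 1.24:

```python
import numpy as np

# Hypothetical arrays, for illustration only.
# np.longlong, np.bool_ and np.int_ stand in for the old aliases
# np.long, np.bool and np.int mentioned in the comment above.
indices = np.array([0, 1, 2], dtype=np.longlong)       # was: dtype=np.long
mask = np.array([True, False, True], dtype=np.bool_)   # was: dtype=np.bool
counts = np.array([3, 4, 5], dtype=np.int_)            # was: dtype=np.int

print(np.__version__, indices.dtype, mask.dtype, counts.dtype)
```

Printing `np.__version__` alongside the dtypes is a quick way to confirm which NumPy release the environment is actually using.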

I hope this helps.

Loer9999 avatar Nov 15 '23 02:11 Loer9999

Thanks for your reply! I tried replacing them, but the skipping still occurs. I will try downgrading NumPy. Anyway, thanks again for your reply.

lajictw avatar Nov 15 '23 07:11 lajictw

Thank you for your valuable suggestions. I've addressed the NumPy data type warning and adjusted the data processing code to handle exceptions when skipping data. Feel free to pull the updated code to investigate why all the data is being skipped.

pengxingang avatar Nov 16 '23 01:11 pengxingang