ReinventCommunity
Reinvent 3.0: issues with Aurora model
Hi, I'm trying the Reinforcement Learning notebook to test Reinvent 3.0, but when it tries to load the pickle file, it fails with:
Traceback (most recent call last):
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 31, in _load_model
activity_model = self._load_container(parameters)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 38, in _load_container
scikit_model = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x00'.
Loading it manually myself gives the same error:
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.load(open('Aurora_model.pkl', 'rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: invalid load key, '\x00'.
Since I'm using the same Python as in the env file (3.7.7), and pickle is in the standard library, I don't know what the issue could be.
A hexdump suggests the file is broken (it contains nothing but zero bytes):
hexdump -x Aurora_model.pkl
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
1c45f60 0000 0000 0000 0000 0000 0000
1c45f6b
Could something have gone wrong when cloning from Git, or is the source pickle itself broken? (I can load augmented.prior and hexdump it without problems.)
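The error `invalid load key, '\x00'` means the first byte pickle read was 0x00, which is not a valid pickle opcode, and the hexdump shows every byte is zero. A small stdlib-only check like the sketch below can tell a zero-filled download apart from a genuine version problem without unpickling anything. The function names are hypothetical, and the header check is only a heuristic: pickle protocols 2 and later start with the PROTO opcode (0x80) followed by the protocol number, while protocols 0 and 1 have no such header.

```python
def diagnose_pickle_bytes(data):
    """Return a short diagnosis of a pickle payload without executing it."""
    if not data:
        return "empty file"
    if data.count(0) == len(data):
        # Every byte is 0x00, as in the hexdump above: the content is gone,
        # so no Python or scikit-learn version can load it.
        return "all zero bytes (corrupt or incomplete file)"
    if len(data) >= 2 and data[0] == 0x80:
        # Protocol 2+ pickles begin with PROTO (0x80) and the protocol number.
        return "pickle protocol %d header" % data[1]
    return "no protocol-2+ header (protocol 0/1 pickle, or not a pickle)"


def diagnose_pickle(path):
    """Read a file from disk and diagnose it (reads the whole file into memory)."""
    with open(path, "rb") as f:
        return diagnose_pickle_bytes(f.read())
```

For example, `diagnose_pickle("Aurora_model.pkl")` on the file from the hexdump above would report all zero bytes, which points at the file rather than at Python or scikit-learn.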
Thanks!
Hi, could it be that you have a newer version of scikit-learn? These models were built with scikit-learn 0.21. The other option is that the files got broken. Some notebooks seem broken, but I haven't had the chance to look into these issues yet.
The file was corrupt. I have uploaded an archived version of the model file to the repo, since GitHub seems to have a strict limit on files above 25 MB. You will need to extract it first. Could you please confirm that it works? Thanks.
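Because the corrupt `Aurora_model.pkl` is still in the repo next to the new archive, it is easy to point the configuration at the wrong file. The sketch below unpacks the archive and loads the extracted pickle, returning its path so the configuration can reference that file explicitly. The archive's name, format, and the name of the `.pkl` inside it are not given in this thread, so `archive_path` and `model_filename` are assumptions to fill in; `shutil.unpack_archive` infers zip or tar formats from the file extension.

```python
import os
import pickle
import shutil
import tempfile


def extract_and_load(archive_path, model_filename, extract_dir=None):
    """Unpack an archived pickle and load it (hypothetical helper).

    Returns (model, path_to_extracted_pickle). If extract_dir is None,
    a fresh temporary directory is used.
    """
    if extract_dir is None:
        extract_dir = tempfile.mkdtemp()
    # Format (zip, tar, gztar, ...) is inferred from the archive's extension.
    shutil.unpack_archive(archive_path, extract_dir)
    model_path = os.path.join(extract_dir, model_filename)
    # Only unpickle files from a source you trust: pickle can execute code.
    with open(model_path, "rb") as f:
        return pickle.load(f), model_path
```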
@patronov thanks for checking, so we agree that it was corrupt. I'll try the new one and give you feedback.
Hi, I've unzipped the pkl file you added recently (this may confuse new users, since the corrupt file is still there, but that's easy to solve).
Now I get this warning:
/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.21.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
And then it errors with:
11:57:06: local_reinforcement_logger.log_message +30: INFO starting an RL run
Traceback (most recent call last):
File "/progs/all/opensource/reinvent/3.0/source/input.py", line 21, in <module>
manager.run()
File "/cluster/import/master/progs/all/opensource/reinvent/3.0/source/running_modes/manager.py", line 17, in run
runner.run()
File "/cluster/import/master/progs/all/opensource/reinvent/3.0/source/running_modes/reinforcement_learning/reinforcement_runner.py", line 48, in run
score_summary: FinalSummary = self._scoring_function.get_final_score_for_step(smiles, step)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/function/base_scoring_function.py", line 71, in get_final_score_for_step
in self.scoring_components]
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/function/base_scoring_function.py", line 70, in <listcomp>
summaries = [_update_total_score(sc.calculate_score_for_step(molecules, step), query_size, valid_indices) for sc
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/base_score_component.py", line 26, in calculate_score_for_step
return self.calculate_score(molecules)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 20, in calculate_score
score, raw_score = self._predict_and_transform(molecules)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 25, in _predict_and_transform
score = self.activity_model.predict(molecules, self.parameters.specific_parameters)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 26, in predict
return self.predict_from_mols(molecules, parameters)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 32, in predict_from_mols
activity = self.predict_from_fingerprints(fps)
File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 39, in predict_from_fingerprints
predictions = self._activity_model.predict_proba(fps)
AttributeError: 'RandomForestRegressor' object has no attribute 'predict_proba'
This is my scikit-learn: scikit-learn 0.21.3 py37hcdab131_0 conda-forge
That matches the version required by the environment file: https://github.com/MolecularAI/Reinvent/blob/b36b9d206e76590c7d584683fc45de8a74ce6033/reinvent.yml#L174
Is it really this sensitive to the scikit-learn version? I successfully tested Reinvent 3.0 with a model created by a colleague (since I couldn't use the Aurora model), and I didn't pay much attention to the scikit-learn version used to build it (I'd have to check which one it was).
I checked that both Reinvent 2.0 and 3.0 use scikit-learn 0.21.3 (which explains why I could use a model a colleague built for 2.0), so I guess the newly uploaded model was built with 0.21.2, and that's what's causing the issue.
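To confirm which scikit-learn version a pickle was built with, one option is to record the `UserWarning` that scikit-learn emits while unpickling, like the one quoted above, and pull the versions out of its text. This is a sketch under an assumption: the regular expression matches the scikit-learn 0.21 wording shown in this thread, and other releases may phrase the warning differently. The function names are hypothetical.

```python
import pickle
import re
import warnings

# Matches the 0.21 wording quoted above (assumption: may differ in other releases).
_VERSION_WARNING = re.compile(
    r"Trying to unpickle estimator (\w+) from version (\S+) when using version (\S+)"
)


def parse_version_warning(message):
    """Return (estimator, built_with, running), or None if the text doesn't match."""
    match = _VERSION_WARNING.search(message)
    if match is None:
        return None
    estimator, built_with, running = match.groups()
    return estimator, built_with, running.rstrip(".")


def load_and_report_versions(path):
    """Unpickle a file and collect any scikit-learn version-mismatch warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeats, e.g. one per tree
        with open(path, "rb") as f:
            model = pickle.load(f)
    mismatches = {parse_version_warning(str(w.message)) for w in caught}
    mismatches.discard(None)
    return model, sorted(mismatches)
```

For a random forest, the set collapses the per-tree warnings into one entry per estimator type.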
Hi, sorry for the slow response here; I've been swamped the last couple of weeks. You got 'RandomForestRegressor' object has no attribute 'predict_proba' because it is a regression model and it has most likely been used as a classifier. The warnings are due to the older versions we have in the yml; we will update these. Apparently we also need to supply newer models in the tutorials. @GuoJeff, would you be able to help here?
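`RandomForestRegressor` has no `predict_proba` in any scikit-learn release, so the version mismatch only explains the warning, not the crash. The sketch below shows a hypothetical helper that checks which kind of model it was given: classifiers expose `predict_proba`, regressors only `predict`. It is not reinvent_scoring's code, and how that package decides between the two is not shown in this thread. Taking the last column assumes a binary model whose positive label sorts last (scikit-learn orders `classes_` sorted, so label 1 comes after 0).

```python
def predict_scores(model, fingerprints):
    """Score fingerprints with either a classifier or a regressor (hypothetical helper)."""
    if hasattr(model, "predict_proba"):
        # Classifier: probability of the last class (label 1 for a 0/1 model).
        return [row[-1] for row in model.predict_proba(fingerprints)]
    # Regressor (e.g. RandomForestRegressor): raw predicted values.
    return list(model.predict(fingerprints))
```

With the Aurora `RandomForestRegressor` this falls through to `predict`, so the result is a raw regression value rather than a probability, and any score transformation downstream has to account for that. scikit-learn also provides `sklearn.base.is_classifier` and `sklearn.base.is_regressor` if an explicit type check is preferred.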
@j3mdamas if you have accumulated a lot of questions, feel free to book a TC with me.
@patronov I've sent you an e-mail.
About the issue: I use this example as a test for my Reinvent 3 deployment. It's probably not the best approach, and I should run the tests instead, but it's closer to a real-world scenario.
In any case, I was able to make progress with another model, so I'm not stuck, but someone trying Reinvent for the first time might be.