
Reinvent 3.0: issues with Aurora model

Status: Open. j3mdamas opened this issue 3 years ago • 7 comments

Hi, I'm trying the Reinforcement Learning notebook to test Reinvent 3.0, but when it tries to load the pickle file, it errors with:

Traceback (most recent call last):
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 31, in _load_model
    activity_model = self._load_container(parameters)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 38, in _load_container
    scikit_model = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x00'.

When I try to load it manually myself, I get the same error:

Python 3.7.7 (default, May  7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.load(open('Aurora_model.pkl', 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: invalid load key, '\x00'.

Since I'm using the same Python version as in the environment file (3.7.7) and pickle is part of the standard library, I don't know what the issue might be.

When I do a hexdump, the file looks broken to me; every byte is zero:

hexdump -x Aurora_model.pkl
0000000    0000    0000    0000    0000    0000    0000    0000    0000
*
1c45f60    0000    0000    0000    0000    0000    0000
1c45f6b
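The hexdump reading can be double-checked from Python. This is a minimal sketch (the helper name inspect_pickle_file is hypothetical) that reads a file in chunks and reports its size and whether every byte is zero. A zero-filled file can never unpickle: pickle reads the first byte as an opcode, and 0x00 is not a valid one, hence "invalid load key, '\x00'".

```python
def inspect_pickle_file(path, chunk_size=1 << 20):
    """Return (size_in_bytes, all_zero) for a file, reading it in chunks."""
    size = 0
    all_zero = True
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            # count(0) equals the chunk length only if every byte is NUL
            if all_zero and chunk.count(0) != len(chunk):
                all_zero = False
    return size, all_zero


# Usage (path as in the thread):
#   size, zeros_only = inspect_pickle_file("Aurora_model.pkl")
```

For the file dumped above, one would expect (29646699, True), since the final hexdump offset 0x1c45f6b is the file length in bytes.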

Could something have gone wrong when cloning from Git? Or is the source pickle itself broken? (I can load augmented.prior and hexdump it without problems.)

Thanks!

j3mdamas, Sep 01 '21 20:09

Hi, could it be that you have a newer version of scikit-learn? These models were built with scikit-learn 0.21. The other possibility is that the files got corrupted. Some notebooks seem to be broken, but I haven't had the chance to look into those issues yet.

patronov, Sep 08 '21 16:09

The file was corrupt. I have uploaded an archived version of the model file to the repo, since GitHub seems to impose a strict limit on files above 25 MB. You will need to extract it first. Could you please confirm that it works? Thanks.
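Extracting the archive and confirming that the model actually unpickles can be done in one step with the standard library. This is a minimal sketch assuming a zip archive (the next comment mentions unzipping it); the function name extract_and_check and the archive/member names in the usage comment are placeholders, since the thread does not give them. Loading the real model also requires scikit-learn to be importable, because the pickle references its classes.

```python
import pickle
import zipfile
from pathlib import Path


def extract_and_check(archive_path, member_name, dest_dir="."):
    """Extract one member from a zip archive and return the unpickled object.

    Only unpickle files from a source you trust: pickle can run arbitrary
    code while loading.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # testzip() returns the name of the first member with a bad CRC, or None
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        extracted = zf.extract(member_name, path=Path(dest_dir))
    with open(extracted, "rb") as fh:
        return pickle.load(fh)


# Usage (placeholder names):
#   model = extract_and_check("Aurora_model.zip", "Aurora_model.pkl")
#   print(type(model).__name__)
```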

patronov, Sep 08 '21 17:09

@patronov thanks for checking; we agree that it was corrupt, then. I'll try the new one and give you feedback.

j3mdamas, Sep 14 '21 09:09

Hi, I've unzipped the pkl file you added recently (this may confuse new users, since the corrupt file is still there, but that's easy to solve).

I now get this warning:

/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.21.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.

And then it errors with:

11:57:06: local_reinforcement_logger.log_message +30: INFO     starting an RL run
Traceback (most recent call last):
  File "/progs/all/opensource/reinvent/3.0/source/input.py", line 21, in <module>
    manager.run()
  File "/cluster/import/master/progs/all/opensource/reinvent/3.0/source/running_modes/manager.py", line 17, in run
    runner.run()
  File "/cluster/import/master/progs/all/opensource/reinvent/3.0/source/running_modes/reinforcement_learning/reinforcement_runner.py", line 48, in run
    score_summary: FinalSummary = self._scoring_function.get_final_score_for_step(smiles, step)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/function/base_scoring_function.py", line 71, in get_final_score_for_step
    in self.scoring_components]
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/function/base_scoring_function.py", line 70, in <listcomp>
    summaries = [_update_total_score(sc.calculate_score_for_step(molecules, step), query_size, valid_indices) for sc
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/base_score_component.py", line 26, in calculate_score_for_step
    return self.calculate_score(molecules)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 20, in calculate_score
    score, raw_score = self._predict_and_transform(molecules)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 25, in _predict_and_transform
    score = self.activity_model.predict(molecules, self.parameters.specific_parameters)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 26, in predict
    return self.predict_from_mols(molecules, parameters)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 32, in predict_from_mols
    activity = self.predict_from_fingerprints(fps)
  File "/progs/all/opensource/reinvent/3.0/miniconda3/lib/python3.7/site-packages/reinvent_scoring/scoring/predictive_model/scikit_model_container.py", line 39, in predict_from_fingerprints
    predictions = self._activity_model.predict_proba(fps)
AttributeError: 'RandomForestRegressor' object has no attribute 'predict_proba'

This is my scikit-learn: scikit-learn 0.21.3 py37hcdab131_0 conda-forge. It matches the version requested in the environment file: https://github.com/MolecularAI/Reinvent/blob/b36b9d206e76590c7d584683fc45de8a74ce6033/reinvent.yml#L174

Is it really this sensitive to the scikit-learn version? I used a model created by a colleague to successfully test Reinvent 3.0 (since I couldn't use the Aurora model), and I didn't pay much attention to the scikit-learn version used to build it (I'd have to check which one it was).

j3mdamas, Sep 14 '21 10:09

I checked that both Reinvent 2.0 and 3.0 use scikit-learn 0.21.3 (which explains why I could use a model a colleague built for 2.0), so I guess the newly uploaded model was built with 0.21.2, and that's what's causing the issue.
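The version a model was pickled with can be read without unpickling it, and therefore without scikit-learn installed. scikit-learn estimators store that version in their pickled state under the key "_sklearn_version", which is where the "Trying to unpickle estimator ... from version 0.21.2" warning comes from. The sketch below (hypothetical helper name) walks the pickle's opcode stream with the standard-library pickletools module and collects the string that follows each literal "_sklearn_version" key. It is a heuristic: it relies on key and value being adjacent string literals, which holds for the first estimator in a file but not for later ones whose strings the pickler replaced with memo references.

```python
import pickletools

# Opcodes that push a string onto the unpickler's stack
_STRING_OPS = {
    "UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8",
    "STRING", "BINSTRING", "SHORT_BINSTRING",
}
# Bookkeeping opcodes that can sit between a dict key and its value
_SKIP_OPS = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE", "FRAME"}


def recorded_sklearn_versions(path):
    """Return the set of '_sklearn_version' values found in a pickle file.

    Nothing is unpickled; a non-pickle file (e.g. all zero bytes) makes
    pickletools raise ValueError.
    """
    versions = set()
    expect_value = False
    with open(path, "rb") as fh:
        for opcode, arg, _pos in pickletools.genops(fh):
            if opcode.name in _SKIP_OPS:
                continue
            if opcode.name in _STRING_OPS:
                if expect_value:
                    versions.add(arg)
                expect_value = arg == "_sklearn_version"
            else:
                expect_value = False
    return versions


# Usage:
#   print(recorded_sklearn_versions("Aurora_model.pkl"))
```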

j3mdamas, Sep 17 '21 11:09

Hi, sorry for the slow response here; I've been swamped the last couple of weeks. You got 'RandomForestRegressor' object has no attribute 'predict_proba' because it is a regression model and it has likely been used as a classifier. The warnings are due to the fact that we have older versions in the yml. We shall see about updating these. Apparently we will also need to supply newer models in the tutorials. @GuoJeff, would you be able to help here?
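The mismatch described here (regressors only expose predict, while classifiers also expose predict_proba) can be caught explicitly before scoring. This is a minimal sketch with a hypothetical helper; the "regression"/"classification" labels are assumptions standing in for however the scoring component is configured, not Reinvent's actual parameter names.

```python
def score_with(model, fingerprints, model_type):
    """Hypothetical helper: call the prediction method matching model_type.

    model_type is "regression" or "classification" (illustrative labels).
    """
    if model_type == "classification":
        if not hasattr(model, "predict_proba"):
            raise TypeError(
                f"{type(model).__name__} has no predict_proba; "
                "it looks like a regressor, so configure it as regression"
            )
        # probability of the positive class (column 1) for a binary classifier
        return [row[1] for row in model.predict_proba(fingerprints)]
    if model_type == "regression":
        return list(model.predict(fingerprints))
    raise ValueError(f"unknown model_type: {model_type!r}")
```

With scikit-learn installed, sklearn.base.is_classifier(model) is another way to check the estimator type up front.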

@j3mdamas, in case you have accumulated too many questions, feel free to book a TC with me.

patronov, Sep 23 '21 23:09

@patronov I've sent you an e-mail.

About the issue: I use this example as a test for my deployment of Reinvent 3. It's probably not the best approach, and I should run the tests instead, but it's closer to a "real-case" scenario.

In any case, I was able to make progress with another model, so I'm not stuck, but someone trying Reinvent for the first time might be.

j3mdamas, Sep 24 '21 11:09