
Add RL-based CQL recommender

Open pkuderov opened this issue 3 years ago • 2 comments

Greetings!

With this PR I am adding support for an RL-based CQL recommender. Conservative Q-Learning (CQL) is a SAC-based, data-driven deep reinforcement learning algorithm that achieves state-of-the-art performance on offline RL problems. For discrete action spaces, CQL uses Double DQN as the base algorithm.

For this request, the d3rlpy implementation of CQL is used. It has been adapted to work with PySpark DataFrames to follow the RePlay computational style.
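
For reference, here is a minimal sketch of how the underlying d3rlpy model is typically driven, assuming the d3rlpy 1.x API that was current at the time. This is not the PR's code; the state/action/reward conventions and hyperparameters are purely illustrative:

```python
import numpy as np
from d3rlpy.algos import DiscreteCQL
from d3rlpy.dataset import MDPDataset

# Toy interaction log: in the PR the log comes from a PySpark DataFrame
# and is converted into transitions before this step.
observations = np.array([[0], [0], [1], [1]], dtype=np.float32)  # e.g. user ids as states
actions = np.array([10, 11, 10, 12])                             # item ids as discrete actions
rewards = np.array([1.0, 0.0, 1.0, 1.0], dtype=np.float32)       # relevance used as reward
terminals = np.array([0, 1, 0, 1], dtype=np.float32)             # one episode per user

dataset = MDPDataset(observations, actions, rewards, terminals)

# Discrete-action CQL (Double DQN base); hyperparameters are illustrative.
cql = DiscreteCQL(use_gpu=False, batch_size=4)
cql.fit(dataset, n_epochs=1)

# Q-values for candidate items can then be ranked to form recommendations.
q_values = cql.predict_value(observations, actions)
```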

Besides the implementation itself, the PR includes a notebook (and a bare Python script) to compare CQL with several baseline recommenders, the same way it is done in ./experiments/02_models_comparison.ipynb. It is tested on the same ML1M dataset with the exact same seed. The results are the following:

(image: CQL comparison results)

NB: As we use the d3rlpy implementation of CQL, a new dependency is added to pyproject.toml. However, I have trouble updating poetry.lock because I develop on a Mac M1, which requires me to loosen the dependencies to set up and run RePlay. As a result, I don't have the exact dev environment specified in poetry.lock, which is why I cannot easily commit the needed changes.

pkuderov · Aug 01 '22 09:08

Could you also add tests for the new model?

shashist · Aug 09 '22 13:08

Consider pulling the recent updates; we have fixed CI/CD in https://github.com/sb-ai-lab/RePlay/commit/6a2eaa01c2c221e91e444b0533f7290cf406051f

shashist · Aug 24 '22 11:08

Here's the wrap-up of the changes introduced in the last update:

  • [x] Implement and test save/load mechanics:

    • I added d3rlpy CQL model initialization to the CQL __init__. As d3rlpy has a rather inflexible save/load pipeline, I had to extract/re-implement several necessary stages, such as getting the init params and deserializing them (see the sketch after this list).
    • [x] Test saving/loading in tests/models/test_save_load_models
  • [x] Reuse fixtures in tests/utils for custom CQL tests.

  • [x] Add CQL to tests in tests/models/test_all_models

    • [x] Add to test_predict_cold_and_new_filter_out, test_predict_new_users, test_predict_cold_users if applicable. Yes, they are applicable. However, the current stage of the model implementation is not particularly suited for cold-start predictions; we have not evaluated its performance there yet, but we expect to address this in future updates.
  • [x] Check test coverage. pytest-cov reports 93% coverage for CQL.py.

  • [x] Move assert_omp_single_thread to utils

  • [x] Investigate why fit time is so large. It's due to the number of epochs used for learning. CQL is still noticeably slower than classic recommenders; however, GPU acceleration makes the execution time acceptable. We have not optimized performance yet (e.g., scaling the network size relative to the dataset size). The MDP dataset preparation takes only a fraction of the entire fit time.

  • [x] Switch _prepare_data to use Spark

  • [x] Describe actor_encoder_factory, q_func_factory and the other similar init args.
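
On the save/load point above: the general idea is to recover the constructor arguments of the wrapped d3rlpy model so it can be re-created on load. Below is a minimal sketch of that idea; the helper names and the attribute-lookup convention are hypothetical, not the PR's actual code:

```python
import inspect
import json


def get_init_params(obj) -> dict:
    """Collect constructor arguments of `obj` that are stored as attributes.

    Illustrative only: assumes the wrapped model keeps its init args as
    same-named (or underscore-prefixed) attributes, which is a common pattern.
    """
    init_signature = inspect.signature(type(obj).__init__)
    params = {}
    for name in init_signature.parameters:
        if name == "self":
            continue
        if hasattr(obj, name):
            params[name] = getattr(obj, name)
        elif hasattr(obj, f"_{name}"):
            params[name] = getattr(obj, f"_{name}")
    return params


def save_init_params(obj, path: str) -> None:
    # Keep only JSON-serializable values; factory objects (encoder and
    # q-function factories) would need their own (de)serialization step.
    simple = {k: v for k, v in get_init_params(obj).items()
              if isinstance(v, (int, float, str, bool, type(None)))}
    with open(path, "w") as f:
        json.dump(simple, f)


def load_init_params(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```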

pkuderov · Jan 31 '23 10:01

Moved the MDP dataset preparation (its implementation) outside the CQL class while keeping the dataset building itself inside the _fit method. This way the recommender's thoroughly tested fit pipeline is kept intact, and at the same time the user gets control over how the MDP dataset is built.
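
For illustration, an externalized builder could look roughly like the sketch below; the column names (user_idx, item_idx, relevance, timestamp) and the one-episode-per-user, relevance-as-reward conventions are assumptions, not necessarily what the PR implements:

```python
import numpy as np
import pandas as pd
from d3rlpy.dataset import MDPDataset


def log_to_mdp_dataset(log: pd.DataFrame) -> MDPDataset:
    """Turn an interaction log into episodes, one episode per user.

    Assumes columns user_idx, item_idx, relevance, timestamp; in RePlay the
    log is a Spark DataFrame, so a .toPandas() (or an equivalent conversion)
    would precede this step.
    """
    log = log.sort_values(["user_idx", "timestamp"])
    observations = log[["user_idx"]].to_numpy(dtype=np.float32)
    actions = log["item_idx"].to_numpy()
    rewards = log["relevance"].to_numpy(dtype=np.float32)
    # Mark the last interaction of each user as the end of an episode.
    terminals = (log["user_idx"] != log["user_idx"].shift(-1)).to_numpy(dtype=np.float32)
    return MDPDataset(observations, actions, rewards, terminals)
```

With the builder separated out like this, a user can choose a different state representation or reward shaping while still relying on the standard fit pipeline.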

Tests are updated correspondingly; code coverage is around the threshold (94%). We are waiting for your feedback.

pkuderov · Mar 24 '23 19:03

Do we need @staticmethod for _get_model_hyperparams? It seems like it will only be used with self.model.

shashist · Apr 05 '23 14:04