RePlay
Add RL-based CQL recommender
Greetings!
With this PR I am adding support for an RL-based CQL recommender. Conservative Q-Learning (CQL) is a SAC-based, data-driven deep reinforcement learning algorithm that achieves state-of-the-art performance on offline RL problems. For discrete action spaces, CQL uses Double DQN as the base algorithm.
This PR uses the d3rlpy implementation of CQL, adapted to work with PySpark DataFrames to follow the RePlay computational style.
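For illustration, here is a rough usage sketch. The import path, class name, and constructor arguments are assumptions based on how other RePlay recommenders are used, not the final API of this PR:

```python
from replay.models import CQL  # assumed import path for the new recommender

# `log` is a PySpark DataFrame with the standard RePlay columns:
# user_idx, item_idx, relevance, timestamp
model = CQL(n_epochs=10, use_gpu=False)  # illustrative hyperparameters

model.fit(log)                   # builds an MDP dataset internally and trains d3rlpy CQL
recs = model.predict(log, k=10)  # top-10 recommendations per user, as a Spark DataFrame
```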
Besides the implementation itself, this PR includes a notebook (and a bare Python script) that compares CQL with several baseline recommenders in the same way as ./experiments/02_models_comparison.ipynb. It is tested on the same ML1M dataset with the exact same seed. The results are the following:

NB: As we use the d3rlpy implementation of CQL, a new dependency is added to pyproject.toml. However, I have trouble updating poetry.lock because I develop on a Mac M1, which requires me to loosen the dependencies to set up and run RePlay. So I don't have the exact dev environment specified in poetry.lock, which is why I cannot easily commit the needed changes.
Could you also add tests for the new model?
Consider pulling the recent updates; we have fixed CI/CD in https://github.com/sb-ai-lab/RePlay/commit/6a2eaa01c2c221e91e444b0533f7290cf406051f
Here's the wrap-up of the changes introduced in the last update:
- [x] Implement and test save/load mechanics:
  - I added d3rlpy CQL model initialization to the CQL `__init__`. As d3rlpy has a rather inflexible save/load pipeline, I had to extract/re-implement several necessary stages, such as getting the init params and deserializing them (a rough sketch of this idea is shown after the list).
  - [x] Test saving/loading in `tests/models/test_save_load_models`
- [x] Reuse fixtures in `tests/utils` for custom CQL tests.
- [x] Add CQL to tests in `tests/models/test_all_models`
  - [x] Add to `test_predict_cold_and_new_filter_out`, `test_predict_new_users`, `test_predict_cold_users` if applicable. Yes, they are applicable. However, the current state of the model implementation is not particularly suited for cold predictions; we have not even tested its performance there, but this is expected to be addressed in future updates.
- [x] Check test coverage. pytest+cov reports 93% coverage for CQL.py.
- [x] Move `assert_omp_single_thread` to utils.
- [x] Investigate why `fit` time is so large. It is due to the number of epochs used for learning. CQL is still noticeably slower than classic recommenders, but GPU acceleration makes the execution time acceptable. We have not optimized performance yet (e.g., the scaling of the network size relative to the dataset size). The MDP dataset preparation takes just a fraction of the entire fit time.
- [x] Switch `_prepare_data` to use Spark.
- [x] Describe `actor_encoder_factory`, `q_func_factory`, and the other similar init args.
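For reference, here is a minimal sketch of the init-params extraction idea mentioned in the save/load item above. The helper name, the dummy model, and the way non-serializable arguments are handled are illustrative assumptions, not the exact code in this PR:

```python
import inspect
import json
from typing import Any, Dict


def get_init_params(obj: Any) -> Dict[str, Any]:
    """Collect the constructor arguments of ``obj`` that are stored as attributes,
    so the model can be re-created later with the same hyperparameters."""
    signature = inspect.signature(type(obj).__init__)
    params = {}
    for name in signature.parameters:
        if name == "self" or not hasattr(obj, name):
            continue
        value = getattr(obj, name)
        try:
            json.dumps(value)          # keep only JSON-serializable values as-is
            params[name] = value
        except TypeError:
            params[name] = repr(value)  # complex args (encoders, factories) need
                                        # their own (de)serialization step
    return params


class DummyModel:
    """Stand-in for a d3rlpy-style model, used only to demonstrate the idea."""

    def __init__(self, n_epochs: int = 10, gamma: float = 0.99):
        self.n_epochs = n_epochs
        self.gamma = gamma


if __name__ == "__main__":
    model = DummyModel(n_epochs=5)
    print(get_init_params(model))  # {'n_epochs': 5, 'gamma': 0.99}
```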
Moved the MDP dataset preparation (its implementation) out of the CQL class while keeping the build call itself inside the _fit method. This way the thoroughly tested recommender fit pipeline stays intact, while the user gains control over how the MDP dataset is built.
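Conceptually, the split looks roughly like this. The MdpDatasetBuilder name, its arguments, and the CQL attributes are illustrative assumptions rather than the exact code in this PR:

```python
import pandas as pd
from d3rlpy.dataset import MDPDataset  # the dataset type built from the interaction log


class MdpDatasetBuilder:
    """Turns an interaction log into an MDP dataset; lives outside the recommender,
    so users can plug in their own reward/episode logic."""

    def __init__(self, top_k: int = 10):
        self.top_k = top_k

    def build(self, log: pd.DataFrame) -> MDPDataset:
        # Hypothetical conversion: observations = users, actions = items,
        # rewards = relevance, one terminal flag per user episode.
        ...


class CQL:
    def __init__(self, mdp_dataset_builder: MdpDatasetBuilder, **cql_params):
        self.mdp_dataset_builder = mdp_dataset_builder
        self.model = ...  # d3rlpy CQL configured with cql_params

    def _fit(self, log) -> None:
        # The dataset is built here, but *how* it is built is up to the injected builder.
        mdp_dataset = self.mdp_dataset_builder.build(log)
        self.model.fit(mdp_dataset)
```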
Tests are updated correspondingly; code coverage is around the threshold (94%). We are waiting for your feedback.
Do we need @staticmethod for _get_model_hyperparams? It seems it will only be used with self.model.
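In other words, if the helper always reads self.model, an instance method avoids passing the model explicitly. A toy comparison, where everything except _get_model_hyperparams and self.model is an illustrative assumption:

```python
class CQLRecommender:
    def __init__(self, model):
        self.model = model

    # Static variant: the caller must pass the d3rlpy model in explicitly.
    @staticmethod
    def _get_model_hyperparams_static(model) -> dict:
        return dict(model.get_params())  # hypothetical d3rlpy-style accessor

    # Instance variant: the method can use self.model directly.
    def _get_model_hyperparams(self) -> dict:
        return dict(self.model.get_params())
```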