[Proposal] Systematic RL support in qlib

Open · ultmaster opened this issue 3 years ago · 1 comment

2022/2/25

Package name:

qlib.neutrader?
- Sounds like a brand name
- Sounds like limited to "trading" scenario
qlib.rl?
- Shorter, easier to remember
- Not exactly an RL framework
- qlib has no "ml" package for "rl" to contrast with.
qlib.sdm?
- What is sdm?

Legend for the file list below (any other entry in the first column is the proposed destination):

K: Keep in an internal repo (possibly another repo)

D: Delete

TBD: To be discussed (may need merge effort)

Major efforts

Unify config
- Currently neutrader is based on utilsd.config.
- qlib has another configuration system.
Unify entry
- Currently neutrader has several command-line tools.
- qlib is mostly launched from Python code.
- Merge with qlib's workflow and qrun.
Documentation and tests
- Neutrader has almost no documentation.
- Neutrader has its own pytest setup and .coveragerc.

Destination or action	Current path
K	.azure/docker.yml
D	.azure/pipeline.yml
TBD	.coveragerc
K	.gitignore
K	README.md
K	docker/base.Dockerfile
K	docker/neutrader.Dockerfile
qlib/examples	examples/backtest/qlib.yml
qlib.contrib.data.utils	examples/data/ordergen.py
	neutrader/__init__.py
<neutrader>.action	neutrader/action.py
	neutrader/data/__init__.py
neutrader.data	neutrader/data/base.py
neutrader.utils	neutrader/data/data_queue.py
qlib.contrib.data	neutrader/data/highfreq_handler.py
qlib.contrib.data	neutrader/data/highfreq_handler_order.py
qlib.contrib.data	neutrader/data/highfreq_handler_order_other_price.py
qlib.contrib.data	neutrader/data/highfreq_label_handler.py
qlib.contrib.data	neutrader/data/highfreq_label_handler_other_price.py
qlib.contrib.ops	neutrader/data/highfreq_ops.py
qlib.contrib.data	neutrader/data/highfreq_processor.py
qlib.contrib.data	neutrader/data/highfreq_provider.py
neutrader.data	neutrader/data/intraday.py
	neutrader/env/__init__.py
neutrader.utils	neutrader/env/finite_env.py
neutrader.env (deprecated)	neutrader/env/intraday_sa.py
neutrader.utils	neutrader/env/logging.py
	neutrader/forecast/__init__.py
K	neutrader/forecast/__main__.py
K	neutrader/forecast/common/__init__.py
K	neutrader/forecast/common/function.py
K	neutrader/forecast/common/util.py
K	neutrader/forecast/config.py
K	neutrader/forecast/dataset/__init__.py
K	neutrader/forecast/dataset/forecast.py
K	neutrader/forecast/dataset/minlevel.py
K	neutrader/forecast/model/__init__.py
K	neutrader/forecast/model/base.py
K	neutrader/forecast/model/darnn.py
	neutrader/network/__init__.py
neutrader.network	neutrader/network/base.py
neutrader.network	neutrader/network/darnn.py
K	neutrader/network/darnn4pred.py
neutrader.network	neutrader/network/recurrent.py
neutrader.observation	neutrader/observation.py
	neutrader/policy/__init__.py
neutrader.policy	neutrader/policy/base.py
K	neutrader/policy/baseline.py
neutrader.policy	neutrader/policy/twap/vwap/ac.py
K	neutrader/policy/mappo.py
neutrader.policy	neutrader/policy/ppo.py
neutrader.policy	neutrader/policy/utils.py
	neutrader/qlib_integration/__init__.py
neutrader.integration	neutrader/qlib_integration/feature.py
neutrader.integration	neutrader/qlib_integration/infrastructure.py
K	neutrader/qlib_integration/predictor.py
neutrader.integration	neutrader/qlib_integration/simulator.py
neutrader.integration	neutrader/qlib_integration/strategy.py
neutrader.reward	neutrader/reward.py
D	neutrader/search/__init__.py
D	neutrader/search/config_gen.py
D	neutrader/search/rerun_exp.py
D	neutrader/search/search.py
D	neutrader/search/util.py
neutrader.state	neutrader/state.py
neutrader.cli	neutrader/tools/__init__.py
neutrader.cli	neutrader/tools/backtest.py
neutrader.cli	neutrader/tools/backtest_qlib.py
neutrader.cli	neutrader/tools/config.py
neutrader.cli	neutrader/tools/ctl.py
neutrader.cli	neutrader/tools/openpai.py
neutrader.cli	neutrader/tools/train_onpolicy.py
TBD	setup.py
qlib/tests/rl	tests/assets/opds_15_225_backtest_qlib.csv
qlib/tests	tests/assets/opds_15_225_inner_twap_backtest_qlib.csv
qlib/tests	tests/assets/opds_15_225_single_day_backtest_qlib.csv
qlib/tests	tests/assets/peppo_15_225_backtest_qlib.csv
qlib/tests	tests/assets/twap_backtest_qlib.csv
qlib/tests	tests/assets/twap_nested_backtest_qlib.csv
qlib/tests	tests/assets/twap_single_day_backtest_qlib.csv
qlib/tests	tests/configs/hamburger.yml
qlib/tests	tests/configs/opds_15_225_backtest_qlib.py
qlib/tests	tests/configs/peppo_15_225_backtest_qlib.py
qlib/tests	tests/configs/ppo_30min_test.yml
qlib/tests	tests/configs/ppo_30min_test_qlib.yml
qlib/tests	tests/configs/ppo_30min_train.yml
qlib/tests	tests/configs/twap_30min.yml
qlib/tests	tests/configs/twap_backtest_qlib.yml
qlib/tests	tests/configs/twap_nested_backtest_qlib.yml
qlib/tests	tests/test_dataloader.py
qlib/tests	tests/test_e2e.py
qlib/tests	tests/test_finite_env.py
qlib/tests	tests/test_qlib_integration.py
qlib/tests	tests/test_state.py

Mar 25 '22 06:03 ultmaster

Status update (5/27)

Immediate work items are the ones I believe are most important; they are marked in italics.

RL framework - self-contained, agnostic to tasks

[ ] Trainer @ultmaster #1125 waiting for review
[x] Policy - interpreter - simulator
[ ] Logging system (only the basics are done; many TODOs remain, such as more loggers for TensorBoard, MLflow, and a memory buffer) - 2 weeks
[x] Auxiliary info collector
[x] Reward
[x] Seed (aka initial state)
[ ] Other utilities - detailed breakdowns from #1076
- [x] Data queue, finite env
- [x] Env wrapper (env = interpreter + simulator, policy = policy)
- [ ] Non-Linux compatibility fix
- [ ] Performance optimization
- [ ] Rechargeable queue (needed by PM)
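
The "env = interpreter + simulator" composition above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ToySimulator`, `ToyState`, `interpret_state`, `interpret_action`, `reward`, and `EnvWrapper` are hypothetical names invented for this example, not the classes in qlib or neutrader. The simulator is seeded with an initial state (the total volume to execute), the interpreters translate between simulator state and policy-facing observations and actions, and the wrapper exposes a gym-like `reset`/`step` pair.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Observation = Tuple[float, float]


@dataclass
class ToyState:
    """Hypothetical simulator state for a toy order-execution task."""
    total: float      # volume given by the seed
    remaining: float  # volume not yet executed
    step: int         # steps taken so far
    horizon: int      # maximum number of steps


class ToySimulator:
    """Hypothetical simulator: executes part of an order at every step."""

    def __init__(self, seed: float, horizon: int = 4) -> None:
        # The seed is the initial state: here, the total volume to execute.
        self.state = ToyState(total=seed, remaining=seed, step=0, horizon=horizon)

    def step(self, volume: float) -> None:
        self.state.remaining -= min(volume, self.state.remaining)
        self.state.step += 1

    def done(self) -> bool:
        return self.state.step >= self.state.horizon or self.state.remaining <= 0


def interpret_state(state: ToyState) -> Observation:
    """State interpreter: simulator state -> observation for the policy."""
    return (state.remaining / state.total, state.step / state.horizon)


def interpret_action(state: ToyState, action: int) -> float:
    """Action interpreter: policy action in 0..4 -> volume to execute."""
    return state.remaining * action / 4


def reward(state: ToyState) -> float:
    """Penalize the unexecuted fraction once the horizon is reached."""
    return -state.remaining / state.total if state.step >= state.horizon else 0.0


class EnvWrapper:
    """Env = interpreters + simulator; the policy stays outside the env."""

    def __init__(
        self,
        simulator_fn: Callable[[float], ToySimulator],
        state_interpreter: Callable[[ToyState], Observation],
        action_interpreter: Callable[[ToyState, int], float],
        reward_fn: Callable[[ToyState], float],
    ) -> None:
        self.simulator_fn = simulator_fn
        self.state_interpreter = state_interpreter
        self.action_interpreter = action_interpreter
        self.reward_fn = reward_fn

    def reset(self, seed: float) -> Observation:
        self.simulator = self.simulator_fn(seed)
        return self.state_interpreter(self.simulator.state)

    def step(self, action: int) -> Tuple[Observation, float, bool]:
        volume = self.action_interpreter(self.simulator.state, action)
        self.simulator.step(volume)
        state = self.simulator.state
        return self.state_interpreter(state), self.reward_fn(state), self.simulator.done()


env = EnvWrapper(ToySimulator, interpret_state, interpret_action, reward)
first_obs = env.reset(100.0)
next_obs, step_reward, finished = env.step(2)  # execute half of what remains
```

Keeping the interpreters as plain callables means the same simulator can be reused with different observation or action encodings without touching the wrapper.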

Qlib integration - Make RL framework part of qlib

[x] Use qlib.backtest.Order everywhere an "order" is needed.
[ ] Strategy wrapper (strategy = interpreter + policy, simulator = qlib.backtest + something else). @lihuoran - neutrader simulator migrated.
- [ ] RL can use the simulator provided by qlib backtest (can run).
- [ ] Qlib inference can use a policy trained in RL, including simple policies like TWAP (only internal drafts exist).
[ ] Experiment/workflow management (closely related to "trainer" above).
[ ] Programming with config only - launching backtest / training via config.
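
As a rough illustration of "strategy = interpreter + policy" from the list above, the sketch below wraps a policy and two interpreters into an object that turns an execution state into an order. Everything here is an assumption made for the example: `ToyOrder` is a stand-in and not `qlib.backtest.Order`, and `ExecState`, `twap_policy`, and `StrategyWrapper` are hypothetical names rather than the actual qlib strategy interface.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Observation = Tuple[float, int]


@dataclass
class ToyOrder:
    """Stand-in for an order object; NOT qlib.backtest.Order."""
    stock_id: str
    amount: float


@dataclass
class ExecState:
    """Hypothetical execution state handed to the strategy by a backtest."""
    stock_id: str
    remaining: float
    steps_left: int


def interpret_state(state: ExecState) -> Observation:
    """State interpreter: execution state -> observation for the policy."""
    return (state.remaining, state.steps_left)


def twap_policy(obs: Observation) -> float:
    """TWAP-like rule: spread the remaining volume evenly over the steps left."""
    _, steps_left = obs
    return 1.0 / steps_left


def interpret_action(state: ExecState, fraction: float) -> ToyOrder:
    """Action interpreter: fraction of remaining volume -> order."""
    return ToyOrder(stock_id=state.stock_id, amount=state.remaining * fraction)


class StrategyWrapper:
    """Strategy = interpreters + policy; the simulator lives in the backtest."""

    def __init__(
        self,
        state_interpreter: Callable[[ExecState], Observation],
        policy: Callable[[Observation], float],
        action_interpreter: Callable[[ExecState, float], ToyOrder],
    ) -> None:
        self.state_interpreter = state_interpreter
        self.policy = policy
        self.action_interpreter = action_interpreter

    def generate_order(self, state: ExecState) -> ToyOrder:
        obs = self.state_interpreter(state)
        action = self.policy(obs)
        return self.action_interpreter(state, action)


strategy = StrategyWrapper(interpret_state, twap_policy, interpret_action)
order = strategy.generate_order(ExecState("SH600000", remaining=100.0, steps_left=4))
```

Because the policy is just a callable from observation to action, a trained RL policy and a rule such as TWAP could in principle be swapped without changing the wrapper.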

Tasks and algorithms - somewhat independent

[ ] SAOE
- [x] The first SAOE simulator built upon "OPD-styled" data, along with several interpreters and basic policies.
- [ ] Second SAOE simulator based on qlib.backtest. Depends on "Strategy wrapper" above.
- [ ] New (and old) algorithms listed by Kan. @rk2900
- [ ] OPD
  - [ ] Depends on: log actions of agents
- [ ] DDQN, PPO, AC, VWAP: decision needed - whether to load data and predict online, or cache the prediction offline.
[ ] PM
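
The open decision above (predict online versus cache predictions offline) can be illustrated with a small sketch. The names `OnlinePredictor`, `CachedPredictor`, `build_cache`, and `toy_model` are hypothetical and only serve this example; a real model would load data and run inference instead of the toy function used here.

```python
from typing import Callable, Dict, Hashable, Iterable


class OnlinePredictor:
    """Option 1: run the model every time a prediction is requested."""

    def __init__(self, model: Callable[[Hashable], float]) -> None:
        self.model = model

    def predict(self, key: Hashable) -> float:
        return self.model(key)


class CachedPredictor:
    """Option 2: look predictions up in a table computed ahead of time."""

    def __init__(self, cache: Dict[Hashable, float]) -> None:
        self.cache = cache

    def predict(self, key: Hashable) -> float:
        # Raises KeyError for keys that were never precomputed.
        return self.cache[key]


def build_cache(
    model: Callable[[Hashable], float], keys: Iterable[Hashable]
) -> Dict[Hashable, float]:
    """Offline step: evaluate the model once per key and store the result."""
    return {key: model(key) for key in keys}


model_calls = {"n": 0}


def toy_model(key: Hashable) -> float:
    """Toy stand-in for an expensive model; counts how often it runs."""
    model_calls["n"] += 1
    return float(len(str(key)))


online = OnlinePredictor(toy_model)
cached = CachedPredictor(build_cache(toy_model, ["a", "bb"]))
```

The online variant pays the model cost on every call but works for unseen keys; the cached variant is cheap at lookup time but only covers what was precomputed.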

Others

[ ] Tutorials for first-time users.
[ ] Continual improvements on tests.

May 27 '22 04:05 matluster