[Proposal] Systematic RL support in qlib

Open · ultmaster opened this issue 3 years ago · 1 comment

2022/2/25

Package name:

  • qlib.neutrader?
    • Sound, brand
    • Sounds limited to the "trading" scenario
  • qlib.rl?
    • Shorter, easier to remember
    • Not exactly an RL framework
    • No ML as opposed to RL.
  • qlib.sdm?
    • What is sdm?

K: Keep in an internal repo (possibly another repo)

D: Delete

TBD: To be discussed (possibly need merge effort)

Major efforts

  • Unify config
    • Currently neutrader is based on utilsd.config.
    • qlib has another configuration system.
  • Unify entry
    • Currently neutrader has several command line tools
    • qlib is mostly launched from Python code.
    • Merge with workflow, qrun.
  • Documentation and tests
    • Neutrader has almost no documentation.
    • Neutrader has its own pytest.
    • … and coveragerc.
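
A rough sketch of what a unified config entry could look like, assuming the "class / module_path / kwargs" style seen in qlib workflow configs. `init_from_config` is a hypothetical helper written for illustration; it is not the qlib or utilsd API, and the standard-library `Fraction` only stands in for a real policy or simulator class.

```python
import importlib
from typing import Any, Dict


def init_from_config(config: Dict[str, Any]) -> Any:
    """Build an object from a {"class", "module_path", "kwargs"} mapping (hypothetical helper)."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Stand-in target from the standard library so the sketch runs anywhere.
example_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 1, "denominator": 4},
}
quarter = init_from_config(example_config)
```

With one convention like this, both the command line tools and the Pythonic entry points could build components from the same config file.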
| Decision or target | File |
| --- | --- |
| K | .azure/docker.yml |
| D | .azure/pipeline.yml |
| TBD | .coveragerc |
| K | .gitignore |
| K | README.md |
| K | docker/base.Dockerfile |
| K | docker/neutrader.Dockerfile |
| qlib/examples | examples/backtest/qlib.yml |
| qlib.contrib.data.utils | examples/data/ordergen.py |
| | neutrader/__init__.py |
| <neutrader>.action | neutrader/action.py |
| | neutrader/data/__init__.py |
| neutrader.data | neutrader/data/base.py |
| neutrader.utils | neutrader/data/data_queue.py |
| qlib.contrib.data | neutrader/data/highfreq_handler.py |
| qlib.contrib.data | neutrader/data/highfreq_handler_order.py |
| qlib.contrib.data | neutrader/data/highfreq_handler_order_other_price.py |
| qlib.contrib.data | neutrader/data/highfreq_label_handler.py |
| qlib.contrib.data | neutrader/data/highfreq_label_handler_other_price.py |
| qlib.contrib.ops | neutrader/data/highfreq_ops.py |
| qlib.contrib.data | neutrader/data/highfreq_processor.py |
| qlib.contrib.data | neutrader/data/highfreq_provider.py |
| neutrader.data | neutrader/data/intraday.py |
| | neutrader/env/__init__.py |
| neutrader.utils | neutrader/env/finite_env.py |
| neutrader.env (deprecated) | neutrader/env/intraday_sa.py |
| neutrader.utils | neutrader/env/logging.py |
| | neutrader/forecast/__init__.py |
| K | neutrader/forecast/__main__.py |
| K | neutrader/forecast/common/__init__.py |
| K | neutrader/forecast/common/function.py |
| K | neutrader/forecast/common/util.py |
| K | neutrader/forecast/config.py |
| K | neutrader/forecast/dataset/__init__.py |
| K | neutrader/forecast/dataset/forecast.py |
| K | neutrader/forecast/dataset/minlevel.py |
| K | neutrader/forecast/model/__init__.py |
| K | neutrader/forecast/model/base.py |
| K | neutrader/forecast/model/darnn.py |
| | neutrader/network/__init__.py |
| neutrader.network | neutrader/network/base.py |
| neutrader.network | neutrader/network/darnn.py |
| K | neutrader/network/darnn4pred.py |
| neutrader.network | neutrader/network/recurrent.py |
| neutrader.observation | neutrader/observation.py |
| | neutrader/policy/__init__.py |
| neutrader.policy | neutrader/policy/base.py |
| K | neutrader/policy/baseline.py |
| neutrader.policy | neutrader/policy/twap/vwap/ac.py |
| K | neutrader/policy/mappo.py |
| neutrader.policy | neutrader/policy/ppo.py |
| neutrader.policy | neutrader/policy/utils.py |
| | neutrader/qlib_integration/__init__.py |
| neutrader.integration | neutrader/qlib_integration/feature.py |
| neutrader.integration | neutrader/qlib_integration/infrastructure.py |
| K | neutrader/qlib_integration/predictor.py |
| neutrader.integration | neutrader/qlib_integration/simulator.py |
| neutrader.integration | neutrader/qlib_integration/strategy.py |
| neutrader.reward | neutrader/reward.py |
| D | neutrader/search/__init__.py |
| D | neutrader/search/config_gen.py |
| D | neutrader/search/rerun_exp.py |
| D | neutrader/search/search.py |
| D | neutrader/search/util.py |
| neutrader.state | neutrader/state.py |
| neutrader.cli | neutrader/tools/__init__.py |
| neutrader.cli | neutrader/tools/backtest.py |
| neutrader.cli | neutrader/tools/backtest_qlib.py |
| neutrader.cli | neutrader/tools/config.py |
| neutrader.cli | neutrader/tools/ctl.py |
| neutrader.cli | neutrader/tools/openpai.py |
| neutrader.cli | neutrader/tools/train_onpolicy.py |
| TBD | setup.py |
| qlib/tests/rl | tests/assets/opds_15_225_backtest_qlib.csv |
| qlib/tests | tests/assets/opds_15_225_inner_twap_backtest_qlib.csv |
| qlib/tests | tests/assets/opds_15_225_single_day_backtest_qlib.csv |
| qlib/tests | tests/assets/peppo_15_225_backtest_qlib.csv |
| qlib/tests | tests/assets/twap_backtest_qlib.csv |
| qlib/tests | tests/assets/twap_nested_backtest_qlib.csv |
| qlib/tests | tests/assets/twap_single_day_backtest_qlib.csv |
| qlib/tests | tests/configs/hamburger.yml |
| qlib/tests | tests/configs/opds_15_225_backtest_qlib.py |
| qlib/tests | tests/configs/peppo_15_225_backtest_qlib.py |
| qlib/tests | tests/configs/ppo_30min_test.yml |
| qlib/tests | tests/configs/ppo_30min_test_qlib.yml |
| qlib/tests | tests/configs/ppo_30min_train.yml |
| qlib/tests | tests/configs/twap_30min.yml |
| qlib/tests | tests/configs/twap_backtest_qlib.yml |
| qlib/tests | tests/configs/twap_nested_backtest_qlib.yml |
| qlib/tests | tests/test_dataloader.py |
| qlib/tests | tests/test_e2e.py |
| qlib/tests | tests/test_finite_env.py |
| qlib/tests | tests/test_qlib_integration.py |
| qlib/tests | tests/test_state.py |

ultmaster, Mar 25 '22 06:03

Status update (5/27)


Immediate work items are the ones I consider important; they are marked in italics.

RL framework - self-contained, agnostic to tasks

  • [ ] Trainer @ultmaster: #1125, waiting for review
  • [x] Policy - interpreter - simulator
  • [ ] Logging system (only basics, many TODOs - more loggers including tensorboard, mlflow, memory buffer) - 2 weeks
  • [x] Auxiliary info collector
  • [x] Reward
  • [x] Seed (aka initial state)
  • [ ] Other utilities - detailed breakdowns from #1076
    • [x] Data queue, finite env
    • [x] Env wrapper (env = interpreter + simulator, policy = policy)
    • [ ] Non-linux compatibility fix
    • [ ] Performance optimization
    • [ ] Rechargeable queue (needed by PM)
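
To make "env = interpreter + simulator" concrete, here is a minimal, self-contained sketch. All names (`CountdownSimulator`, `EnvWrapper`, the interpreter callables) are hypothetical and chosen for illustration; the actual classes in the framework may differ in naming and signature.

```python
from dataclasses import dataclass
from typing import Any, Callable


class CountdownSimulator:
    """Toy simulator (hypothetical): holds an amount that actions reduce."""

    def __init__(self, initial: float) -> None:
        self.remaining = initial

    def step(self, amount: float) -> None:
        # Never execute more than what is left.
        self.remaining -= min(amount, self.remaining)

    def get_state(self) -> float:
        return self.remaining

    def done(self) -> bool:
        return self.remaining <= 0


@dataclass
class EnvWrapper:
    """env = interpreter + simulator; the policy only sees observations and actions."""

    simulator_fn: Callable[[Any], Any]              # seed (initial state) -> simulator
    state_interpreter: Callable[[Any], Any]         # simulator state -> observation
    action_interpreter: Callable[[Any, Any], Any]   # (state, policy action) -> simulator action
    reward_fn: Callable[[Any, Any], float]          # (state before, state after) -> reward

    def reset(self, seed: Any) -> Any:
        self.simulator = self.simulator_fn(seed)
        return self.state_interpreter(self.simulator.get_state())

    def step(self, action: Any):
        before = self.simulator.get_state()
        self.simulator.step(self.action_interpreter(before, action))
        after = self.simulator.get_state()
        return self.state_interpreter(after), self.reward_fn(before, after), self.simulator.done()


env = EnvWrapper(
    simulator_fn=CountdownSimulator,
    state_interpreter=lambda state: [state],
    action_interpreter=lambda state, action: action * state,  # action = fraction of what is left
    reward_fn=lambda before, after: before - after,
)
obs0 = env.reset(100.0)                  # the seed is the initial amount
obs1, reward1, done1 = env.step(0.25)
obs2, reward2, done2 = env.step(1.0)
```

The point of the split is that the simulator stays task-specific while the interpreters and reward can be swapped without touching it.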

Qlib integration - Make RL framework part of qlib

  • [x] Use qlib.backtest.Order everywhere an "order" is needed.
  • [ ] Strategy wrapper (strategy = interpreter + policy, simulator = qlib.backtest + something else). @lihuoran - neutrader simulator migrated.
    • [ ] RL can use simulator provided by qlib backtest (can run)
    • [ ] Qlib inference can use trained policy (including simple policies like TWAP) in RL (only has internal drafts).
  • [ ] Experiment/workflow management (closely related to "trainer" above).
  • [ ] Programming with config only - launching backtest / training via config.
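
The other direction, "strategy = interpreter + policy", could look roughly like the sketch below, using a simple TWAP policy as the example. `PolicyStrategy`, `twap_policy`, and the dict-based state are assumptions made for illustration only; they are not qlib.backtest classes, and the loop stands in for whatever the backtest executor actually does.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]          # hypothetical backtest state
Obs = Tuple[float, float]         # (remaining amount, steps left)


def twap_policy(obs: Obs) -> float:
    """TWAP: spread the remaining amount evenly over the remaining steps."""
    remaining, steps_left = obs
    return remaining / steps_left


class PolicyStrategy:
    """strategy = interpreter + policy (hypothetical wrapper, not a qlib class)."""

    def __init__(
        self,
        state_interpreter: Callable[[State], Obs],
        policy: Callable[[Obs], float],
        action_interpreter: Callable[[State, float], float],
    ) -> None:
        self.state_interpreter = state_interpreter
        self.policy = policy
        self.action_interpreter = action_interpreter

    def generate_decision(self, state: State) -> float:
        obs = self.state_interpreter(state)
        action = self.policy(obs)
        return self.action_interpreter(state, action)


strategy = PolicyStrategy(
    state_interpreter=lambda s: (s["remaining"], s["steps_left"]),
    policy=twap_policy,
    action_interpreter=lambda s, a: min(a, s["remaining"]),  # never trade more than is left
)

# Stand-in for the backtest loop: the "simulator" just subtracts the traded amount.
state: State = {"remaining": 90.0, "steps_left": 3}
decisions: List[float] = []
while state["steps_left"] > 0:
    amount = strategy.generate_decision(state)
    decisions.append(amount)
    state = {"remaining": state["remaining"] - amount, "steps_left": state["steps_left"] - 1}
```

A trained policy would plug into the same `policy` slot, which is what would let qlib inference reuse it alongside simple baselines like TWAP.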

Tasks and algorithms - somewhat independent

  • [ ] SAOE
    • [x] The first SAOE simulator built upon "OPD-styled" data, along with several interpreters and basic policies.
    • [ ] Second SAOE simulator based on qlib.backtest. Depends on "Strategy wrapper" above.
    • [ ] New (and old) algorithms listed by Kan. @rk2900
    • [ ] OPD
      • [ ] Depends on: log actions of agents
    • [ ] DDQN, PPO, AC, VWAP: decision needed - whether to load data and predict online, or cache the prediction offline.
  • [ ] PM

Others

  • [ ] Tutorials for first-time users.
  • [ ] Continual improvements on tests.

matluster, May 27 '22 04:05