feat: jupyter notebook for explaining CSRankNorm

Open igor17400 opened this issue 2 years ago • 1 comments

Description

CSRankNorm is a processor that applies cross-sectional rank normalization, and understanding it is important for building and running models in Qlib. However, it is hard to understand how processors in general, and CSRankNorm in particular (one of the most used processors in Qlib), work in practice.
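
As a rough illustration of what cross-sectional rank normalization means, here is a minimal pandas sketch. It is not Qlib's actual code, and Qlib's own implementation may differ in details. The function name `cs_rank_norm`, the column name `LABEL0`, the index level names `datetime` and `instrument`, and the sample values are all assumptions made for this example. The idea is to rank each value against the other instruments on the same date, center the percentile ranks around zero, and rescale them toward unit standard deviation.

```python
import numpy as np
import pandas as pd


def cs_rank_norm(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Cross-sectional rank normalization sketch (hypothetical helper, not Qlib's code)."""
    out = df.copy()  # leave the caller's frame untouched
    # Rank each column within each date; pct=True maps ranks into (0, 1].
    ranked = out[cols].groupby(level="datetime", group_keys=False).rank(pct=True)
    # Center around zero, then rescale. A uniform variable on [-0.5, 0.5] has
    # std 1/sqrt(12), so multiplying by sqrt(12) (about 3.46) gives roughly unit std.
    out[cols] = (ranked - 0.5) * np.sqrt(12)
    return out


# Made-up sample data: two dates, four instruments per date.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2022-05-10", "2022-05-11"]), ["A", "B", "C", "D"]],
    names=["datetime", "instrument"],
)
raw = pd.DataFrame(
    {"LABEL0": [0.01, -0.02, 0.03, 0.00, 0.10, 0.20, -0.10, 0.05]},
    index=index,
)
normed = cs_rank_norm(raw, ["LABEL0"])
print(normed)
```

After normalization, only the ordering of instruments within each date matters; the magnitude of the raw values is discarded, which makes the result robust to outliers on any single day.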

Motivation and Context

To give better context on how processors are applied, this PR adds a Jupyter notebook that focuses on CSRankNorm and walks through how it operates and how its output is used as the input to ML algorithms in Qlib.

This PR was inspired by issue #1024, in which other users and I had doubts about how the label is calculated in the dataset passed as input to ML models. I believe that issue stems from a general lack of understanding of how processors work.

How Has This Been Tested?

  • [x] Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • [x] If you are adding a new feature, test on your own test scripts.

To test, just execute the Jupyter notebook at tutorial/csranknorm_explanation.ipynb.

Screenshots of Test Results (if appropriate):

  1. Pipeline test: Execute jupyter notebook
  2. Results: The notebook contains several explanations that use CSRankNorm as an example. The following images illustrate the role of processors: the first shows the raw data, and the second shows the data after the CSRankNorm processor is applied.
[Image: Screen Shot 2022-05-11 at 21 23 07]
[Image: Screen Shot 2022-05-11 at 21 23 20]

Types of changes

  • [ ] Fix bugs
  • [ ] Add new feature
  • [x] Update documentation
  • [x] Add tutorial/explanation

igor17400 • May 12 '22 00:05

@igor17400 Could you please remove the CSV data and use Qlib's open-source data to demonstrate this feature? Also, the .ipynb is too large; please remove the redundant output with this script: https://stackoverflow.com/a/20844506 Thanks :)

you-n-g • Jun 06 '22 11:06
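
For reference, clearing notebook outputs can be sketched with the standard library alone. This is not the script from the linked answer; it is a minimal example that assumes the notebook follows the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The helper names `strip_outputs` and `strip_file` are hypothetical.

```python
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts of code cells, modifying the dict in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: str) -> None:
    """Rewrite the notebook at `path` with its outputs removed."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    cleaned = json.dumps(strip_outputs(nb), indent=1, ensure_ascii=False)
    p.write_text(cleaned + "\n", encoding="utf-8")


# Example on an in-memory notebook with one code cell and one markdown cell.
sample = {
    "cells": [
        {"cell_type": "code", "source": ["1 + 1"], "execution_count": 3,
         "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]}}]},
        {"cell_type": "markdown", "source": ["# Title"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
}
stripped = strip_outputs(sample)
```

Calling `strip_file("tutorial/csranknorm_explanation.ipynb")` would apply the same cleanup to the notebook from this PR.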

This PR is stale because it has been open for a year with no activity. Remove the stale label or comment on the PR; otherwise it will be closed in 5 days.

github-actions[bot] • Feb 08 '23 15:02