Ticker name "NA" makes the exists_qlib_data function report errors.

Open OzzyXu opened this issue 1 year ago • 3 comments

🐛 Bug Description

The ticker name "NA" in "all.txt" under /instruments makes the exists_qlib_data function fail, because pandas converts the string "NA" to the float nan instead of keeping it as a string.
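
The conversion can be reproduced without the attached file. The following is a minimal sketch, assuming a hypothetical two-row instruments file held in memory; the sample rows and variable names are illustrative and are not taken from the attached all.txt.

```python
import io

import pandas as pd

# Hypothetical instruments content: ticker, start date, end date (tab-separated).
sample = "A\t1999-11-18\t2020-11-10\nNA\t2000-01-03\t2020-11-10\n"

# Same read_csv arguments as the line in exists_qlib_data quoted in this issue.
codes = pd.read_csv(io.StringIO(sample), sep="\t", header=None).loc[:, 0]
print(codes.tolist())  # the "NA" row is parsed as a float NaN, not as a string

# str.lower only accepts str objects, so applying it to the NaN entry raises TypeError.
try:
    codes.apply(str.lower)
    lower_failed = False
except TypeError as exc:
    lower_failed = True
    print(f"str.lower failed: {exc}")
```

By default, pandas treats a fixed set of strings (including "NA") as missing values, which is why only this ticker is affected in the sample.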

To Reproduce

Steps to reproduce the behavior:

  1. Save the attached all.txt under the ~/.qlib/qlib_data/us_data/instruments.
  2. Run the following code:
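import sys

from qlib.constant import REG_US
from qlib.utils import exists_qlib_data

# scripts_dir must point to the scripts directory of the Qlib repository (not defined in this snippet)
provider_uri = "~/.qlib/qlib_data/us_data_new"  # target_dir
if not exists_qlib_data(provider_uri):
    print(f"Qlib data is not found in {provider_uri}")
    sys.path.append(str(scripts_dir))
    from get_data import GetData

    GetData().qlib_data(target_dir=provider_uri, region=REG_US)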

Expected Behavior

The code should run without errors.

Screenshot

image

Environment

Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.

  • Qlib version: 0.93
  • Python version: 3.8.10
  • OS (Windows, Linux, MacOS): Windows
  • Commit number (optional, please provide it if you are using the dev version):

Additional Notes

  1. The bug is caused by how pandas.read_csv is called in the following line of exists_qlib_data in qlib\utils\__init__.py: with its default settings, read_csv treats the string "NA" as a missing value. Refer to the page for more details.
 miss_code = set(pd.read_csv(_instrument, sep="\t", header=None).loc[:, 0].apply(str.lower)) - set(code_names)
  2. The cause of the bug can be further verified with the following code:
temp = pd.read_csv("all.txt", sep="\t", header=None).loc[:, 0]
non_string_values = [i for i in temp if not isinstance(i, str)]
print(non_string_values)
[nan]
  3. The bug can be fixed by adding keep_default_na=False:
temp = pd.read_csv("all.txt", sep="\t", header=None, keep_default_na=False).loc[:, 0]
non_string_values = [i for i in temp if not isinstance(i, str)]
print(non_string_values)
[]
  4. I can help with the fix; I just want to ask which tests I need to run to make sure the fix does not cause any other issues.
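
As a starting point for the testing question, here is a minimal sketch of a regression check. It assumes a hypothetical helper, read_instrument_codes, and a made-up in-memory instruments file; neither is part of Qlib, and this is not the test suite the maintainers require.

```python
import io

import pandas as pd


def read_instrument_codes(source) -> set:
    """Hypothetical helper (not a Qlib function): lower-cased tickers from a tab-separated instruments file."""
    # keep_default_na=False stops pandas from turning strings such as "NA" into NaN.
    frame = pd.read_csv(source, sep="\t", header=None, keep_default_na=False)
    return set(frame.loc[:, 0].apply(str.lower))


# Made-up sample: "NA" and "NULL" are both on pandas' default list of missing-value strings.
sample = (
    "A\t1999-11-18\t2020-11-10\n"
    "NA\t2000-01-03\t2020-11-10\n"
    "NULL\t2001-01-02\t2020-11-10\n"
)

codes = read_instrument_codes(io.StringIO(sample))
print(sorted(codes))
```

A check like this only covers the parsing step; whether the change affects other callers of exists_qlib_data would still need to be confirmed against the project's own tests.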

OzzyXu, Jan 01 '24 06:01

Would you like to create a PR to fix this and become one of the contributors to Qlib?

SunsetWolf, Jan 16 '24 04:01

@SunsetWolf Sure. I will first double-check whether my fix causes any issues; if it does not, I will create a PR to fix it. I am happy to be a contributor to Qlib and to help with other issues.

OzzyXu, Jan 16 '24 20:01

@OzzyXu Let me know about it. I'd like to help.

SarthakNikhal, Jan 19 '24 19:01