qlib
Ticker name "NA" makes the exists_qlib_data function report errors.
🐛 Bug Description
The ticker name "NA" in "all.txt" under /instruments makes the exists_qlib_data function fail, because pandas parses the string "NA" as the float nan instead of keeping it as a string.
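The failure can be reproduced without the attached file. The following is a minimal sketch, assuming a small in-memory stand-in for all.txt (tab-separated, no header, ticker in column 0; the date columns are illustrative and not taken from the real file). It shows that the ticker "NA" is read as nan and that applying str.lower to it raises a TypeError:

```python
import io

import pandas as pd

# Hypothetical stand-in for instruments/all.txt: tab-separated, no header,
# ticker in column 0. The date values are illustrative only.
sample = "AAPL\t2000-01-03\t2020-11-10\nNA\t2000-01-03\t2020-11-10\n"

# Default parsing: the string "NA" is treated as a missing value.
codes = pd.read_csv(io.StringIO(sample), sep="\t", header=None).loc[:, 0]
print(codes.tolist())

# str.lower only accepts str objects, so the float nan raises a TypeError.
error_message = None
try:
    codes.apply(str.lower)
except TypeError as err:
    error_message = str(err)
    print(error_message)
```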
To Reproduce
Steps to reproduce the behavior:
- Save the attached all.txt under ~/.qlib/qlib_data/us_data/instruments.
- Run the following code:
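import sys
from qlib.constant import REG_US
from qlib.utils import exists_qlib_data

# scripts_dir must point to the scripts directory of the qlib repository
provider_uri = "~/.qlib/qlib_data/us_data_new"  # target_dir
if not exists_qlib_data(provider_uri):
    print(f"Qlib data is not found in {provider_uri}")
    sys.path.append(str(scripts_dir))
    from get_data import GetData
    GetData().qlib_data(target_dir=provider_uri, region=REG_US)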
Expected Behavior
The code should run without errors.
Screenshot
Environment
Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.
- Qlib version: 0.93
- Python version: 3.8.10
- OS (Windows, Linux, MacOS): Windows
- Commit number (optional, please provide it if you are using the dev version):
Additional Notes
- The bug is caused by the wrong usage of pandas.read_csv in exists_qlib_data under qlib\utils\__init__.py: by default, read_csv treats the string "NA" as a missing value. Refer to the page for more details. The offending line is:
miss_code = set(pd.read_csv(_instrument, sep="\t", header=None).loc[:, 0].apply(str.lower)) - set(code_names)
- The cause of the bug can be further verified by the following code:
temp = pd.read_csv("all.txt", sep="\t", header=None).loc[:, 0]
non_string_values = [i for i in temp if not isinstance(i, str)]
print(non_string_values)
[nan]
- The bug can be easily fixed by adding keep_default_na=False to the read_csv call:
temp = pd.read_csv("all.txt", sep="\t", header=None, keep_default_na=False).loc[:, 0]
non_string_values = [i for i in temp if not isinstance(i, str)]
print(non_string_values)
[]
- I can help with the fix. I just want to ask which tests I need to run to make sure the fix does not cause any other issues.
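For reference, the proposed fix applied to the miss_code expression could look like the sketch below. The helper name missing_codes and the sample data are hypothetical and only serve the illustration; the actual change in exists_qlib_data would just add keep_default_na=False to the existing read_csv call.

```python
import io

import pandas as pd


def missing_codes(instrument_file, code_names):
    """Hypothetical helper mirroring the miss_code expression.

    keep_default_na=False stops pandas from turning strings such as "NA"
    into nan, so every ticker in column 0 stays a str.
    """
    codes = pd.read_csv(
        instrument_file, sep="\t", header=None, keep_default_na=False
    ).loc[:, 0]
    return set(codes.apply(str.lower)) - set(code_names)


# Illustrative in-memory instruments file (the dates are made up).
sample = (
    "AAPL\t2000-01-03\t2020-11-10\n"
    "NA\t2000-01-03\t2020-11-10\n"
    "MSFT\t2000-01-03\t2020-11-10\n"
)
result = missing_codes(io.StringIO(sample), ["aapl", "na"])
print(result)
```

With this change the ticker "NA" is lowercased like any other symbol, and only the codes that are genuinely absent from code_names are reported.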
Would you like to create a PR to fix this and be one of the contributors to qlib?
@SunsetWolf Sure. I will double-check whether my fix causes any issues; if not, I will create a PR to fix it. I am happy to be a contributor to Qlib and to try to help with other issues.
@OzzyXu Let me know about it. I'd like to help.