qlib
                                
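Why is missing data handled differently in the middle of the range than at the start and end?
Hello, I have a question about the following scenario.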
https://github.com/microsoft/qlib/blob/687edd79d0ee75fbf61bf1c1198ac130ef8f5b5c/scripts/dump_bin.py#L199-L202
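As the code above shows, if values are missing at the start or end of the calendars, no placeholder values are written at all; arr[0] simply records the start_index. If values are missing in the middle of the calendars, however, they are filled with NaN.
What is the reason for handling it this way? What would be the impact of filling everything with NaN?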
Example 1:
calendars: 2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-05
Data for stock 0000001: 2017-01-01 5.6, 2017-01-02 5.8, 2017-01-05 5.9
Resulting bin content: [5.6, 5.8, nan, nan, 5.9]
Example 2:
calendars: 2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-05
Data for stock 0000001: 2017-01-02 5.6, 2017-01-03 5.8, 2017-01-04 5.9
Resulting bin content: [1, 5.6, 5.8, 5.9]
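The layout described in the two examples can be reproduced with a small sketch. This is not the code from `dump_bin.py`; `encode_feature` and `to_bytes` are hypothetical helpers, and the layout (a leading start index followed by one little-endian float32 per calendar day between the first and last available dates) is an assumption based on the description above. Under that assumption, Example 1 would also begin with a start index of 0, which the listing above leaves out.

```python
import math
import struct


def encode_feature(calendar, prices):
    """Hypothetical helper: build the list of floats for one feature file.

    calendar: sorted list of date strings.
    prices:   dict mapping date string -> value (only dates that have data).
    Assumed layout: [start_index, v_first, ..., v_last], NaN for interior gaps.
    """
    present = [i for i, day in enumerate(calendar) if day in prices]
    start, end = present[0], present[-1]
    # Only the span between the first and last available day is stored.
    values = [prices.get(day, math.nan) for day in calendar[start : end + 1]]
    return [float(start)] + values


def to_bytes(row):
    # Little-endian float32 encoding (an assumption of this sketch).
    return struct.pack(f"<{len(row)}f", *row)


calendar = ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"]

# Example 1: gap in the middle, so NaN placeholders are stored.
example1 = encode_feature(
    calendar, {"2017-01-01": 5.6, "2017-01-02": 5.8, "2017-01-05": 5.9}
)

# Example 2: gaps only at the edges, so nothing is stored for them.
example2 = encode_feature(
    calendar, {"2017-01-02": 5.6, "2017-01-03": 5.8, "2017-01-04": 5.9}
)
```

In this sketch the edge gaps cost nothing on disk, because the start index and the number of stored values already say where the data begins and ends.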
Expected behavior
Would it be acceptable for Example 2 to produce the bin content [nan, 5.6, 5.8, 5.9, nan]?
Has anyone run into a similar situation?
I assume that feature values in the middle of the date range need to be NaN-filled because the binary database requires the .bin file to be "compact" with respect to the calendar, so that the file offset for any date can be easily calculated.
Gaps at the beginning or the end of the .bin file, however, do not break this "compactness" requirement.
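The offset argument above can be illustrated with a minimal sketch. It assumes the file holds a float32 start index followed by one float32 per calendar day; `read_value` is a hypothetical reader written for this illustration, not Qlib's API, and returning NaN for out-of-range dates is a choice made for this sketch.

```python
import math
import struct

FLOAT_SIZE = 4  # bytes per little-endian float32 (assumed encoding)


def read_value(blob, calendar_index):
    """Hypothetical reader: fetch the value for one calendar index.

    Assumed layout: a float32 header holding the start index, then one
    float32 per calendar day from the first to the last stored day.
    """
    (start,) = struct.unpack_from("<f", blob, 0)
    start = int(start)
    stored = len(blob) // FLOAT_SIZE - 1  # number of values after the header
    position = calendar_index - start
    if position < 0 or position >= stored:
        # Before the first or after the last stored day: nothing on disk.
        return math.nan
    # Direct seek: skip the header, then `position` values.
    offset = (1 + position) * FLOAT_SIZE
    (value,) = struct.unpack_from("<f", blob, offset)
    return value


# Example 2 from the question: start index 1, then three values.
edge_gaps = struct.pack("<4f", 1.0, 5.6, 5.8, 5.9)

# Example 1 with an explicit start index of 0 and interior NaN padding.
middle_gap = struct.pack("<6f", 0.0, 5.6, 5.8, math.nan, math.nan, 5.9)
```

Without the two NaN placeholders in `middle_gap`, calendar index 4 would point past the end of the data and the last value would be read as index 2. Edge gaps need no placeholders because the start index and the file length already bound the stored range.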
This issue is stale because it has been open for three months with no activity. Remove the stale label or comment on the issue; otherwise, it will be closed in 5 days.