zipline
Out-of-bounds warning for uint32 when using zipline ingest
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
- Operating System: osx10.11
- Python Version: python2.7
- Python Bitness: ..
- How did you install Zipline: pip
- Python packages: ..
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
I have registered a custom bundle in the extension file (~/.zipline/extension.py):
from zipline.data.bundles import register, yahoo_equities

# tickers to download
equities = {
    'AAPL',
    'MSFT',
    'GOOG',
    '^GSPC',
}
register(
    'ap-ms-gg',  # name this whatever you like
    yahoo_equities(equities),
)
After this, when I run the command %zipline ingest -b ap-ms-gg to download data from Yahoo, the following warning is shown:
Downloading Yahoo pricing data: [##################------------------] 50% 0d 00:00:04
/Users/zhangzheng/pyenv/tensorflow/lib/python2.7/site-packages/zipline/data/us_equity_pricing.py:170: UserWarning: Ignoring 761 values because they are out of bounds for uint32:
open high low close volume \
Date
2007-07-26 1518.089966 1518.089966 1465.300049 1482.660034 4472550000
2007-07-27 1482.439941 1488.530029 1458.949951 1458.949951 4784650000
2007-07-31 1473.900024 1488.300049 1454.250000 1455.270020 4524520000
2007-08-01 1455.180054 1468.380005 1439.589966 1465.810059 5256780000
2007-08-02 1465.459961 1476.430054 1460.579956 1472.199951 4368850000
2007-08-06 1433.040039 1467.670044 1427.390015 1467.670044 5067200000
2007-08-07 1467.619995 1488.300049 1455.800049 1476.709961 4909390000
2007-08-08 1476.219971 1503.890015 1476.219971 1497.489990 5499560000
2007-08-09 1497.209961 1497.209961 1453.089966 1453.089966 5889600000
2007-08-10 1453.089966 1462.020020 1429.739990 1453.640015 5345780000
2007-08-16 1406.640015 1415.969971 1370.599976 1411.270020 6509300000
2007-11-07 1515.459961 1515.459961 1475.040039 1475.619995 4353160000
2007-11-08 1475.270020 1482.500000 1450.310059 1474.770020 5439720000
2007-11-09 1467.589966 1474.089966 1448.510010 1453.699951 4587050000
2007-11-20 1434.510010 1452.640015 1419.280029 1439.699951 4875150000
2007-11-27 1409.589966 1429.489990 1407.430054 1428.229980 4320720000
2007-11-28 1432.949951 1471.619995 1432.949951 1469.020020 4508020000
2007-11-30 1471.829956 1488.939941 1470.890015 1481.140015 4422200000
2007-12-12 1487.579956 1511.959961 1468.229980 1486.589966 4482120000
2007-12-21 1463.189941 1485.400024 1463.189941 1484.459961 4508590000
2008-01-08 1415.709961 1430.280029 1388.300049 1390.189941 4705390000
2008-01-09 1390.250000 1409.189941 1378.699951 1409.130005 5351030000
2008-01-10 1406.780029 1429.089966 1395.310059 1420.329956 5170490000
2008-01-11 1419.910034 1419.910034 1394.829956 1401.020020 4495840000
2008-01-15 1411.880005 1411.880005 1380.599976 1380.949951 4601640000
2008-01-16 1377.410034 1391.989990 1364.270020 1373.199951 5440620000
2008-01-17 1374.790039 1377.719971 1330.670044 1333.250000 5303130000
2008-01-18 1333.900024 1350.280029 1312.510010 1325.189941 6004840000
2008-01-22 1312.939941 1322.089966 1274.290039 1310.500000 6544690000
2008-01-24 1340.130005 1355.150024 1334.310059 1352.069946 5735300000
... ... ... ... ... ...
2016-02-05 1913.069946 1913.069946 1872.650024 1880.050049 4929940000
2016-02-08 1873.250000 1873.250000 1828.459961 1853.439941 5636460000
2016-02-09 1848.459961 1868.250000 1834.939941 1852.209961 5183220000
2016-02-10 1857.099976 1881.599976 1850.319946 1851.859985 4471170000
2016-02-11 1847.000000 1847.000000 1810.099976 1829.079956 5500800000
2016-02-12 1833.400024 1864.780029 1833.400024 1864.780029 4696920000
2016-02-16 1871.439941 1895.770020 1871.439941 1895.579956 4570670000
2016-02-17 1898.800049 1930.680054 1898.800049 1926.819946 5011540000
2016-02-18 1927.569946 1930.000000 1915.089966 1917.829956 4436490000
2016-02-24 1917.560059 1932.079956 1891.000000 1929.800049 4317250000
2016-02-26 1954.949951 1962.959961 1945.780029 1948.050049 4348510000
2016-02-29 1947.130005 1958.270020 1931.810059 1932.229980 4588180000
2016-03-01 1937.089966 1978.349976 1937.089966 1978.349976 4819750000
2016-03-02 1976.599976 1986.510010 1968.800049 1986.449951 4666610000
2016-03-03 1985.599976 1993.689941 1977.369995 1993.400024 5081700000
2016-03-04 1994.010010 2009.130005 1986.770020 1999.989990 6049930000
2016-03-07 1996.109985 2006.119995 1989.380005 2001.760010 4968180000
2016-03-08 1996.880005 1996.880005 1977.430054 1979.260010 4641650000
2016-03-10 1990.969971 2005.079956 1969.250000 1989.569946 4376790000
2016-03-17 2026.900024 2046.239990 2022.160034 2040.589966 4530480000
2016-03-18 2041.160034 2052.360107 2041.160034 2049.580078 6503140000
2016-04-28 2090.929932 2099.300049 2071.620117 2075.810059 4309840000
2016-04-29 2071.820068 2073.850098 2052.280029 2065.300049 4704720000
2016-05-31 2100.129883 2103.479980 2088.659912 2096.949951 4514410000
2016-06-17 2078.199951 2078.199951 2062.840088 2071.219971 4952630000
2016-06-24 2103.810059 2103.810059 2032.569946 2037.410034 7597449600
2016-06-27 2031.449951 2031.449951 1991.680054 2000.540039 5431220000
2016-06-28 2006.670044 2036.089966 2006.670044 2036.089966 4385810000
2016-06-30 2073.169922 2098.939941 2070.000000 2098.860107 4622820000
2016-09-16 2146.479980 2146.479980 2131.199951 2139.159912 5014360000
Adj Close
Date
2007-07-26 1482.660034
2007-07-27 1458.949951
2007-07-31 1455.270020
2007-08-01 1465.810059
2007-08-02 1472.199951
2007-08-06 1467.670044
2007-08-07 1476.709961
2007-08-08 1497.489990
2007-08-09 1453.089966
2007-08-10 1453.640015
2007-08-16 1411.270020
2007-11-07 1475.619995
2007-11-08 1474.770020
2007-11-09 1453.699951
2007-11-20 1439.699951
2007-11-27 1428.229980
2007-11-28 1469.020020
2007-11-30 1481.140015
2007-12-12 1486.589966
2007-12-21 1484.459961
2008-01-08 1390.189941
2008-01-09 1409.130005
2008-01-10 1420.329956
2008-01-11 1401.020020
2008-01-15 1380.949951
2008-01-16 1373.199951
2008-01-17 1333.250000
2008-01-18 1325.189941
2008-01-22 1310.500000
2008-01-24 1352.069946
... ...
2016-02-05 1880.050049
2016-02-08 1853.439941
2016-02-09 1852.209961
2016-02-10 1851.859985
2016-02-11 1829.079956
2016-02-12 1864.780029
2016-02-16 1895.579956
2016-02-17 1926.819946
2016-02-18 1917.829956
2016-02-24 1929.800049
2016-02-26 1948.050049
2016-02-29 1932.229980
2016-03-01 1978.349976
2016-03-02 1986.449951
2016-03-03 1993.400024
2016-03-04 1999.989990
2016-03-07 2001.760010
2016-03-08 1979.260010
2016-03-10 1989.569946
2016-03-17 2040.589966
2016-03-18 2049.580078
2016-04-28 2075.810059
2016-04-29 2065.300049
2016-05-31 2096.949951
2016-06-17 2071.219971
2016-06-24 2037.410034
2016-06-27 2000.540039
2016-06-28 2036.089966
2016-06-30 2098.860107
2016-09-16 2139.159912
[761 rows x 6 columns]
winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)
Merging daily equity files: [---------#--------------------------] 3
Merging daily equity files: [####################################]
Downloading Yahoo adjustment data: [####################################] 100%
Sincerely, zheng
I have the same problem here with a custom data bundle. A lot of data seems to be missing because of that weird uint32 limit:
zipline/data/us_equity_pricing.py:171: UserWarning: Ignoring 2366 values because they are
out of bounds for uint32:
open high low close volume
date
2007-05-29 0.14200 0.14490 0.13780 0.13950 4874778000
2007-05-30 0.13770 0.13900 0.13550 0.13850 4698312000
2007-06-15 0.14690 0.14750 0.14400 0.14450 4952897000
...
2017-02-15 0.06780 0.06885 0.06720 0.06777 9098170000
2017-02-16 0.06818 0.06818 0.06759 0.06799 4326700000
2017-02-17 0.06824 0.06824 0.06755 0.06793 11253050000
[2366 rows x 5 columns]
winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)
Help please?
Hey @zhangzheng88, thanks for opening this issue. The size limit has to do with how the data is stored with bcolz (which you can look through if you're curious, in this repo).
Currently we don't support volumes that are larger than the maximum value of uint32 (you can see that number with numpy: np.iinfo(np.uint32)), but we can keep this issue in the backlog, and if we need to discuss changing the limit to something like uint64 we can reference this issue again.
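To make the limit concrete, here is a minimal sketch that prints the uint32 maximum and checks a few volumes against it. The first three volumes are copied from the warnings earlier in this thread; the last one is a made-up small value for contrast, and the variable names are only for illustration.

```python
import numpy as np

# Largest value a uint32 can hold: 2**32 - 1
UINT32_MAX = np.iinfo(np.uint32).max

# Three volumes taken from the warnings in this thread, plus one
# hypothetical small volume that is well inside the limit.
volumes = np.array(
    [4472550000, 7597449600, 11253050000, 1000000],
    dtype=np.int64,
)

# Boolean mask of the rows that would trigger the out-of-bounds warning
too_large = volumes > UINT32_MAX

print(UINT32_MAX)
print(volumes[too_large])
```

Any row whose volume exceeds this maximum is one of the rows the warning reports as ignored.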
@FreddieV4 I have the same issue when I try to ingest data which is NaN on some dates. How should I treat this? If I just dropna() the dates with NaN, I get a different error inside daily_bar_writer:
AssertionError: Got 276 rows for daily bars table with first day=2016-01-04, last day=2017-08-31, expected 430 rows.
@JoaoAparicio where are you ingesting data from? My default action for something like that is just fillna(0), then ffill, and then see what happens and figure out if that's appropriate (which in this case might not actually be the best approach).
On a somewhat related note, assuming the missing 154 rows were previously NaN, I'm not sure why you're getting so many NaN values (I'm missing some context here).
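The assertion above suggests the writer expects one row for every session between the first and last day, so dropping rows changes the count. As a rough sketch of the fill approach, assuming that reading is right: reindex the frame to the expected sessions, forward-fill prices, and set volume to 0 on filled days. The helper name align_to_sessions is hypothetical, and plain business days stand in for the real trading calendar here.

```python
import numpy as np
import pandas as pd


def align_to_sessions(df, sessions):
    """Reindex df to sessions; forward-fill prices, zero the volume on gaps.

    Hypothetical helper, not part of zipline.
    """
    out = df.reindex(sessions)
    price_cols = ['open', 'high', 'low', 'close']
    out[price_cols] = out[price_cols].ffill()
    out['volume'] = out['volume'].fillna(0)
    return out


# Assumption: business days approximate the sessions the writer expects.
sessions = pd.bdate_range('2016-01-04', '2016-01-08')

# Made-up data with NaN on two of the five sessions
raw = pd.DataFrame(
    {
        'open': [10.0, np.nan, 10.4, np.nan, 10.8],
        'high': [10.5, np.nan, 10.9, np.nan, 11.0],
        'low': [9.8, np.nan, 10.1, np.nan, 10.6],
        'close': [10.2, np.nan, 10.7, np.nan, 10.9],
        'volume': [100.0, np.nan, 300.0, np.nan, 500.0],
    },
    index=sessions,
)

dropped = raw.dropna()                        # 3 rows: fewer than the 5 sessions
aligned = align_to_sessions(dropped, sessions)  # back to 5 rows, no NaN
print(len(dropped), len(aligned))
```

Forward-filling only works if the first session has data; a leading NaN would stay NaN. Whether filled prices are appropriate depends on why the source has gaps in the first place.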
Seems the same uint32 limit is hit for zipline ingest -b quandl, only for one date though:
Ignoring 1 values because they are out of bounds for uint32:
open high low close volume ex_dividend split_ratio
2011-04-11 1.79 1.84 1.55 1.7 6.674913e+09 0.0 1.0
winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)
I think this issue comes from here https://github.com/quantopian/zipline/blob/master/zipline/data/us_equity_pricing.py#L139
The default behavior seems to be "warn" but I think that does not make sense for Volume since the major use cases for it should all be okay with a "bigger than big" answer.
I think the expected behavior should be more like constrain or coerce, with the value being set to 2³²-1.
For instance, when used to evaluate slippage, max_int is a fine response even if the real number would have been 10 or 20 times that, since the effect will be to fill even the largest possible order.
In the case of a strategy using Volume as a signal source, the effect would similarly have to be independent of just how many more than 4.295 billion shares (2³²-1) were traded that day.
Regardless of what the actual number is, the meaning would still be "an extraordinarily large number of shares".
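A minimal sketch of the "constrain" idea, applied to a DataFrame before it is handed to the writer. This is not existing zipline behavior; clamp_volume is a hypothetical helper. The two large volumes are taken from the warnings in this thread and the third row is made up.

```python
import numpy as np
import pandas as pd

UINT32_MAX = np.iinfo(np.uint32).max  # 2**32 - 1


def clamp_volume(df, column='volume'):
    """Return a copy of df with the volume capped at UINT32_MAX.

    Hypothetical helper, not part of zipline.
    """
    out = df.copy()
    out[column] = out[column].clip(upper=UINT32_MAX)
    return out


frame = pd.DataFrame(
    {
        'close': [1.7, 2037.410034, 50.0],
        # 6674913000 and 7597449600 appear in the warnings above;
        # 1200000 is an invented in-range value.
        'volume': [6674913000, 7597449600, 1200000],
    },
    index=pd.to_datetime(['2011-04-11', '2016-06-24', '2016-06-27']),
)

clamped = clamp_volume(frame)

# After clamping, the column can be stored as uint32 without loss of rows.
as_uint32 = clamped['volume'].astype(np.uint32)
print(clamped['volume'].tolist())
```

The clamped rows lose their true magnitude, which is acceptable only under the argument above that anything past the limit just means "extraordinarily large".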
@zhangzheng88 @adegtyarev @freddiev4 @JoaoAparicio @amarin15 I have described a temporary workaround in a post that saves the otherwise-missing data.
Basically, replace 'uint32' with 'uint64' in these two files: us_equity_pricing.py and minute_bars.py
Has there been any other progress on this issue? I cannot find any us_equity_pricing.py file. Is it enough to replace all instances of uint32 with uint64 in minute_bars.py, as @0xboz mentioned above?
More generally, is there a plan to support volumes which are larger than UINT32_MAX, or to just set larger volumes equal to UINT32_MAX when they are read in?
Any updates? I am having this exact issue, please help.
Please contact me by email at @.***, thank you!