Out-of-bounds warning for uint32 when using zipline ingest

Open weiguang-zz opened this issue 8 years ago • 10 comments

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • Operating System: osx10.11
  • Python Version: python2.7
  • Python Bitness: ..
  • How did you install Zipline: pip
  • Python packages: ..

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

I have registered a custom bundle in the extension file (~/.zipline/extension.py):

from zipline.data.bundles import register, yahoo_equities

equities = {
    'AAPL',
    'MSFT',
    'GOOG',
    '^GSPC',
}
register(
    'ap-ms-gg',  # name this whatever you like
    yahoo_equities(equities),
)

After this, when I run the command %zipline ingest -b ap-ms-gg to download data from Yahoo, the following warning is shown:

Downloading Yahoo pricing data:   [##################------------------]   50%  0d 00:00:04
/Users/zhangzheng/pyenv/tensorflow/lib/python2.7/site-packages/zipline/data/us_equity_pricing.py:170: UserWarning: Ignoring 761 values because they are out of bounds for uint32:
                   open         high          low        close      volume  \
Date                                                                         
2007-07-26  1518.089966  1518.089966  1465.300049  1482.660034  4472550000   
2007-07-27  1482.439941  1488.530029  1458.949951  1458.949951  4784650000   
2007-07-31  1473.900024  1488.300049  1454.250000  1455.270020  4524520000   
2007-08-01  1455.180054  1468.380005  1439.589966  1465.810059  5256780000   
2007-08-02  1465.459961  1476.430054  1460.579956  1472.199951  4368850000   
2007-08-06  1433.040039  1467.670044  1427.390015  1467.670044  5067200000   
2007-08-07  1467.619995  1488.300049  1455.800049  1476.709961  4909390000   
2007-08-08  1476.219971  1503.890015  1476.219971  1497.489990  5499560000   
2007-08-09  1497.209961  1497.209961  1453.089966  1453.089966  5889600000   
2007-08-10  1453.089966  1462.020020  1429.739990  1453.640015  5345780000   
2007-08-16  1406.640015  1415.969971  1370.599976  1411.270020  6509300000   
2007-11-07  1515.459961  1515.459961  1475.040039  1475.619995  4353160000   
2007-11-08  1475.270020  1482.500000  1450.310059  1474.770020  5439720000   
2007-11-09  1467.589966  1474.089966  1448.510010  1453.699951  4587050000   
2007-11-20  1434.510010  1452.640015  1419.280029  1439.699951  4875150000   
2007-11-27  1409.589966  1429.489990  1407.430054  1428.229980  4320720000   
2007-11-28  1432.949951  1471.619995  1432.949951  1469.020020  4508020000   
2007-11-30  1471.829956  1488.939941  1470.890015  1481.140015  4422200000   
2007-12-12  1487.579956  1511.959961  1468.229980  1486.589966  4482120000   
2007-12-21  1463.189941  1485.400024  1463.189941  1484.459961  4508590000   
2008-01-08  1415.709961  1430.280029  1388.300049  1390.189941  4705390000   
2008-01-09  1390.250000  1409.189941  1378.699951  1409.130005  5351030000   
2008-01-10  1406.780029  1429.089966  1395.310059  1420.329956  5170490000   
2008-01-11  1419.910034  1419.910034  1394.829956  1401.020020  4495840000   
2008-01-15  1411.880005  1411.880005  1380.599976  1380.949951  4601640000   
2008-01-16  1377.410034  1391.989990  1364.270020  1373.199951  5440620000   
2008-01-17  1374.790039  1377.719971  1330.670044  1333.250000  5303130000   
2008-01-18  1333.900024  1350.280029  1312.510010  1325.189941  6004840000   
2008-01-22  1312.939941  1322.089966  1274.290039  1310.500000  6544690000   
2008-01-24  1340.130005  1355.150024  1334.310059  1352.069946  5735300000   
...                 ...          ...          ...          ...         ...   
2016-02-05  1913.069946  1913.069946  1872.650024  1880.050049  4929940000   
2016-02-08  1873.250000  1873.250000  1828.459961  1853.439941  5636460000   
2016-02-09  1848.459961  1868.250000  1834.939941  1852.209961  5183220000   
2016-02-10  1857.099976  1881.599976  1850.319946  1851.859985  4471170000   
2016-02-11  1847.000000  1847.000000  1810.099976  1829.079956  5500800000   
2016-02-12  1833.400024  1864.780029  1833.400024  1864.780029  4696920000   
2016-02-16  1871.439941  1895.770020  1871.439941  1895.579956  4570670000   
2016-02-17  1898.800049  1930.680054  1898.800049  1926.819946  5011540000   
2016-02-18  1927.569946  1930.000000  1915.089966  1917.829956  4436490000   
2016-02-24  1917.560059  1932.079956  1891.000000  1929.800049  4317250000   
2016-02-26  1954.949951  1962.959961  1945.780029  1948.050049  4348510000   
2016-02-29  1947.130005  1958.270020  1931.810059  1932.229980  4588180000   
2016-03-01  1937.089966  1978.349976  1937.089966  1978.349976  4819750000   
2016-03-02  1976.599976  1986.510010  1968.800049  1986.449951  4666610000   
2016-03-03  1985.599976  1993.689941  1977.369995  1993.400024  5081700000   
2016-03-04  1994.010010  2009.130005  1986.770020  1999.989990  6049930000   
2016-03-07  1996.109985  2006.119995  1989.380005  2001.760010  4968180000   
2016-03-08  1996.880005  1996.880005  1977.430054  1979.260010  4641650000   
2016-03-10  1990.969971  2005.079956  1969.250000  1989.569946  4376790000   
2016-03-17  2026.900024  2046.239990  2022.160034  2040.589966  4530480000   
2016-03-18  2041.160034  2052.360107  2041.160034  2049.580078  6503140000   
2016-04-28  2090.929932  2099.300049  2071.620117  2075.810059  4309840000   
2016-04-29  2071.820068  2073.850098  2052.280029  2065.300049  4704720000   
2016-05-31  2100.129883  2103.479980  2088.659912  2096.949951  4514410000   
2016-06-17  2078.199951  2078.199951  2062.840088  2071.219971  4952630000   
2016-06-24  2103.810059  2103.810059  2032.569946  2037.410034  7597449600   
2016-06-27  2031.449951  2031.449951  1991.680054  2000.540039  5431220000   
2016-06-28  2006.670044  2036.089966  2006.670044  2036.089966  4385810000   
2016-06-30  2073.169922  2098.939941  2070.000000  2098.860107  4622820000   
2016-09-16  2146.479980  2146.479980  2131.199951  2139.159912  5014360000   

              Adj Close  
Date                     
2007-07-26  1482.660034  
2007-07-27  1458.949951  
2007-07-31  1455.270020  
2007-08-01  1465.810059  
2007-08-02  1472.199951  
2007-08-06  1467.670044  
2007-08-07  1476.709961  
2007-08-08  1497.489990  
2007-08-09  1453.089966  
2007-08-10  1453.640015  
2007-08-16  1411.270020  
2007-11-07  1475.619995  
2007-11-08  1474.770020  
2007-11-09  1453.699951  
2007-11-20  1439.699951  
2007-11-27  1428.229980  
2007-11-28  1469.020020  
2007-11-30  1481.140015  
2007-12-12  1486.589966  
2007-12-21  1484.459961  
2008-01-08  1390.189941  
2008-01-09  1409.130005  
2008-01-10  1420.329956  
2008-01-11  1401.020020  
2008-01-15  1380.949951  
2008-01-16  1373.199951  
2008-01-17  1333.250000  
2008-01-18  1325.189941  
2008-01-22  1310.500000  
2008-01-24  1352.069946  
...                 ...  
2016-02-05  1880.050049  
2016-02-08  1853.439941  
2016-02-09  1852.209961  
2016-02-10  1851.859985  
2016-02-11  1829.079956  
2016-02-12  1864.780029  
2016-02-16  1895.579956  
2016-02-17  1926.819946  
2016-02-18  1917.829956  
2016-02-24  1929.800049  
2016-02-26  1948.050049  
2016-02-29  1932.229980  
2016-03-01  1978.349976  
2016-03-02  1986.449951  
2016-03-03  1993.400024  
2016-03-04  1999.989990  
2016-03-07  2001.760010  
2016-03-08  1979.260010  
2016-03-10  1989.569946  
2016-03-17  2040.589966  
2016-03-18  2049.580078  
2016-04-28  2075.810059  
2016-04-29  2065.300049  
2016-05-31  2096.949951  
2016-06-17  2071.219971  
2016-06-24  2037.410034  
2016-06-27  2000.540039  
2016-06-28  2036.089966  
2016-06-30  2098.860107  
2016-09-16  2139.159912  

[761 rows x 6 columns]
  winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)
Merging daily equity files:  [---------#--------------------------]  3
Merging daily equity files:  [####################################]   
Downloading Yahoo adjustment data:   [####################################]  100%

Sincerely, zheng

weiguang-zz avatar Nov 01 '16 09:11 weiguang-zz

I have the same problem here with a custom data bundle. A lot of data seems to be missing because of that weird uint32 limit:

zipline/data/us_equity_pricing.py:171: UserWarning: Ignoring 2366 values because they are 
out of bounds for uint32:
               open     high      low    close       volume
date                                                       
2007-05-29  0.14200  0.14490  0.13780  0.13950   4874778000
2007-05-30  0.13770  0.13900  0.13550  0.13850   4698312000
2007-06-15  0.14690  0.14750  0.14400  0.14450   4952897000
...
2017-02-15  0.06780  0.06885  0.06720  0.06777   9098170000
2017-02-16  0.06818  0.06818  0.06759  0.06799   4326700000
2017-02-17  0.06824  0.06824  0.06755  0.06793  11253050000

[2366 rows x 5 columns]
  winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)

Help please?

adegtyarev avatar Feb 18 '17 17:02 adegtyarev

Hey @zhangzheng88, thanks for opening up this issue. The size limit has to do with some stuff in BColz (which you can look through if you're curious, in this repo).

Currently we don't support volumes that are larger than the maximum value of uint32 (you can see that number with NumPy: np.iinfo(np.uint32)), but we can keep this issue in the backlog, and if we need to discuss changing the limit to something like uint64, we can reference this issue again.
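As a quick check of the limit mentioned above, this sketch prints the uint32 maximum with NumPy and flags which volumes exceed it. The first three sample volumes are copied from the warnings in this thread; the fourth is a made-up in-range value, and the variable names are illustrative only.

```python
import numpy as np

# Largest value a uint32 can hold: 2**32 - 1 = 4294967295.
UINT32_MAX = np.iinfo(np.uint32).max

# Three daily volumes taken from the warnings quoted in this thread,
# plus one made-up in-range value for comparison.
volumes = np.array(
    [4472550000, 7597449600, 11253050000, 1000000000], dtype=np.int64
)

# True where the volume cannot be stored as uint32.
out_of_bounds = volumes > UINT32_MAX

print(UINT32_MAX)
print(out_of_bounds)
```

Every flagged row is just above roughly 4.29 billion shares, which matches the smallest volumes shown in the warning output.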

freddiev4 avatar Mar 08 '17 21:03 freddiev4

@FreddieV4 I have the same issue when I try to ingest data which on some dates is NaN. How should I treat this? If I just dropna() dates with NaN I get a different error: inside daily_bar_writer I get AssertionError: Got 276 rows for daily bars table with first day=2016-01-04, last day=2017-08-31, expected 430 rows.

JoaoAparicio avatar Oct 05 '17 16:10 JoaoAparicio

@JoaoAparicio where are you ingesting data from? My default action for something like that is just fillna(0), then ffill, and then see what happens and figure out whether that's appropriate (which in this case might not actually be the best approach).

On a somewhat related note, assuming the missing 154 rows were previously NaN, I'm not sure why you're getting so many NaN values (missing some context here)
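The row-count AssertionError above suggests the writer expects one row per session between the first and last day. A minimal sketch of one way to keep the row count intact, instead of calling dropna(), is to reindex onto the expected sessions and fill the gaps. Everything here is an assumption for illustration: the sample frame is invented, and pd.bdate_range only stands in for the sessions a real trading calendar would supply. Whether forward-filling prices is appropriate depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical expected sessions; a real bundle would take these from
# its trading calendar rather than from plain business days.
sessions = pd.bdate_range("2016-01-04", "2016-01-08")

# Invented raw data: one row is NaN and two sessions are absent entirely.
raw = pd.DataFrame(
    {
        "open": [10.0, np.nan, 10.4],
        "high": [10.5, np.nan, 10.9],
        "low": [9.8, np.nan, 10.1],
        "close": [10.2, np.nan, 10.6],
        "volume": [100.0, np.nan, 300.0],
    },
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-07"]),
)

price_cols = ["open", "high", "low", "close"]

# One row per expected session; missing sessions become NaN rows.
aligned = raw.reindex(sessions)

# Carry the last known prices forward and treat missing volume as no trading.
aligned[price_cols] = aligned[price_cols].ffill()
aligned["volume"] = aligned["volume"].fillna(0)

print(aligned)
```

The result has exactly as many rows as there are sessions, which is the property the assertion appears to check.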

freddiev4 avatar Oct 05 '17 16:10 freddiev4

Seems the same uint32 limit is hit for zipline ingest -b quandl - only for one date though:

Ignoring 1 values because they are out of bounds for uint32:             
            open  high   low  close        volume  ex_dividend  split_ratio
2011-04-11  1.79  1.84  1.55    1.7  6.674913e+09          0.0          1.0
winsorise_uint32(raw_data, invalid_data_behavior, 'volume', *OHLC)

amarin15 avatar Nov 30 '17 04:11 amarin15

I think this issue comes from here: https://github.com/quantopian/zipline/blob/master/zipline/data/us_equity_pricing.py#L139. The default behavior seems to be "warn", but I think that does not make sense for Volume, since the major use cases for it should all be okay with a "bigger than big" answer. I think the expected behavior should be more like constrain or coerce, with the value being set to 2³²-1.

For instance, when used to evaluate slippage, max_int is a fine response even if the real number would have been 10 or 20 times that, since the effect will be to fill even the largest possible order. In the case of a strategy using Volume as a signal source, the effect would similarly have to be independent of just how many more than 4.295 billion shares were traded that day. Regardless of what the actual number is, the meaning would still be "an extraordinarily large number of shares".
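The "constrain" behavior proposed here can be sketched outside of Zipline by clipping the volume column before ingestion. This is not Zipline's own behavior or API; the sample frame and names are assumptions for illustration, with the first two volumes taken from the warning output above.

```python
import numpy as np
import pandas as pd

# Upper bound for uint32 as a plain Python int: 2**32 - 1.
UINT32_MAX = int(np.iinfo(np.uint32).max)

# Invented sample; the first two volumes appear in the warning above.
frame = pd.DataFrame(
    {
        "close": [1455.27, 2037.41, 100.0],
        "volume": [4524520000, 7597449600, 1000000000],
    },
    index=pd.to_datetime(["2007-07-31", "2016-06-24", "2016-06-27"]),
)

# Cap oversized volumes at the uint32 maximum, then store them as uint32.
frame["volume"] = frame["volume"].clip(upper=UINT32_MAX).astype(np.uint32)

print(frame)
```

Rows that would otherwise be ignored keep their prices, and their volume reads as "at least 2³²-1 shares".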

dannypurcell avatar Feb 16 '18 18:02 dannypurcell

@zhangzheng88 @adegtyarev @freddiev4 @JoaoAparicio @amarin15 I have described a temporary workaround in a post that preserves the missing data.

Basically, replace 'uint32' with 'uint64' in these two files: us_equity_pricing.py and minute_bars.py
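To illustrate why the wider dtype helps, this sketch compares how NumPy stores two of the volumes from this thread as uint64 and as uint32. It only demonstrates the dtypes and does not reproduce Zipline's writer code, which warns and ignores such rows rather than casting them.

```python
import numpy as np

# Two volumes reported as out of bounds earlier in this thread.
volumes = np.array([7597449600, 11253050000], dtype=np.int64)

# uint64 holds both values unchanged.
as_uint64 = volumes.astype(np.uint64)

# A plain cast to uint32 keeps only the low 32 bits, so the values
# wrap around modulo 2**32 without any error.
as_uint32 = volumes.astype(np.uint32)

print(as_uint64)
print(as_uint32)
```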

0xboz avatar Jun 10 '19 23:06 0xboz

Has there been any other progress on this issue? I cannot find any us_equity_pricing.py file. Is it enough to replace all instances of uint32 by uint64 in minute_bar.py as @0xboz mentioned above?
More generally, is there a plan to support volumes which are larger than UINT32_MAX, or to just set larger volumes equal to UINT32_MAX when they are read in?

cactus1549 avatar Nov 30 '20 15:11 cactus1549

Any updates? I am having this exact issue. Please help.

huy8208 avatar Jun 28 '22 17:06 huy8208

Please contact me by email at @.***, thank you!

andycwang avatar Jun 28 '22 17:06 andycwang