pandas-datareader
Downloading commodity prices from STOOQ
Hi,
Since late 2021, I have been unable to get historical commodity prices from stooq.com using pandas-datareader. I use StooqDailyReader from pandas_datareader.stooq.
When I request prices for ^DJI, for example, there is no problem.
from pandas_datareader.stooq import StooqDailyReader
StooqDailyReader('^DJI').read().sort_index(ascending=True)
However, if I try with ticker 'CB.F':
StooqDailyReader('CB.F').read().sort_index(ascending=True)
I get the following error:
OSError: StooqDailyReader request returned no data; check URL for invalid inputs: https://stooq.com/q/d/l/
When I check the website, I see historical prices here: https://stooq.com/q/d/?s=cb.f.
Any idea? Thanks.
Hi,
Can confirm having a similar issue.
For commodities I get the following error using 'GC.F' (gold) as an example:
import pandas_datareader.data as pdr
pdr.get_data_stooq('GC.F', '1985-10-01', '2021-12-31')
SymbolWarning: Failed to read symbol: 'GC.F', replacing with NaN
Had no problem on 31 Dec 2021 then stopped working on 1 Jan 2022. I recall having this same exact issue around early Nov 2021, though repeatedly running the script usually fixed it and it entirely stopped occurring by Dec 2021.
The timing is suspect to me as it started occurring from the start of 2022. Possibly an issue with the Stooq API rolling over to the new year? Ideally it will be resolved by them as more people start reporting the problem.
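For the earlier intermittent failures, where re-running the script usually worked, the manual re-runs can be automated with a small retry wrapper. This is only a sketch: the helper name `retry` and its parameters are hypothetical, and it assumes the failure surfaces as an exception such as the OSError shown earlier in this thread. It will not help if Stooq returns no data at all.

```python
import time


def retry(fetch, attempts=3, delay_seconds=1.0, retry_on=(OSError,)):
    """Call fetch() up to `attempts` times, pausing between failed tries.

    `fetch` is any zero-argument callable. Only exceptions listed in
    `retry_on` trigger another attempt; anything else propagates at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as error:
            last_error = error
            # Sleep only if another attempt is still to come.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: re-raise the most recent error.
    raise last_error


# Possible usage (not executed here, requires pandas-datareader and network):
# from pandas_datareader.stooq import StooqDailyReader
# prices = retry(lambda: StooqDailyReader('GC.F').read(), attempts=5)
```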
Investigating further into this issue, it appears that Stooq no longer allows the downloading of commodities data.
First, take the functioning '^SPX' (S&P 500) equity index as an example. On the website, for two arbitrary dates, we have the link https://stooq.com/q/d/?s=%5Espx&c=0&d1=19851001&d2=20220121 where we observe that at the bottom of the page there is a “Download data in csv file...” hyperlink to get the CSV.
Correspondingly, in pandas_datareader/base.py, the _get_response(url, params, headers) method of the _BaseReader class is used. The URL
https://stooq.com/q/d/l/?s=%5ESPX&i=d&d1=19851001&d2=20220121
is constructed and the requests package is utilised to create a response object from Stooq. The historical data (identical to the previous link) is extracted with response.content and placed into dataframes.
Now for commodities such as 'GC.F' (gold) for the same two dates we have https://stooq.com/q/d/?s=gc.f&c=0&d1=19851010&d2=20220121 and notice there is no longer a “download” hyperlink.
Similarly, as before, pandas-datareader uses the link
https://stooq.com/q/d/l/?s=GC.F&i=d&d1=19851001&d2=20220121
which leads to the aforementioned “SymbolWarning: Failed to read symbol: 'GC.F', replacing with NaN”. This occurs because while response.status_code is correct and a connection with Stooq is established, response.content is empty since they appear to have blocked downloading.
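For anyone who wants to reproduce this check outside pandas-datareader, here is a minimal sketch. It rebuilds the download URL in the form shown above and tests whether a response body contains any rows. The function names are hypothetical, it uses the standard library's urllib rather than requests to stay dependency-free, and the "at least two non-blank lines" rule is my own heuristic, not something Stooq documents.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Download endpoint as it appears in the URLs quoted in this thread.
STOOQ_CSV_ENDPOINT = "https://stooq.com/q/d/l/"


def build_stooq_url(symbol, start, end):
    """Build a daily-data URL with the s, i, d1 and d2 query parameters."""
    params = {
        "s": symbol,
        "i": "d",  # daily interval, as in the URLs above
        "d1": start.strftime("%Y%m%d"),
        "d2": end.strftime("%Y%m%d"),
    }
    return STOOQ_CSV_ENDPOINT + "?" + urlencode(params)


def looks_like_price_csv(content):
    """Heuristic (assumption): a header line plus at least one data row."""
    text = content.decode("utf-8", errors="replace")
    non_blank = [line for line in text.splitlines() if line.strip()]
    return len(non_blank) >= 2


def fetch_stooq_csv(symbol, start, end, timeout=10):
    """Return (HTTP status, raw body) for the symbol. Needs network access."""
    url = build_stooq_url(symbol, start, end)
    with urlopen(url, timeout=timeout) as response:
        return response.status, response.read()


# Possible usage (not executed here):
# status, body = fetch_stooq_csv("GC.F", date(1985, 10, 1), date(2022, 1, 21))
# print(status, looks_like_price_csv(body))
```

A successful status together with `looks_like_price_csv(body)` returning False would match the behaviour described above: the connection works but no data comes back.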
Therefore, as I suspected, the issue is with the Stooq API. If they no longer permit scraping of this data, we will have to find an alternative commodities source.
In the meantime, for anyone requiring some of this data, I have some daily commodities data in my stooq-commodities repository from 1985/10/01 to 2021/12/23. They are all directly generated using details provided in gen_data.py. For example, in market_data/stooq_major.csv I have:
GC.F: Gold - COMEX https://stooq.com/q/d/?s=gc.f
SI.F: Silver - COMEX https://stooq.com/q/d/?s=si.f
HG.F: High Grade Copper - COMEX https://stooq.com/q/d/?s=hg.f
PL.F: Platinum - NYMEX https://stooq.com/q/d/?s=pl.f
PA.F: Palladium - NYMEX https://stooq.com/q/d/?s=pa.f
CL.F: Crude Oil WTI - NYMEX https://stooq.com/q/d/?s=cl.f
RB.F: Gasoline RBOB - NYMEX https://stooq.com/q/d/?s=rb.f
LS.F: Lumber - CME https://stooq.com/q/d/?s=ls.f
LE.F: Live Cattle - CME https://stooq.com/q/d/?s=le.f
KC.F: Coffee - ICE https://stooq.com/q/d/?s=kc.f
OJ.F: Orange Juice - ICE https://stooq.com/q/d/?s=oj.f
All the .pkl dataframes are equivalent to the .csv’s, while the .npy arrays are cleaned so that only dates where all assets have prices are included.
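As a rough illustration of that cleaning step, the sketch below keeps only the dates on which every asset has a price. It is not taken from gen_data.py: the helper name is hypothetical and the numbers are made-up placeholders, not real Stooq prices.

```python
import pandas as pd


def keep_complete_dates(prices):
    """Drop every date (row) on which at least one asset has no price."""
    return prices.dropna(axis=0, how="any")


# Illustrative placeholder values only; None marks a missing price.
prices = pd.DataFrame(
    {
        "GC.F": [1790.0, None, 1805.0, 1810.0],
        "SI.F": [22.3, 22.5, None, 22.9],
    },
    index=pd.to_datetime(
        ["2021-12-20", "2021-12-21", "2021-12-22", "2021-12-23"]
    ),
)

complete = keep_complete_dates(prices)
# Plain NumPy array of the surviving rows, similar in spirit to a .npy file.
complete_array = complete.to_numpy()
```

Here only the first and last dates survive, because each of the middle two dates is missing a price for one asset.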
@rgrewa1 the link you provided above "I have some daily commodities data in my nonergodic-rl repository" generates an error.
@datatalking apologies, that repository is now private.
I have created a new repository stooq-commodities for solely housing this data.
Hi, congratulations on your valuable work. I use this wonderful script. Is there a way to download data for other commodities not present in the list, for example Soybeans (ZS.F) on Stooq? On the page https://stooq.com/q/?s=zs.f, at the bottom right, there is the "Downloaded data" button, which downloads the quote for the single (current) day but not the full history. Is there a solution? Thank you.
Hi, thanks for the positive feedback.
Unfortunately, we are unable to acquire any commodity data from Stooq. Only the data I have provided is available, as it was acquired prior to the blocking.
Hopefully they will allow us to obtain it again in the future.
Has anyone reached out to stooq.com to see whether the issue might be a bug rather than deliberate blocking?