parallel-tutorial
prep.py Error
It seems that Google has updated their API, so running prep.py raises a remote error:
raise RemoteDataError('Unable to read URL: {0}'.format(url))
pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=usb&startdate=Jan+27%2C+2017&enddate=Jan+27%2C+2018&output=csv
Is there a way that offline versions of the JSON files could be made available?
Same here. Can anyone fix this problem?
Clearly the data source is no longer supported. Does anyone know an alternate source to use for the data? Or equivalent data to download?
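One candidate I haven't verified: pandas_datareader's Stooq backend (added in 0.7). A sketch, assuming Stooq still serves these symbols; the exact ticker spelling is a guess and may need adjusting:

import pandas_datareader.data as web

# Daily OHLCV history from Stooq for the same symbol prep.py requested.
# US tickers sometimes need a suffix, e.g. 'USB.US'.
df = web.DataReader('USB', 'stooq')
print(df.head())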
I don't personally know of a good place to download this data, but I wouldn't be surprised if one exists.
The dask repository now includes a dask.datasets.timeseries function that generates entirely fake data that might fit in, though it would be less interesting. If someone wants to do this I suspect it would be welcome.
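If someone does pick this up, a minimal sketch of what the replacement could look like (the date range, frequency, and seed here are arbitrary choices, not anything prep.py used):

import dask.datasets

# Generate a fake but reproducible time series as a Dask DataFrame:
# one row per second, one partition per day.
df = dask.datasets.timeseries(
    start='2017-01-01',
    end='2017-02-01',
    freq='1s',
    partition_freq='1d',
    seed=42,  # make the random data deterministic
)
print(df.head())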
I also wanted to try this tutorial, but couldn't get the data:
(parallel) hfm-1804a:parallel-tutorial deil$ python prep.py
Traceback (most recent call last):
File "prep.py", line 21, in <module>
dask.set_options(get=dask.multiprocessing.get)
File "/Users/deil/software/anaconda3/envs/parallel/lib/python3.6/site-packages/dask/context.py", line 18, in set_options
raise TypeError("The dask.set_options function has been deprecated.\n"
TypeError: The dask.set_options function has been deprecated.
Please use dask.config.set instead
Before: with dask.set_options(foo='bar'):
...
After: with dask.config.set(foo='bar'):
...
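If it helps anyone else, the dask side of this is a mechanical rename. A sketch of the updated call in prep.py, assuming the intent of the old line was to select the multiprocessing scheduler:

import dask

# Old (removed): dask.set_options(get=dask.multiprocessing.get)
# New: pick the multiprocessing ("processes") scheduler via dask.config
dask.config.set(scheduler='processes')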
I don't personally know of a good place to download this data, but I wouldn't be surprised if one exists.
How big is the data that was downloaded by prep.py? If it's less than 1 GB maybe you could just put a copy in this GitHub repo?
It would be great to have this tutorial working...
I agree that putting the data into the repository is possible. Unfortunately I no longer know how to obtain the data. My recommendation that someone rework the examples to use the dask.datasets.timeseries function is, I think, still the best approach. Alternate solutions would be welcome if people want to implement them.
I agree that putting the data into the repository is possible. Unfortunately I no longer know how to obtain the data.
@minrk - maybe you still have a copy of the files around?
My recommendation that someone rework the examples to use the dask.datasets.timeseries function is, I think, still the best approach.
I could try tomorrow. But to me, bundling example data in the tutorial repo seems like the better solution if it's small, to increase the chances of it working in the future.
dask.datasets.timeseries produces random data using the numpy.random module. It's definitely as robust as packaging data, and has the benefit of working over conference wifi.
I think it's ok to have a few megabytes of data here, but we need to expect this tutorial to be run over very poor internet connections. Anything over a few tens of megabytes is unpleasant.
In order to even get to the Google error I've set dask=0.20.2 and pandas=0.22 in the environment.yml file. Dask ran into the same issue as @cdeil reported, and pandas raised the following exception:
(parallel) [parallel-tutorial]$ python prep.py
Traceback (most recent call last):
File "prep.py", line 44, in <module>
write_stock(symbol)
File "prep.py", line 37, in write_stock
data_source='google')
File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/dask/dataframe/io/demo.py", line 202, in daily_stock
from pandas_datareader import data
File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/__init__.py", line 2, in <module>
from .data import (DataReader, Options, get_components_yahoo,
File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/data.py", line 14, in <module>
from pandas_datareader.fred import FredReader
File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/fred.py", line 1, in <module>
from pandas.core.common import is_list_like
ImportError: cannot import name 'is_list_like'
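For anyone hitting this last ImportError: it's a known incompatibility between pandas_datareader 0.6 and newer pandas releases, where is_list_like moved from pandas.core.common to pandas.api.types. Upgrading pandas_datareader to 0.7+ should fix it; if you're pinned to 0.6, a commonly shared workaround is to restore the old alias before importing pandas_datareader:

import pandas as pd

# pandas_datareader 0.6 imports is_list_like from its pre-0.23 location;
# re-expose it there before the import happens.
pd.core.common.is_list_like = pd.api.types.is_list_like

import pandas_datareader  # should now import cleanly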