HTTP error when downloading the datasets

Open GeoffNN opened this issue 5 years ago • 8 comments

I got the following error when trying to load the Madelon dataset.

madelon dataset is not present in the folder /home/geoff/copt_data/madelon. Downloading it ...
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-5-4c820c436ec3> in <module>
----> 1 A, b = cp.datasets.load_madelon()

~/PycharmProjects/copt/copt/datasets.py in load_madelon(subset, data_dir)
    153         * :ref:`sphx_glr_auto_examples_frank_wolfe_plot_vertex_overlap.py`
    154     """
--> 155     return _load_dataset("madelon", subset, data_dir)
    156 
    157 

~/PycharmProjects/copt/copt/datasets.py in _load_dataset(name, subset, data_dir)
     58         )
     59         url = "https://storage.googleapis.com/copt/datasets/%s.tar.gz" % name
---> 60         local_filename, _ = urllib.request.urlretrieve(url)
     61         print("Finished downloading")
     62 

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

GeoffNN, May 02 '20 18:05

Looking into it.

fabianp, May 07 '20 22:05

Seems to be a problem with the URL. The following also gives 403: Forbidden.

wget "https://storage.googleapis.com/copt/datasets/gisette.tar.gz"

gideonite, May 13 '20 15:05
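For anyone reproducing this outside of `wget`, the following is a minimal sketch of checking which status code a dataset URL returns without downloading the archive. The helper name `probe_status` and its `opener` parameter are hypothetical (they are not part of copt), and the sketch assumes the server answers `HEAD` requests.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if the server is unreachable.

    Hypothetical helper for illustration; `opener` is injectable so the
    function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 403 in the traceback above) land here.
        # HTTPError must be caught before URLError because it is a subclass.
        return err.code
    except urllib.error.URLError:
        # DNS failures, refused connections and similar transport problems.
        return None
```

Called on the URL from the traceback, a return value of 403 would point at permissions on the storage bucket rather than at the client, which matches the `wget` result above.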

It's because I lost access to the Google Cloud instance that had the data. I'm trying to reconstruct it as it was before.

fabianp, May 13 '20 15:05

From the peanut gallery: it may be best to put it on figshare, or some such service, where persistent URLs can be minted and you don't have to worry about keeping Google storage paid for.

arokem, May 13 '20 15:05

Thanks @arokem, that's a good idea. The issue is fixed, but I'll leave it open to look into figshare.

fabianp, May 15 '20 13:05

@arokem do you know if one can upload to figshare and easily access it using urllib, i.e., without authentication?

fabianp, May 15 '20 13:05

Sorry, I missed the notification for your message until now. Yes, you can download directly from figshare with no authentication. The files have somewhat wonky URLs (e.g., https://ndownloader.figshare.com/files/5273800), but that's not an issue.

arokem, May 21 '20 23:05
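As a rough sketch of how an unauthenticated mirror could slot into a `urllib`-based loader, the helper below tries a list of URLs in order and returns the first one that downloads. The name `fetch_first_available` and its `retrieve` parameter are hypothetical and not part of copt; the only library call assumed is `urllib.request.urlretrieve`, the same one that appears in the traceback.

```python
import urllib.error
import urllib.request


def fetch_first_available(urls, retrieve=urllib.request.urlretrieve):
    """Try each URL in order; return (url, local_path) for the first success.

    Hypothetical helper for illustration; `retrieve` is injectable so the
    fallback logic can be tested without network access.
    """
    failures = []
    for url in urls:
        try:
            local_path, _headers = retrieve(url)
            return url, local_path
        except urllib.error.URLError as err:
            # HTTPError (e.g. 403 Forbidden) is a subclass of URLError,
            # so both HTTP and transport failures fall through to the next URL.
            failures.append(f"{url}: {err}")
    raise RuntimeError("all download URLs failed:\n" + "\n".join(failures))
```

One way to use it would be to pass the `storage.googleapis.com` URL first, followed by whatever persistent URL the archive is eventually given. The figshare link quoted above is only an example of the URL shape, not a copt dataset.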

TODO for myself: re-upload kdd10, kdd12, news20, criteo

fabianp, May 22 '20 17:05