copt
HTTP error when downloading the datasets
Got the following error when trying to load the Madelon dataset.
madelon dataset is not present in the folder /home/geoff/copt_data/madelon. Downloading it ...
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-5-4c820c436ec3> in <module>
----> 1 A, b = cp.datasets.load_madelon()
~/PycharmProjects/copt/copt/datasets.py in load_madelon(subset, data_dir)
153 * :ref:`sphx_glr_auto_examples_frank_wolfe_plot_vertex_overlap.py`
154 """
--> 155 return _load_dataset("madelon", subset, data_dir)
156
157
~/PycharmProjects/copt/copt/datasets.py in _load_dataset(name, subset, data_dir)
58 )
59 url = "https://storage.googleapis.com/copt/datasets/%s.tar.gz" % name
---> 60 local_filename, _ = urllib.request.urlretrieve(url)
61 print("Finished downloading")
62
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
245 url_type, path = splittype(url)
246
--> 247 with contextlib.closing(urlopen(url, data)) as fp:
248 headers = fp.info()
249
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
220 else:
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223
224 def install_opener(opener):
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
529 for processor in self.process_response.get(protocol, []):
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532
533 return response
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_response(self, request, response)
639 if not (200 <= code < 300):
640 response = self.parent.error(
--> 641 'http', request, response, code, msg, hdrs)
642
643 return response
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in error(self, proto, *args)
567 if http_err:
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570
571 # XXX probably also want an abstract factory that knows when it makes
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 for handler in handlers:
502 func = getattr(handler, meth_name)
--> 503 result = func(*args)
504 if result is not None:
505 return result
~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 403: Forbidden
Looking into it.
Seems to be a problem with the URL. The following also gives 403: Forbidden.
wget "https://storage.googleapis.com/copt/datasets/gisette.tar.gz"
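The same check can be done from Python, which makes it easy to test several dataset archives at once. This is a minimal sketch, not part of copt: `probe` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without network access. It assumes the server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server gives for `url`.

    A HEAD request is used so that no archive is actually downloaded.
    `opener` is a hypothetical injection point (defaults to urlopen).
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx responses; the code is on the exception.
        return err.code


# Example usage (requires network access):
# base = "https://storage.googleapis.com/copt/datasets/%s.tar.gz"
# for name in ("madelon", "gisette"):
#     print(name, probe(base % name))
```

A 403 for every archive, rather than for one file, points at the bucket's permissions or availability and not at a single dataset name.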
It's because I lost access to the Google Cloud instance that had the data. I'm trying to reconstruct it as it was before.
From the peanut gallery: it may be best to put it on figshare, or somesuch, where persistent URLs can be minted and you don't have to worry about keeping Google storage paid for.
Thanks @arokem, that's a good idea. The issue is fixed, but I'll leave it open to look into figshare.
@arokem do you know if one can upload to figshare and easily access it using urllib, i.e., without authentication?
Sorry, I missed the notification for your message until now. Yes, you can download directly from figshare with no authentication. The files have somewhat wonky URLs (e.g., https://ndownloader.figshare.com/files/5273800), but that's not an issue.
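An unauthenticated download of that kind might look like the sketch below. It is only an illustration, not copt's loader: `fetch` and its `retrieve` parameter are hypothetical names, and the figshare URL in the usage comment is the example file from the comment above, not one of the copt datasets. The sketch also turns an `HTTPError` such as the 403 reported here into a shorter message that names the failing URL.

```python
import os
import urllib.error
import urllib.request


def fetch(url, dest, retrieve=urllib.request.urlretrieve):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    `retrieve` is a hypothetical injection point (defaults to urlretrieve),
    so the function can be tested without touching the network.
    """
    if os.path.exists(dest):
        return dest  # already downloaded, nothing to do
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    try:
        retrieve(url, dest)
    except urllib.error.HTTPError as err:
        # Report the failing URL and status instead of a long traceback.
        raise RuntimeError(
            "Could not download %s (HTTP %d %s)" % (url, err.code, err.reason)
        ) from err
    return dest


# Example usage (requires network access; no credentials are sent):
# fetch("https://ndownloader.figshare.com/files/5273800", "data/example.bin")
```

Because the URL is just a string passed to `urllib`, moving the archives to figshare would only require changing the URL that the loader builds.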
TODO for myself: re-upload kdd10, kdd12, news20, criteo.