Not possible to call out to external websites
Description
In both my own jupyterlite, and in the demo jupyterlite, it is not possible to call out to external websites. It always results in an error related to insecure requests. This happens with all URLs that I have tested, and happens whether or not the request call includes a "validate=true/false" flag.
Reproduce
- Code block:
import requests

def download_file_into_memory(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        print(f"Failed to download file. Status code: {response.status_code}")
        return None

file_content = download_file_into_memory("https://cnn.com")
- Run
- See error:
/lib/python3.11/site-packages/urllib3/connectionpool.py:1101: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cnn.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
---------------------------------------------------------------------------
JsException Traceback (most recent call last)
File /lib/python3.11/site-packages/urllib3/contrib/emscripten/fetch.py:380, in send_request(request)
378 js_xhr.setRequestHeader(name, value)
--> 380 js_xhr.send(to_js(request.body))
382 headers = dict(Parser().parsestr(js_xhr.getAllResponseHeaders()))
JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://cnn.com/'.
During handling of the above exception, another exception occurred:
@markwilkinson Could it be because cnn.com redirects to edition.cnn.com? Using https://edition.cnn.com/ directly in the code seems to be working fine.
I don't think that's the problem... It seems that https://edition.cnn.com is the exception to the rule! I have added the auto-redirect flag and that doesn't solve the problem for any of the URLs that I want to use. I have also tried using https://github.com and https://google.ca and https://www.cbgp.upm.es (this last one I know for sure does not redirect). I have also tried in two browsers.
None of these work.
So I think the problem is real!
I have also tried connecting directly to my server rather than the https reverse proxy (http://....) and that also throws an error (different error), but I have a feeling that Jupyter doesn't allow insecure connections anyway, so that might not be informative...??
Have you had any further thoughts on this? I am still unable to resolve any URL, using the demo jupyterlite, other than the one you discovered that worked (edition.cnn.com). I have also tried starting from a new notebook, running %pip install requests and then trying to reach any website... same problem in all cases.
Hi again! Have you (or anyone) found a work-around for this? I'm so excited to use jupyterlite, but all of the projects I need it for will be downloading their data from the Web, so... this is a real show-stopper for me!
Advice very welcome!
Have you tried using fetch... so, this isn't to an external site, but check out these examples of notebooks that I run in jupyterlite: https://github.com/o19s/quepid-jupyterlite/blob/main/jupyterlite/files/examples/Multiple%20Raters%20Analysis.ipynb
Maybe because "fetch" is javascript???
Thanks for the suggestion! Unfortunately, that didn't work either, and with ~identical symptoms. The "await fetch" fails with "JsException: TypeError: Failed to fetch" for all URLs other than the one we identified at the top of this issue report (https://edition.cnn.com).
So... unless I am interested in what CNN has to say (I'm not), I continue to be out of luck! ;-)
I believe this is because of CORS. I'm not sure, but I think there's no way around it from inside the browser: it's a browser security restriction. You can still hit an API endpoint that allows cross-origin requests, though. Otherwise, you'd need a server for what you are trying to do: your server would be the one to send the HTTP request to the endpoint you want to hit. You might want to read this issue: https://github.com/jupyterlite/jupyterlite/issues/729#issue-1299865672
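To make the "your server sends the request" idea concrete, here is a minimal sketch of such a relay using only the Python standard library. The handler name `CorsProxyHandler`, the `?url=` query parameter, and the default port are all assumptions made for illustration; this is not how any particular proxy service is implemented, and it has no access control, so it should not be exposed publicly as-is.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class CorsProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical relay: GET /?url=<target> fetches <target> server-side."""

    def do_GET(self):
        # Pull the target URL out of the query string (assumed parameter name: url).
        query = parse_qs(urlparse(self.path).query)
        target = query.get("url", [None])[0]
        if not target or not target.startswith(("http://", "https://")):
            self.send_error(400, "Expected ?url=<http(s) URL>")
            return
        try:
            # This request is made by the server, so browser CORS rules do not apply.
            with urllib.request.urlopen(target, timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get(
                    "Content-Type", "application/octet-stream"
                )
        except OSError:  # URLError and HTTPError are both OSError subclasses
            self.send_error(502, "Upstream request failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        # This header is what lets a browser page on another origin read the body.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; drop this override to get request logs


def make_proxy_server(host="127.0.0.1", port=8080):
    """Build (but do not start) the relay server."""
    return ThreadingHTTPServer((host, port), CorsProxyHandler)
```

Calling `make_proxy_server().serve_forever()` would start the relay; a notebook could then request `http://<your-host>:8080/?url=<percent-encoded target>` instead of the target itself. Note that a page served over HTTPS will generally refuse to call a plain-HTTP relay, so in practice it would need to sit behind HTTPS as well.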
Interesting! In most cases, I run the servers that I need to talk to from Jupyter, so I will try reconfiguring them to accept all in CORS. For the other cases, I will try your proxy ideas.
Thanks!! If this is the problem, then I suspect it's going to be hard to fix in jupyterlite itself... which is sad! But a proxy is fine.
I'll report back here if this solves the problem. Thanks for the suggestion @mrkvn !
@mrkvn this did solve the problem. It was necessary also to explicitly install support for https. Now it's all good! Thanks!
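For anyone wondering what "accept all in CORS" can look like on a server you control, here is a rough sketch, assuming the server is a Python WSGI application (the thread does not say what the servers actually run). The names `allow_all_origins` and `hello_app` are made up for this example. It only adds the `Access-Control-Allow-Origin` header to responses; it does not handle preflight `OPTIONS` requests or credentialed requests.

```python
def allow_all_origins(app):
    """Wrap a WSGI app so every response carries a permissive CORS header."""

    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header is not sent twice.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "access-control-allow-origin"
            ]
            headers.append(("Access-Control-Allow-Origin", "*"))
            return start_response(status, headers, exc_info)

        return app(environ, cors_start_response)

    return wrapped


def hello_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# The wrapped app is what you would hand to your WSGI server.
application = allow_all_origins(hello_app)
```

Using `*` lets any website read the responses, which is fine for public data but probably too permissive for anything private; listing specific origins would be the more careful choice.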
I've been using a thing of the following form to make simple proxied requests that give me a response object r I can call as r.text, r.content, or r.json()
import requests
from urllib.parse import quote, urlencode

class ProxyResponse:
    def __init__(self, content):
        self._content = content

    @property
    def text(self):
        return self._content

    def json(self):
        import json
        return json.loads(self._content)

    @property
    def content(self):
        return self._content.encode()

def cors_proxy_request(url, params=None):
    """CORS proxy for GET resources with requests-like response."""
    if params:
        full_url = f"{url}?{urlencode(params)}"
    else:
        full_url = url
    proxy_url = f"https://corsproxy.io/?{quote(full_url)}"
    response = requests.get(proxy_url).content.decode().strip()
    return ProxyResponse(response)
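To see what URL actually gets requested, the URL-building step of `cors_proxy_request` can be pulled out on its own. The helper name `build_proxy_url` and the `example.com` target are made up for this sketch; the encoding logic is the same as in the snippet above, and it runs without any network access.

```python
from urllib.parse import quote, urlencode

PROXY_PREFIX = "https://corsproxy.io/?"  # same public proxy as in the snippet above


def build_proxy_url(url, params=None):
    """Hypothetical helper: return the URL that cors_proxy_request would fetch."""
    # Append query parameters to the target first, then percent-encode the whole
    # target so its own "?" and "=" are not mistaken for the proxy's query string.
    full_url = f"{url}?{urlencode(params)}" if params else url
    return f"{PROXY_PREFIX}{quote(full_url)}"


print(build_proxy_url("https://example.com/data", {"q": "a b"}))
# https://corsproxy.io/?https%3A//example.com/data%3Fq%3Da%2Bb
```

With the function from the snippet above, the equivalent call would be `r = cors_proxy_request("https://example.com/data", params={"q": "a b"})`, after which `r.text`, `r.content`, and `r.json()` behave as described.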