Not possible to call out to external websites

Open markwilkinson opened this issue 1 year ago • 11 comments
Description

In both my own jupyterlite, and in the demo jupyterlite, it is not possible to call out to external websites. It always results in an error related to insecure requests. This happens with all URLs that I have tested, and happens whether or not the request call includes a "validate=true/false" flag.

Reproduce

  1. Code block:
import requests

def download_file_into_memory(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.content
    else:
        print(f"Failed to download file. Status code: {response.status_code}")
        return None


file_content = download_file_into_memory("https://cnn.com")
  2. Run

  3. See error:

/lib/python3.11/site-packages/urllib3/connectionpool.py:1101: InsecureRequestWarning: Unverified HTTPS request is being made to host 'cnn.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
---------------------------------------------------------------------------
JsException                               Traceback (most recent call last)
File /lib/python3.11/site-packages/urllib3/contrib/emscripten/fetch.py:380, in send_request(request)
    378         js_xhr.setRequestHeader(name, value)
--> 380 js_xhr.send(to_js(request.body))
    382 headers = dict(Parser().parsestr(js_xhr.getAllResponseHeaders()))

JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://cnn.com/'.

During handling of the above exception, another exception occurred:

markwilkinson avatar Mar 05 '24 13:03 markwilkinson

@markwilkinson Could it be because cnn.com redirects to edition.cnn.com? Using https://edition.cnn.com/ directly in the code seems to be working fine:

(screenshot)

jtpio avatar Mar 06 '24 15:03 jtpio

I don't think that's the problem... It seems that https://edition.cnn.com is the exception to the rule! I have added the auto-redirect flag and that doesn't solve the problem for any of the URLs that I want to use. I have also tried using https://github.com and https://google.ca and https://www.cbgp.upm.es (this last one I know for sure does not redirect). I have also tried in two browsers.

None of these work.

So I think the problem is real!

markwilkinson avatar Mar 07 '24 08:03 markwilkinson

I have also tried connecting directly to my server rather than the https reverse proxy (http://....) and that also throws an error (different error), but I have a feeling that Jupyter doesn't allow insecure connections anyway, so that might not be informative...??

markwilkinson avatar Mar 07 '24 08:03 markwilkinson

Have you had any further thoughts on this? I am still unable to resolve any URL, using the demo jupyterlite, other than the one you discovered that worked (edition.cnn.com). I have also tried starting from a new notebook, running %pip install requests and then trying to reach any website... same problem in all cases.

markwilkinson avatar Mar 12 '24 14:03 markwilkinson

Hi again! Have you (or anyone) found a work-around for this? I'm so excited to use jupyterlite, but all of the projects I need it for will be downloading their data from the Web, so... this is a real show-stopper for me!

Advice very welcome!

markwilkinson avatar Apr 18 '24 09:04 markwilkinson

Have you tried using fetch... so, this isn't to an external site, but check out these examples of notebooks that I run in jupyterlite: https://github.com/o19s/quepid-jupyterlite/blob/main/jupyterlite/files/examples/Multiple%20Raters%20Analysis.ipynb

Maybe because "fetch" is javascript???

epugh avatar Apr 18 '24 13:04 epugh

Thanks for the suggestion! Unfortunately, that didn't work either, and with ~identical symptoms. The "await fetch" fails with "JsException: TypeError: Failed to fetch" for all URLs other than the one we identified at the top of this issue report (https://edition.cnn.com).

So... unless I am interested in what CNN has to say (I'm not), I continue to be out of luck! ;-)

markwilkinson avatar Apr 22 '24 07:04 markwilkinson

I believe this is because of CORS. I'm not sure, but I think there's no way around it: it's a browser security feature. You can hit a valid API endpoint though. You'd need a server for what you are trying to do. Then your server would be the one that sends the HTTP request to the endpoint you want to hit. You might want to read this posted issue: https://github.com/jupyterlite/jupyterlite/issues/729#issue-1299865672

mrkvn avatar May 06 '24 10:05 mrkvn
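As a rough illustration of the CORS explanation above, here is a simplified sketch of the check a browser applies before letting page code read a cross-origin response. It is not the full algorithm (preflight requests and credentials are ignored), and the function name `cors_allows` and the origin `https://jupyterlite.example` are made up for the example. A site that sends a permissive `Access-Control-Allow-Origin` header would pass this check, which might be why one URL works while others fail.

```python
def cors_allows(response_headers, page_origin):
    """Simplified browser-style check for a plain cross-origin GET.

    response_headers: dict of header name -> value from the remote server.
    page_origin: origin of the page making the request (hypothetical here).
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No CORS header at all: the browser blocks access to the response.
        return False
    # Either any origin is allowed, or the header names this page's origin exactly.
    return allowed == "*" or allowed == page_origin


# Hypothetical origin of the JupyterLite page.
origin = "https://jupyterlite.example"

print(cors_allows({"Access-Control-Allow-Origin": "*"}, origin))
print(cors_allows({"Content-Type": "text/html"}, origin))
```

Because the browser does this check, not Python, switching between `requests` and `fetch` inside JupyterLite would not be expected to change the outcome.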

Interesting! In most cases, I run the servers that I need to talk to from Jupyter, so I will try reconfiguring them to accept all in CORS. For the other cases, I will try your proxy ideas.

Thanks!! If this is the problem, then I suspect it's going to be hard to fix in jupyterlite itself... which is sad! But a proxy is fine.

I'll report back here if this solves the problem. Thanks for the suggestion @mrkvn !

markwilkinson avatar May 22 '24 15:05 markwilkinson
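For servers you control, "accept all in CORS" roughly means adding an `Access-Control-Allow-Origin: *` header to responses and answering preflight `OPTIONS` requests. The sketch below uses only the Python standard library to show the idea; it is not the configuration used in this thread, and `cors_headers`, `CorsHandler`, and the port are illustrative names. A real deployment would more likely set these headers in the web server or reverse proxy configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cors_headers(allow_origin="*"):
    """Headers that let a browser page on another origin read the response."""
    return {
        "Access-Control-Allow-Origin": allow_origin,
        "Access-Control-Allow-Methods": "GET, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
    }


class CorsHandler(BaseHTTPRequestHandler):
    """Toy handler that attaches CORS headers to every response."""

    def _send(self, status, body=b""):
        self.send_response(status)
        for name, value in cors_headers().items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Preflight request: reply with the CORS headers and no body.
        self._send(204)

    def do_GET(self):
        self._send(200, b"hello from a CORS-enabled server\n")


def serve(port=8000):
    """Run the toy server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CorsHandler).serve_forever()

# Call serve() to start the server; it blocks until interrupted.
```

Allowing every origin with `*` is the simplest setting; passing a specific origin to `cors_headers` restricts access to pages served from that origin.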

@mrkvn this did solve the problem. It was necessary also to explicitly install support for https. Now it's all good! Thanks!

markwilkinson avatar Jun 03 '24 07:06 markwilkinson

I've been using something of the following form to make simple proxied requests. It gives me a response object r that I can use as r.text, r.content, or r.json():

import json
import requests
from urllib.parse import quote, urlencode

class ProxyResponse:
    """Minimal stand-in for a requests response, wrapping the decoded body text."""

    def __init__(self, content):
        self._content = content

    @property
    def text(self):
        # Body as a string
        return self._content

    def json(self):
        # Body parsed as JSON
        return json.loads(self._content)

    @property
    def content(self):
        # Body re-encoded as bytes
        return self._content.encode()

def cors_proxy_request(url, params=None):
    """CORS proxy for GET resources with requests-like response."""
    # Append any query parameters to the target URL
    if params:
        full_url = f"{url}?{urlencode(params)}"
    else:
        full_url = url

    # Percent-encode the target URL and hand it to the corsproxy.io service
    proxy_url = f"https://corsproxy.io/?{quote(full_url)}"
    response = requests.get(proxy_url).content.decode().strip()
    return ProxyResponse(response)

psychemedia avatar Oct 24 '24 12:10 psychemedia