
All URLs being passed as `http`

Open salomoneb opened this issue 5 years ago • 3 comments

Expected Result

When I enter https://rollingstone.com, Blacklight tests https://rollingstone.com.

Actual Result

When I enter https://rollingstone.com, Blacklight tests http://rollingstone.com.

Description

It looks like all URLs, even ones specified as https, are being passed to the back end as http. The example.js file has http hardcoded, though I don't know whether that is what your production app uses.

Demo: https://www.dropbox.com/s/uye88dsfr0qf81c/http.mov?dl=0

This issue occurs with sites other than http://rollingstone.com as well. I used that example because I was finding that my Blacklight results kept timing out when I tested https://rollingstone.com. Rolling Stone does redirect to https if you go to http://rollingstone.com.

I don't know if the http was causing my timeout issue or if it's some other quirk related to that particular site, but not using https when the user enters it in the input field seems like unintended behavior.
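To illustrate the behavior I would expect, here is a minimal sketch of input handling that keeps whatever scheme the user typed and only adds one when none is present. The function name `normalizeInputUrl` and the `https` default are my own assumptions for illustration; this is not taken from the Blacklight front end or from example.js.

```javascript
// Hypothetical helper (not part of blacklight-collector): preserve the
// scheme the user entered, and only prepend a default when none is given.
function normalizeInputUrl(input, defaultScheme = "https") {
  const trimmed = input.trim();
  // Does the input already start with http:// or https:// ?
  const hasScheme = /^https?:\/\//i.test(trimmed);
  // The WHATWG URL class (global in Node.js and browsers) validates and
  // normalizes the address; it throws a TypeError on invalid input.
  const url = new URL(hasScheme ? trimmed : `${defaultScheme}://${trimmed}`);
  return url.href;
}

console.log(normalizeInputUrl("https://rollingstone.com")); // scheme kept as https
console.log(normalizeInputUrl("http://rollingstone.com"));  // scheme kept as http
console.log(normalizeInputUrl("rollingstone.com"));         // default scheme added
```

With something like this, an explicit https input would never be downgraded, and a bare hostname would still get a usable scheme.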

salomoneb avatar Sep 26 '20 00:09 salomoneb

FWIW, I currently see the same thing if I test rollingstone.com with the web interface, i.e. both the http and https variants time out.

I hacked together my own script based on example.js, with that assumption removed, and with it the collector appears to test fine. The inspection result includes:


  "args": "Blacklight Inspection",
  "uri_ins": "http://rollingstone.com",
  "uri_dest": "https://www.rollingstone.com/",
  "uri_redirects": [
    "http://rollingstone.com/",
    "https://rollingstone.com/"
  ]
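As a rough way to read those fields, the sketch below extracts the scheme at each step of the excerpt. The field names `uri_ins`, `uri_dest`, and `uri_redirects` come from the output above; the helper `summarizeSchemes` is a hypothetical name used only for illustration.

```javascript
// The fields quoted in the inspection excerpt above.
const inspection = {
  uri_ins: "http://rollingstone.com",
  uri_dest: "https://www.rollingstone.com/",
  uri_redirects: ["http://rollingstone.com/", "https://rollingstone.com/"],
};

// Hypothetical helper: report which scheme was requested, which scheme the
// final page used, and the scheme of every redirect hop in between.
function summarizeSchemes(result) {
  const scheme = (u) => new URL(u).protocol.replace(":", "");
  const requested = scheme(result.uri_ins);
  const final = scheme(result.uri_dest);
  return {
    requested,
    final,
    upgraded: requested === "http" && final === "https",
    hops: result.uri_redirects.map(scheme),
  };
}

console.log(summarizeSchemes(inspection));
```

For this excerpt, the requested scheme is http and the destination is https, so the site's own redirect performed the upgrade.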

kjetilk avatar Jan 08 '24 17:01 kjetilk

@salomoneb can you confirm whether this report refers to the Blacklight interface at https://themarkup.org/blacklight, your local version of the blacklight-collector, or both?

BatMiles avatar Jan 22 '24 20:01 BatMiles

I was referring to the Blacklight interface at https://themarkup.org/blacklight. I think Rolling Stone has changed their website since I filed this 4 (!) years ago, but I just tried the URL again and the Blacklight interface timed out after 30s. Here's the request copied from Chrome. I want to point out that I entered https://www.rollingstone.com/ in the UI bar, but it seems to be automatically getting converted to http. I did this multiple times to confirm. I think it might have something to do with a validation regex in the scripting of the web page, but I was poking around at obfuscated code and don't want to speculate.

Screenshot 2024-03-05 at 3 12 45 PM

Request

curl 'https://blacklight.api.themarkup.org/graphic-api' \
  -H 'authority: blacklight.api.themarkup.org' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'content-type: text/plain;charset=UTF-8' \
  -H 'origin: https://themarkup.org' \
  -H 'sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36' \
  --data-raw '{"inUrl": "http://www.rollingstone.com/", "device": "mobile"}'

That returned a 502 and two error messages:

Access to XMLHttpRequest at 'https://blacklight.api.themarkup.org/graphic-api' from origin 'https://themarkup.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

POST https://blacklight.api.themarkup.org/graphic-api net::ERR_FAILED 502 (Bad Gateway)

Just for fun, I also tried https://rollingstone.com. That failed in an entirely separate way!

Request

curl 'https://blacklight.api.themarkup.org/graphic-api' \
  -H 'authority: blacklight.api.themarkup.org' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'content-type: text/plain;charset=UTF-8' \
  -H 'origin: https://themarkup.org' \
  -H 'sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36' \
  --data-raw '{"inUrl": "http://rollingstone.com/", "device": "mobile"}'

Response

{
    "status": "error",
    "page_response": "Navigation timeout of 30000 ms exceeded",
    "error_message": "Navigation timeout of 30000 ms exceeded"
}
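For anyone who wants to check this against their own captured requests, here is a small sketch that compares the URL typed into the interface with the `inUrl` field in the request body. The `inUrl` field and the sample body are copied from the first request above; the function name `schemeWasRewritten` is hypothetical, and the check only looks at scheme and host.

```javascript
// Hypothetical helper: true when the request body targets the same host as
// the typed URL but with a different scheme (e.g. https rewritten to http).
function schemeWasRewritten(typedUrl, requestBody) {
  const typed = new URL(typedUrl);
  const sent = new URL(JSON.parse(requestBody).inUrl);
  return typed.host === sent.host && typed.protocol !== sent.protocol;
}

// Body copied from the --data-raw argument of the first request.
const capturedBody = '{"inUrl": "http://www.rollingstone.com/", "device": "mobile"}';

console.log(schemeWasRewritten("https://www.rollingstone.com/", capturedBody));
console.log(schemeWasRewritten("http://www.rollingstone.com/", capturedBody));
```

The first call reports a rewrite because https was typed and http was sent; the second does not because the schemes match.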

salomoneb avatar Mar 05 '24 20:03 salomoneb