Request Headers are not used in Browser Steps

Open denilsonsa opened this issue 1 year ago • 14 comments

Description

When using "Browser Steps", the custom HTTP request headers from the "Request" tab are not sent.

Version: v0.45.14 (self-hosted via docker on a x86_64 Debian Linux)

How to reproduce

  1. Create a new watch for https://www.deviceinfo.me/ or https://myhttpheader.com/
  2. At Request, choose Playwright Chromium.
  3. Still at Request, click on Show advanced options.
  4. Add a custom User-Agent in the Request headers field.
  5. Go to Browser Steps and press Play.
  6. Scroll down to where it shows both the User-Agent and the HTTP headers.

The custom headers are applied during a normal diff/watch, but not in Browser Steps. They do seem to be applied in the Visual Filter Selector (or maybe the Visual Filter Selector uses a cached version that was fetched with the correct custom headers).

Unfortunately, some sites break when using the HeadlessChrome User-Agent (see #2051). Thus, not only do I need this feature, but I also expected the custom headers to be sent on all kinds of requests.

Desktop

  • OS: Manjaro Linux (not relevant)
  • Browser: Firefox (not relevant)

(Bonus mini-question: Is there any other place to set a custom User-Agent?)

denilsonsa avatar Feb 16 '24 10:02 denilsonsa

It seems the headers from headers.txt are also ignored in Browser Steps.

denilsonsa avatar Feb 16 '24 10:02 denilsonsa

Whoa, thanks for the catch!

dgtlmoon avatar Feb 16 '24 10:02 dgtlmoon

https://github.com/dgtlmoon/changedetection.io/blob/ccb42bcb12a920f075c1a81bb0fc0dc18b1907ff/changedetectionio/blueprint/browser_steps/browser_steps.py#L210

if someone has a PR

dgtlmoon avatar Feb 19 '24 21:02 dgtlmoon

in progress

bigger picture -

I can see that the "UI" part of BrowserSteps duplicates the connection code, which it should not. It should share the same code as the actual "Fetching the website and checking" code.

https://github.com/dgtlmoon/changedetection.io/blob/ccb42bcb12a920f075c1a81bb0fc0dc18b1907ff/changedetectionio/blueprint/browser_steps/browser_steps.py#L208

https://github.com/dgtlmoon/changedetection.io/blob/52c895b2e8009977ad60dfc2b715f3ecbeeac738/changedetectionio/content_fetchers/playwright.py#L104

They should both just accept a 'page' object that is set up with the same user agent/headers/etc., rather than each setting it up differently.
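A minimal sketch of that idea, not the project's actual code: one helper turns the watch's headers into options for Playwright's `browser.new_context()` (which accepts `user_agent` and `extra_http_headers`), and one entry point returns the configured page. The names `build_context_options` and `open_page` are hypothetical, and `browser` is assumed to be a Playwright sync-API `Browser`.

```python
from typing import Any, Dict, Mapping


def build_context_options(request_headers: Mapping[str, str]) -> Dict[str, Any]:
    """Split watch headers into keyword arguments for browser.new_context().

    User-Agent is passed as `user_agent`; every other header goes into
    `extra_http_headers`.
    """
    extra: Dict[str, str] = {}
    user_agent = None
    for name, value in request_headers.items():
        if name.strip().lower() == "user-agent":
            user_agent = value.strip()
        else:
            extra[name.strip()] = value.strip()

    options: Dict[str, Any] = {}
    if user_agent:
        options["user_agent"] = user_agent
    if extra:
        options["extra_http_headers"] = extra
    return options


def open_page(browser, request_headers: Mapping[str, str]):
    """Hypothetical single entry point shared by the checker and Browser Steps."""
    context = browser.new_context(**build_context_options(request_headers))
    return context.new_page()
```

With something like this, both code paths would call `open_page(browser, headers)` and could not drift apart in how they apply headers.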

dgtlmoon avatar Feb 21 '24 12:02 dgtlmoon

So I just found out that cookies don't work by simply adding "cookie: somevalue" to the headers. We need to add a proper cookie jar that stores the other values too, like httponly, expires, path, etc.
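As a rough illustration of the gap (a sketch, not what the project does): Playwright's `context.add_cookies()` takes a list of dicts with `name`, `value`, and either `url` or `domain` plus `path`, with optional `httpOnly`, `secure`, and `expires`. A bare `Cookie:` header only carries name/value pairs, so the helper below (hypothetical name `cookie_header_to_playwright`) has to assume the rest.

```python
from typing import Dict, List
from urllib.parse import urlparse


def cookie_header_to_playwright(cookie_header: str, url: str) -> List[Dict[str, object]]:
    """Turn a 'name=value; name2=value2' header into add_cookies() entries.

    The header has no path/httpOnly/expires information, so those are
    filled with assumed defaults (session cookie, path "/", not httpOnly).
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    secure = parsed.scheme == "https"

    cookies: List[Dict[str, object]] = []
    for pair in cookie_header.split(";"):
        if "=" not in pair:
            continue
        name, value = pair.split("=", 1)
        name = name.strip()
        if not name:
            continue
        cookies.append({
            "name": name,
            "value": value.strip(),
            "domain": host,      # assumed: scope to the watched host
            "path": "/",         # assumed default
            "httpOnly": False,   # unknown from the header alone
            "secure": secure,
        })
    return cookies
```

Usage would look like `context.add_cookies(cookie_header_to_playwright("session=abc123", "https://example.com/"))`. The assumed defaults are exactly the information a real cookie jar would need to store.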

dgtlmoon avatar Feb 26 '24 09:02 dgtlmoon

Maybe, just maybe, stuff like Cookies and User-Agent should be handled separately anyway.

  • Cookies are (sometimes) available on both HTTP headers and JavaScript. And, as you mentioned, might require a proper cookie jar.
  • User-Agent is available on both HTTP headers and JavaScript. I wonder if any page will break/misbehave if they start detecting a mismatch.

Also, setting the User-Agent (and possibly Cookies) is a much more common use-case than setting the other headers. (This is a guess; it's not based on any data.) Thus it would make sense to make them easier to set up.

denilsonsa avatar Feb 26 '24 09:02 denilsonsa

> User-Agent is available on both HTTP headers and JavaScript. I wonder if any page will break/misbehave if they start detecting a mismatch.

Yes, correct. A lot of people don't realise this, but the User-Agent can actually be detected in 3 ways: HTTP headers, JavaScript (navigator.userAgent), and the Sec-CH-UA client hint headers in Chrome.
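To make the three signals concrete, here is a small sketch of the kind of consistency check a site could run. The function name `user_agent_mismatches` and the specific rules are illustrative assumptions, not taken from any real detection library.

```python
from typing import List, Optional


def user_agent_mismatches(header_ua: str, js_ua: str, sec_ch_ua: Optional[str]) -> List[str]:
    """Compare the HTTP User-Agent, navigator.userAgent and Sec-CH-UA values."""
    problems: List[str] = []

    # 1. The HTTP header and the JavaScript value should be identical.
    if header_ua != js_ua:
        problems.append("HTTP User-Agent differs from navigator.userAgent")

    # 2. Sec-CH-UA is sent by Chromium-based browsers, so a User-Agent that
    #    does not claim Chrome while Sec-CH-UA is present looks inconsistent.
    if sec_ch_ua and "Chrome/" not in header_ua:
        problems.append("Sec-CH-UA is sent but User-Agent does not claim a Chromium browser")

    # 3. Any of the three values may reveal a headless browser.
    signals = (
        ("User-Agent", header_ua),
        ("navigator.userAgent", js_ua),
        ("Sec-CH-UA", sec_ch_ua or ""),
    )
    for label, value in signals:
        if "headless" in value.lower():
            problems.append(f"{label} reveals a headless browser")

    return problems
```

Overriding only the HTTP header would leave the other two values untouched, which is why a custom User-Agent needs to be applied at the browser level rather than as an extra header.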

dgtlmoon avatar Feb 26 '24 10:02 dgtlmoon

@denilsonsa thanks for your report here, it has been repaired, and we will carry over at https://github.com/dgtlmoon/changedetection.io/issues/2217 :)

Correctly setting cookies cannot be done purely by adding something to the cookie: header; they should be set with proper meta information (maybe it would work in plain requests, but not with a browser).

dgtlmoon avatar Feb 26 '24 17:02 dgtlmoon

Sorry to comment on a closed thread, but I have updated to the latest version 0.45.16 and I'm still not able to use custom headers in Browser Steps. Using the Steps to Reproduce in the original post, I'm still seeing that my custom headers are not applied (both in the Request Advanced Options as well as headers.txt).

I'm using version 0.45.16 along with dgtlmoon/sockpuppetbrowser:latest

Is anyone else able to get it to work?

nomer avatar Mar 23 '24 04:03 nomer

@dgtlmoon I think the bug is still in release 0.45.17. I tested setting the User-Agent in the Request headers field and also in headers.txt and it doesn't work.

abrahampm avatar Apr 08 '24 15:04 abrahampm

Should a new issue be opened for this? I'm also experiencing the same issue with sockpuppetbrowser + Browser Steps.

pmalecka avatar May 26 '24 14:05 pmalecka

Reopened, I'll double-check it.

dgtlmoon avatar May 22 '25 06:05 dgtlmoon

Image

it 100% works..

Image

Don't forget to 'save' the new headers before trying Browser Steps.

Or maybe you're doing something different? Bright Data unblock browser, or something else?

dgtlmoon avatar May 22 '25 11:05 dgtlmoon

I've just updated to v0.49.17 to try again.

> Don't forget to 'save' the new headers before trying Browser Steps.

Oh, really? That was unexpected! Maybe this could be improved.

However, it is still broken in one specific corner-case. Let me explain:

  • Custom header in the Request tab:
    • Works during page capture (i.e. shows in the history/screenshot)
    • Works during Browser Steps (⚠ but only after saving, which is not intuitive!)
  • Custom header from headers.txt
    • Works during page capture (i.e. shows in the history/screenshot), both in playwright and in plaintext modes.
    • Does NOT work during Browser Steps. The Browser Steps tab seems to disregard custom headers from headers.txt, even though those headers are applied during the real capture.
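For reference, the behaviour I would expect can be sketched as below. This is only an illustration under assumptions: that headers.txt contains one `Name: value` pair per line, and that headers from the Request tab override same-named ones from the file. The function names `parse_headers_txt` and `merge_headers` are hypothetical, not the project's.

```python
from typing import Dict, Mapping


def parse_headers_txt(text: str) -> Dict[str, str]:
    """Parse 'Name: value' lines (assumed headers.txt format) into a dict."""
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


def merge_headers(file_headers: Mapping[str, str],
                  watch_headers: Mapping[str, str]) -> Dict[str, str]:
    """Merge case-insensitively; the watch's own headers win (assumed precedence)."""
    merged = {name.lower(): (name, value) for name, value in file_headers.items()}
    for name, value in watch_headers.items():
        merged[name.lower()] = (name, value)
    return dict(merged.values())
```

If Browser Steps and the real capture both used the same merged result, the two rows above would behave identically.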

denilsonsa avatar May 23 '25 09:05 denilsonsa