Add User-Agent Header to Jsoup Connections in Transforms

Open jerichosy opened this issue 1 year ago • 1 comments

This pull request...

  • [x] Fixes a bug
  • [ ] Introduces a new feature
  • [ ] Improves an existing feature
  • [ ] Boosts code quality or performance

Description

In transforms, a source's URL is currently fetched without a User-Agent header. This small PR adds .userAgent("Mozilla") to the line that fetches the Document for the URL through the Jsoup connection. I hardcoded the value because other parts of the codebase do the same. This could be improved by allowing the user-agent to be specified in the config as part of the transform.
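
A minimal sketch of that possible improvement, not part of this PR's diff: resolve the user-agent from the transform's config and fall back to "Mozilla" when none is set. The class name `UserAgentSketch`, the `useragent` key, and the `Map`-based config are assumptions for illustration only; MusicBot's actual transform config types may look different. The Jsoup call appears only as a comment so the snippet compiles without Jsoup on the classpath.

```java
import java.util.Map;

final class UserAgentSketch {
    // Fallback matching the value hardcoded in this PR.
    static final String DEFAULT_USER_AGENT = "Mozilla";

    // Returns the transform's configured user-agent, or the default when the
    // hypothetical "useragent" key is missing or blank.
    static String resolveUserAgent(Map<String, String> transformConfig) {
        String configured = transformConfig.get("useragent");
        if (configured == null || configured.isBlank()) {
            return DEFAULT_USER_AGENT;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        String ua = resolveUserAgent(Map.of("useragent", "MyBot/1.0"));
        // With Jsoup available, the fetch would then look like:
        // Document doc = Jsoup.connect(url).userAgent(ua).get();
        System.out.println(ua);
    }
}
```

Keeping the fallback in one constant means transforms that set nothing behave as they do with this PR, while a source that rejects "Mozilla" could be given its own value.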

Purpose

When fetching sources in transforms, some servers reject the request (e.g. with 403 Forbidden) because the User-Agent header is missing. To fix this, the user-agent is set to "Mozilla" on the Jsoup connection before the page is fetched. This allows roundabout loading to work for sources that block requests without a User-Agent header.*

*Assuming the server accepts "Mozilla" as a valid user-agent. The source I'm using does.

Relevant Issue(s)

N/A (not sure if I should have created an issue first)

jerichosy avatar Aug 22 '24 09:08 jerichosy

Instead of a constant value here, maybe we should pick a good default and then allow setting the value within each transform.

jagrosh avatar Aug 22 '24 13:08 jagrosh