bandcamp importer runs on every page
The regex for the bandcamp importer is too broad cause it can match any https url
example
"https://literally.anything".match(/^https:\/\/[^/]+(?:\/(?:album|track)\/[^/]+\/?|\/music\/?|\/?)$/) // [ "https://literally.anything/" ]
I think something like
^https:\/\/[^/]+(?:\/(?:album|track)\/[^/]+\/?|\/music\/?)$
is what is intended, that will only match stuff ending in /music, or /(album|track)/something?
I can write regexes if you can tell me what it should be matching/not matching
Unfortunately, we cannot fully prevent it, because it needs to run on the discography/landing pages as well, so not just artist.bandcamp.com/music, but also artist.bandcamp.com. Take for example this label: https://moatun7.bandcamp.com, yes there's a /music page, but most of the people are using the landing page to navigate the collection.
Similarly, potentially we could run it only on *.bandcamp.com, but then there comes the problem with white-labeling (which bandcamp allows you to do if you pay for it), where the word bandcamp will not be present in the URL at all. Can't find a working example now, but in the past this artist had such a setup: https://3six.net/album/the-box. In these cases the script should also work.
Bad news, it needs to run pretty much anywhere. Good news, the fingerprint of this is tiny, because if the script doesn't find the TralbumData object in the window.document, it exits.
If it needs to run anywhere, then it should say that, and not have a complicated regex that works out to anything in what looks like an accident
Yes, perhaps you're right, it will make it more clear. @zas @murdos what do you think? If you agree, I can make the change.
I think that's hard to avoid because of white-labeling (which was always an issue anyway), the match was made wider to handle those in https://github.com/murdos/musicbrainz-userscripts/commit/e2f318c128406341ecd323bd3a3e3d0a8a75f305 but it still required album or track in the URL.
The recent change to add links on discography (/music) including / because of landing page thing, actually make it matching anything.
The real match is https://github.com/murdos/musicbrainz-userscripts/commit/e2f318c128406341ecd323bd3a3e3d0a8a75f305#diff-06c163b15569b6a04b2da6f89363853aa2a2fa250208fdae59bbf45533382615R216 (finding TralbumData) anyway.
So, I think it's hard to avoid running anywhere if we actually want the script features to work in all cases. Not that I'm totally happy with that, but I think we don't have much choices.
If someone has a better solution to this... ;)
Thank you for the additional context! I suggest then we wait for a week in case anyone has better ideas and after that just change the regex to match all pages explicitly.
I would be fine with simply not supporting the edge case of custom domains without /music, /album or /track in the URL. Even more so since there is no actual example for this.
IMO it is also not too much to ask of users to click the "Music" link if they want to use that feature.
It seems, at least according to this https://get.bandcamp.help/hc/en-us/articles/23020691318551-How-do-I-set-up-a-custom-domain-on-Bandcamp, that due to recent changes in browser handling of third-party cookies, Bandcamp has implemented automatic redirects from custom domains to the original Bandcamp subdomains. This means that while someone can set up a custom domain, visitors will still see the Bandcamp URL in their browser's address bar after redirection. It's not clear what the nature of the problem is and whether they will be able to find a solution, but it might happen that white-labelled domains are not coming back.
I guess we can drop that case indeed, and only have two matchers, one for *.bandcamp.com and another one for *.(/music|/album|/track) (pseudo-regex).
maybe
^https:\/\/[^/]+(?:\/(?:album|track)\/[^/]+\/?|\/music\/?|\.bandcamp.com.*)$
?
https://customdomain.com/album
https://customdomain.com/track
didn't work, otherwise looks good to me.
We should separate the 2 cases:
- Bandcamp:
^https:\/\/[^.]+\.bandcamp\.com(?:\/(?:(?:album|track))\/[^/]+|\/|\/music)$
It will match:
https://artist.bandcamp.com/
https://artist.bandcamp.com/music
https://artist.bandcamp.com/album/xxx
https://artist.bandcamp.com/track/xxx
- Custom domain:
^https:\/\/[^/]+\/(?:(?:(?:album|track))\/[^/]+|music)$
It will match:
https://artist.domain/music
https://artist.domain/album/xxx
https://artist.domain/track/xxx
This one is wider, so it has to be first, but having both make it easier to disable the wider one if needed. Having one expression is possible but it will be longer/complex and would not allow one to restrict to Bandcamp domain as easily.
The second one is only matched for landing pages on Bandcamp itself (https://artist.bandcamp.com/), I'm not sure if we need to match https://artist.bandcamp.com or not (without any path).
// @include /^https:\/\/[^/]+\/(?:(?:(?:album|track))\/[^/]+|music)$/
// @include /^https:\/\/[^.]+\.bandcamp\.com(?:\/(?:(?:album|track))\/[^/]+|\/|\/music)$/
What do you think?
I like this approach, but I think matching https://artist.bandcamp.com (without the slash at the end) is important enough, at least very important to me :)
Otherwise it might look like the script doesn't work at all when we're on the artist landing page. Currently none of the regexes match it.
I like this approach, but I think matching
https://artist.bandcamp.com(without the slash at the end) is important enough, at least very important to me :)Otherwise it might look like the script doesn't work at all when we're on the artist landing page. Currently none of the regexes match it.
Would the following work for you? It matches x.bandcamp.com with or without trailing /
// @include /^https:\/\/[^/]+\/(?:(?:(?:album|track))\/[^/]+|music)$/
// @include /^https:\/\/([^.]+)\.bandcamp\.com((?:\/(?:(?:album|track))\/[^/]+|\/|\/music)?)$/
@zas this seems to cover all the cases, awesome!
This removed the ability to import from archive.org: https://github.com/murdos/musicbrainz-userscripts/pull/673/files#r2312609075
Addressed in https://github.com/murdos/musicbrainz-userscripts/pull/684.