bandcamp-scraper

Some albumUrl's invalid?

Open drone1 opened this issue 4 years ago • 5 comments

Hello and thanks for this great package.

I don't see what's different about certain URLs or label/artist profiles that would affect this, but a call like:

bandcamp.getAlbumInfo('https://yantmusicuk.bandcamp.com/?label=3961057738&tab=artists/album/contravention-ep-sk11x006', ...)

loads data just fine, but another URL (also returned from getArtistInfo) does not return data:

bandcamp.getAlbumInfo('https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists/album/zener-diode-volt001a', ...)

You'll notice that if you point your browser to the first URL, the album page loads, whereas the second URL redirects to the artist's album grid.

When I click on a link to the album in question, the URL looks different from that returned from getArtistInfo, so I'm wondering if perhaps something's changed and needs to be updated?

Thanks again.

drone1 avatar Oct 17 '21 18:10 drone1

@89z as I said, these URLs are being extracted from the result of a call to getArtistInfo (in the album.url properties).

drone1 avatar Oct 17 '21 18:10 drone1

I guess a better title for this thread might be, Is getArtistInfo returning bad album URLs? Because they don't work.

drone1 avatar Oct 17 '21 18:10 drone1

Oh hm. Look, so this is just one of the URLs that comes back from getArtistUrls:

https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists

So it's already got the query string on there, and this is presumably affecting album.url in the result of getArtistInfo.
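To illustrate the suspicion above, here is a minimal sketch using Node's built-in URL class. It assumes (this is a guess, not something confirmed from the package source) that the album path is simply appended to the artist URL as a string. The albumPath value is split out of the failing URL by hand for illustration.

```javascript
// Artist URL exactly as returned by getArtistUrls (query string included).
const artistUrl = 'https://borderonerecords.bandcamp.com/?label=3961057738&tab=artists';
// Hypothetical relative album path, taken from the failing URL above.
const albumPath = '/album/zener-diode-volt001a';

// Assumed behaviour: plain string concatenation reproduces the bad URL.
const concatenated = artistUrl + albumPath;
const parsed = new URL(concatenated);
console.log(parsed.pathname); // the path is still the site root
console.log(parsed.search);   // the album path ended up inside the query string

// Resolving the path against the artist URL replaces the path and drops the query.
const resolved = new URL(albumPath, artistUrl).toString();
console.log(resolved);
```

If that assumption holds, the album slug becomes part of the tab parameter rather than the path, which would explain why the URL does not reliably land on the album page.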

The root call is like this:

bandcamp.getArtistUrls(labelUrl, function (error, artistsUrls) {

drone1 avatar Oct 17 '21 20:10 drone1

My code is quite simple and is based on your examples. I'm scraping a given label's releases by chaining getArtistUrls -> getArtistInfo -> getAlbumInfo.

I get the same pattern of result when I use a random label on BC's home page, e.g.:

getArtistUrls('https://multiculti.bandcamp.com/', ...) to get artist URLs results in:

[
  'https://nicolacruz.bandcamp.com/?label=846803195&tab=artists',
  'https://vonparty.bandcamp.com/?label=846803195&tab=artists',
  'https://dreemsdreems.bandcamp.com/?label=846803195&tab=artists',
  ...
]

This seems like your bug here. Why do these artist URLs contain this query string? Because now when one uses these URLs to do the following:

artistsUrls.forEach(url => getArtistInfo(url, (err, artistInfo) => ...))

artistInfo.album.url also includes this unneeded query string, e.g. ?label=3961057738&tab=artists, which results in the nonsense URLs.
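A possible workaround on the caller's side, sketched under the assumption that the query string is the only problem: strip it from each artist URL before passing it on. stripQuery is a hypothetical helper, not part of bandcamp-scraper, and whether this actually fixes album.url has not been verified here.

```javascript
// Hypothetical helper (not part of bandcamp-scraper): drop the query string
// and fragment from a URL, keeping scheme, host and path.
function stripQuery(rawUrl) {
  const url = new URL(rawUrl);
  url.search = '';
  url.hash = '';
  return url.toString();
}

// Artist URLs as returned by getArtistUrls in the example above.
const artistsUrls = [
  'https://nicolacruz.bandcamp.com/?label=846803195&tab=artists',
  'https://vonparty.bandcamp.com/?label=846803195&tab=artists',
  'https://dreemsdreems.bandcamp.com/?label=846803195&tab=artists',
];

const cleaned = artistsUrls.map(stripQuery);
console.log(cleaned);
```

The cleaned URLs could then be passed to getArtistInfo in place of the originals.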

Am I using the package in an unintended way? It seems quite fundamentally broken. Perhaps some tests would be useful.

drone1 avatar Oct 18 '21 11:10 drone1

What is going on?

masterT avatar Oct 24 '21 03:10 masterT