gospider
Doesn't extract links from the HTML of Bandcamp pages
~ $(go env GOPATH)/bin/gospider -s "https://dietcig.bandcamp.com/" -B --debug -v
[0000] INFO Start crawling: https://dietcig.bandcamp.com/
[subdomains] - dietcig.bandcamp.com
[url] - [code-200] - https://dietcig.bandcamp.com/
[form] - https://dietcig.bandcamp.com/
[javascript] - https://s4.bcbits.com/bundle/bundle/1/global_head-243a7059b280df57aee82a3092d1b787.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/tralbum_head-dc42a836d10ac56b3aa730cfbe07b7d6.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/global_foot1-b86843302dee22779f9059bc9e3e5eb6.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/global_foot2-ecf4cff3c8eb7c84aaa8e0a1fd4eab75.js
[javascript] - https://s4.bcbits.com/tmpdata/cache/global_validators_bundle_8c57f6d030cddfb05cf1aae7942b766f.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/jquery_lazyload-9222bb350f055a9536b19a5494dcef8f.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/tralbum-4c810106883025cd62fc173a249d25cb.js
[javascript] - https://s4.bcbits.com/bundle/bundle/1/tralbum_templates-787d8255c41d27c39925065ea0a6b314.js
[javascript] - https://www.google-analytics.com/analytics.js
[0000] INFO Done.
I know these links are in the raw HTML, because I can extract them with curl plus an HTML parser/selector (pup):
~ curl https://dietcig.bandcamp.com | $(go env GOPATH)/bin/pup 'a attr{href}'|grep album
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  183k    0  183k    0     0   337k      0 --:--:-- --:--:-- --:--:--  336k
/album/do-you-wonder-about-me
/album/over-easy-green-eggs-ham-edition
/album/swear-im-good-at-this
/album/sleep-talk-dinner-date
/album/over-easy
/help/downloading?from=tralbum_downloading