evm-labels [Feat] Generalized scraper for all labels (SeleniumJS)

Open brianleect opened this issue 3 years ago • 5 comments

Flow

In a terminal, run node scrape-all to scrape all labels, or node scrape-all labelName to retrieve a single label
Log in to Etherscan
Extract all labels from labelcloud
Filter out labels that already have a label.json file in src/mainnet/all-json
Filter out labels in ignore_list, which is hardcoded with labels that are too large (100k+ labels) or bugged (no values)
Loop through filteredLabels and save each label to src/mainnet/all-json as ${label}.json
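
The two filtering steps and the save step could look roughly like the sketch below. This is an illustration only, not the scraper's actual code: filterLabels, saveLabel, and the IGNORE_LIST entries are hypothetical names and placeholder values, and the Selenium login and labelcloud extraction are left out.

```javascript
// Hypothetical sketch of the filter and save steps; not the actual scrape-all code.
const fs = require('node:fs');
const path = require('node:path');

// Placeholder entries (assumption): the real ignore list is hardcoded in the scraper.
const IGNORE_LIST = ['example-huge-label', 'example-bugged-label'];

// Returns the labels that still need scraping: no existing
// ${label}.json in outputDir and not on the ignore list.
function filterLabels(allLabels, outputDir, ignoreList = IGNORE_LIST) {
  const existing = new Set(
    fs.existsSync(outputDir)
      ? fs
          .readdirSync(outputDir)
          .filter((file) => file.endsWith('.json'))
          .map((file) => path.basename(file, '.json'))
      : []
  );
  const ignored = new Set(ignoreList);
  return allLabels.filter(
    (label) => !existing.has(label) && !ignored.has(label)
  );
}

// Writes one label's scraped data to outputDir as ${label}.json.
function saveLabel(outputDir, label, data) {
  fs.mkdirSync(outputDir, { recursive: true });
  fs.writeFileSync(
    path.join(outputDir, `${label}.json`),
    JSON.stringify(data, null, 2)
  );
}
```

With an output directory such as src/mainnet/all-json, a loop over filterLabels(allLabels, outputDir) would call saveLabel once per remaining label, so a re-run skips anything already on disk.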

Aug 14 '22 02:08 brianleect

WOW, this is an epic contribution @brianleect 🙏

Is this ready for PR review? I know we've been chatting over in Discord about the importance of splitting labels into separate files. If all labels are in one file, it cannot be split properly, which results in massive bundle sizes.

Thanks again! Excited to join forces here 🎉

Aug 16 '22 19:08 dawsbot

Noticed a bug: some labels are apparently empty. Not sure if it's caused by scraping too quickly?

Aug 22 '22 05:08 brianleect

Wrote a quick script to check. Apparently 186 labels are impacted. I'll see whether re-running the scraper or introducing a delay fixes the problem.
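
A check of this kind might look like the sketch below. It is not the script mentioned above: findEmptyLabels and isEmptyLabelData are hypothetical names, and treating an empty array, an empty object, a blank file, or unparseable JSON as "empty" is an assumption about what the bug looks like on disk.

```javascript
// Hypothetical sketch: list label files that contain no data.
const fs = require('node:fs');
const path = require('node:path');

// Assumption: "empty" means null, an empty array, or an empty object.
function isEmptyLabelData(data) {
  if (data == null) return true;
  if (Array.isArray(data)) return data.length === 0;
  if (typeof data === 'object') return Object.keys(data).length === 0;
  return false;
}

// Returns the sorted names of labels whose JSON file is blank,
// unparseable, or parses to an empty value.
function findEmptyLabels(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith('.json'))
    .filter((file) => {
      const raw = fs.readFileSync(path.join(dir, file), 'utf8');
      if (raw.trim() === '') return true;
      try {
        return isEmptyLabelData(JSON.parse(raw));
      } catch {
        return true; // treat invalid JSON as an empty label
      }
    })
    .map((file) => path.basename(file, '.json'))
    .sort();
}
```

The returned names could then be passed back to the scraper one at a time (node scrape-all labelName) to re-scrape only the affected labels.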

Aug 22 '22 05:08 brianleect

Fixed the empty labels. There also seems to be an issue where the set of scraped labels is inconsistent between runs: one full run ended up with ~370 labels, while a second run managed to scrape up to 400 labels in total.

Might need to test whether we are getting a consistent number of labels back from labelcloud; if so, the issue might be elsewhere.
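
One way to run that test is to record the label list returned by labelcloud on two separate runs and diff them. The sketch below assumes each run's labels are available as an array of strings; diffLabelRuns is a hypothetical helper, not part of the scraper.

```javascript
// Hypothetical sketch: compare the label lists from two scraper runs.
function diffLabelRuns(firstRun, secondRun) {
  const first = new Set(firstRun);
  const second = new Set(secondRun);
  return {
    firstCount: first.size,
    secondCount: second.size,
    // Labels seen in the first run but not in the second, and vice versa.
    missingFromSecond: [...first].filter((label) => !second.has(label)).sort(),
    missingFromFirst: [...second].filter((label) => !first.has(label)).sort(),
  };
}
```

If both counts match and both difference lists are empty, labelcloud is returning a stable list and the inconsistency is more likely in the per-label scraping or saving steps.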

Aug 22 '22 13:08 brianleect

Thanks for the comments on all this @brianleect 🙏

I'll take a look soon. I appreciate the patience; I was offline a lot for EthMexico, where I competed 🙌

Aug 24 '22 16:08 dawsbot

We've got a big refactor underway already which replaces the need for SeleniumJS. Thank you for this issue @brianleect, we've decided on a different path that's working well for now! 🙏

Apr 13 '24 20:04 dawsbot
