
[Question] Contributing to label / scraping method

Open · brianleect opened this issue 3 years ago · 6 comments

I came across this repo a while back when I needed address label data. I noticed that it covered only a specific subset of Etherscan's labels, and that scraping required a separate Tampermonkey script for each label.

Because I needed label data that wasn't covered, I ended up writing a more general scraper for Etherscan over at https://github.com/brianleect/etherscan-labels

I'd love to know how I could contribute back to this repo, both to populate it with more label data and perhaps to add the more general scraping method I used.

brianleect, Aug 10 '22 07:08

Wow @brianleect, this is a significant contribution you've made to open source! Clearly, I'd love to join forces and have a single repo with all the labels. Whether it lives with you, me, or both of us, I have no preference, so long as the library is easy to consume from Node.js and JavaScript.

I saw at a quick glance that you implemented the scraper in Python. Are you familiar with the JS ecosystem too?

dawsbot, Aug 10 '22 16:08

Thanks for responding, @dawsbot!

I used Selenium for Python for the scraping because I'd used it before, and only just realized that Selenium is available for JS as well. I'm familiar enough with JS that rewriting it should be trivial.

Regarding labels:

  • Label bloat: some labels contain ~80-90k addresses, which might not be relevant to other users. Do we want to include these, or leave it to users to scrape them themselves?
  • Porting over labels: I think a script that loops through my full JSON label list and generates code in the same format as your current labels is doable (~400 labels at the moment).
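For the porting bullet, a minimal TypeScript sketch of such a generator might look like the following. Everything here is an assumption: the `LabelEntry` shape (address plus name tag), the hypothetical helpers `toIdentifier` and `generateLabelModule`, and the output format (one exported `Record` from lowercased address to name tag per label) stand in for whatever formats the two repos actually use.

```typescript
// Hypothetical shape of one scraped entry; the real JSON files may differ.
interface LabelEntry {
  address: string;
  nameTag: string;
}

// Turn a label slug such as "tornado-cash" into a valid identifier ("tornadoCash").
function toIdentifier(label: string): string {
  const words = label.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0);
  const camel = words
    .map((w, i) =>
      i === 0 ? w.toLowerCase() : w.charAt(0).toUpperCase() + w.slice(1).toLowerCase(),
    )
    .join("");
  // Identifiers cannot start with a digit, so prefix those with "_".
  return /^[0-9]/.test(camel) ? `_${camel}` : camel;
}

// Emit TypeScript source exporting one address -> name tag map for a label
// (hypothetical output format).
function generateLabelModule(label: string, entries: LabelEntry[]): string {
  // Deduplicate by lowercased address (later entries win) so the emitted
  // object literal never repeats a key.
  const byAddress = new Map<string, string>();
  for (const entry of entries) {
    byAddress.set(entry.address.toLowerCase(), entry.nameTag);
  }
  // Sort addresses so regenerated files diff cleanly.
  const addresses = Array.from(byAddress.keys()).sort();
  const body = addresses.map(
    (address) =>
      `  ${JSON.stringify(address)}: ${JSON.stringify(byAddress.get(address) ?? "")},`,
  );
  return [
    `export const ${toIdentifier(label)}: Record<string, string> = {`,
    ...body,
    "};",
    "",
  ].join("\n");
}
```

To cover all ~400 labels, a driver script could read each label's JSON with Node's `fs.readFileSync`, pass the parsed entries to `generateLabelModule`, and write the result with `fs.writeFileSync`. Lowercasing and sorting addresses keeps the generated files stable between scrapes.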

I'd love to know what you think.

brianleect, Aug 11 '22 05:08

Rewriting in JS would be my goal here, but if that's a hassle, let's address that upfront.

I think the massive lists (80-90k addresses) are fine, so long as we optimize the bundle output for JS. I'm happy to tag-team on this, but given my current workload elsewhere (high), I've got ideas for how to collaborate. Discord me at daws.eth# TWO FIVE SIX TWO 🙏
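One possible way to keep 80-90k-address lists from bloating the bundle (not something decided in this thread) is to shard each list by address prefix, so a consumer only loads the shard it needs. A minimal sketch, with hypothetical function names `shardByPrefix` and `isLabeled`, assuming `0x`-prefixed hex addresses:

```typescript
// Split a large address list into shards keyed by the first `prefixLength`
// hex characters after "0x" (e.g. "ab" for 0xab12...).
function shardByPrefix(addresses: string[], prefixLength = 2): Record<string, string[]> {
  const shards: Record<string, string[]> = {};
  for (const raw of addresses) {
    const address = raw.toLowerCase();
    const key = address.slice(2, 2 + prefixLength);
    const bucket = shards[key] ?? [];
    bucket.push(address);
    shards[key] = bucket;
  }
  return shards;
}

// Check whether an address is in the list by consulting only its shard.
function isLabeled(
  address: string,
  shards: Record<string, string[]>,
  prefixLength = 2,
): boolean {
  const normalized = address.toLowerCase();
  const shard = shards[normalized.slice(2, 2 + prefixLength)];
  return shard !== undefined && shard.includes(normalized);
}
```

With a two-character prefix there are at most 256 shards per label, so if addresses are spread evenly, each shard of a 90k-address list holds roughly 350 addresses. Each shard could then be emitted as its own file and loaded on demand with a dynamic `import()`.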

dawsbot, Aug 11 '22 21:08

Sent you a friend request on Discord. I'm transfixed#0001.

brianleect, Aug 12 '22 06:08

I rewrote the login and part of the scraping flow in Selenium JS:

https://github.com/brianleect/evm-labels/blob/master/scripts/scrape-all.js

I'm not sure what the JavaScript equivalent of pandas.read_html is for retrieving tables, though.
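Node.js has no direct counterpart to pandas.read_html. A common route is to pass the page HTML (for example from `driver.getPageSource()` in selenium-webdriver) to an HTML parser such as cheerio or jsdom. For simple tables, a dependency-free sketch like the one below may be enough. `readHtmlTable` and `stripTags` are hypothetical names, and the regex approach assumes well-formed tables without nesting, rowspan, or colspan, decoding only a handful of entities.

```typescript
// Remove tags, decode a few common entities, and collapse whitespace.
function stripTags(html: string): string {
  return html
    .replace(/<[^>]*>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, " ")
    .trim();
}

// Rough stand-in for pandas.read_html on a single simple table: the first
// row supplies the column names, and each later row becomes one record.
function readHtmlTable(html: string): Record<string, string>[] {
  const rows: string[][] = [];
  const rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  let rowMatch: RegExpExecArray | null;
  while ((rowMatch = rowRe.exec(html)) !== null) {
    const cells: string[] = [];
    const cellRe = /<t[hd]\b[^>]*>([\s\S]*?)<\/t[hd]>/gi;
    let cellMatch: RegExpExecArray | null;
    while ((cellMatch = cellRe.exec(rowMatch[1] ?? "")) !== null) {
      cells.push(stripTags(cellMatch[1] ?? ""));
    }
    if (cells.length > 0) rows.push(cells);
  }
  const header = rows.shift();
  if (!header) return [];
  return rows.map((cells) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = cells[i] ?? "";
    });
    return record;
  });
}
```

Each returned record, such as `{ "Address": "0x...", "Name Tag": "..." }`, corresponds roughly to one row of the DataFrame that read_html would produce. If the pages grow more complex, swapping the regexes for cheerio or jsdom is the safer choice.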

brianleect, Aug 12 '22 15:08

Nice @brianleect ! I'm excited to join forces 🙌

dawsbot, Aug 13 '22 17:08