
more data cleaning for 'full archive' email domain study

Open sbenthall opened this issue 3 years ago • 1 comment

The email domain study has given us a comprehensive view of organizational participation in IETF working groups, but the underlying data is still messy.

Some steps to take:

  • [ ] remove admin domains: ietf.org, iana.org, etc.
  • [ ] isolate top contributors from generic email domains like gmx.de, gmail.com, hotmail
  • [ ] make sure emails are normalized with respect to case before analysis
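
The steps above could be sketched roughly as follows. This is only an illustration, not existing bigbang code: the function names are hypothetical, the two domain sets are seeded from the examples in this issue (with `hotmail.com` assumed for "hotmail"), and the input is assumed to be one sender address per message.

```python
from collections import Counter

# Assumed lists, seeded from the examples in this issue; extend as needed.
ADMIN_DOMAINS = {"ietf.org", "iana.org"}
GENERIC_DOMAINS = {"gmx.de", "gmail.com", "hotmail.com"}


def email_domain(address):
    """Return the lower-cased domain of an address, or None if it has none."""
    address = address.strip().lower()
    if "@" not in address:
        return None
    # An address ending in '@' yields an empty string, which becomes None.
    return address.rsplit("@", 1)[1] or None


def clean_domain_counts(addresses):
    """Count messages per organizational domain and per generic-domain sender."""
    org_counts = Counter()
    generic_senders = Counter()
    for raw in addresses:
        domain = email_domain(raw)
        if domain is None or domain in ADMIN_DOMAINS:
            continue  # drop malformed addresses and admin domains
        if domain in GENERIC_DOMAINS:
            # Keep the full (normalized) address so top contributors stay visible.
            generic_senders[raw.strip().lower()] += 1
        else:
            org_counts[domain] += 1
    return org_counts, generic_senders
```

Calling `generic_senders.most_common(10)` on the second result would then list the top contributors writing from generic domains, separately from the organizational counts.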

sbenthall, Mar 03 '21 21:03

See #509 -- there should be a supported dataset of domain metadata in the repository. This is currently embedded in a couple of notebooks but can be pulled out.
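
As a rough idea of what a pulled-out dataset might look like, here is a minimal sketch that loads domain metadata from a CSV. The column names (`domain`, `category`), the category labels, and the sample rows are assumptions for illustration only; they are not taken from #509 or from the notebooks.

```python
import csv
import io

# Hypothetical file contents; in the repository this would be a tracked CSV.
SAMPLE = """domain,category
ietf.org,admin
iana.org,admin
gmail.com,generic
gmx.de,generic
"""


def load_domain_categories(handle):
    """Map each lower-cased domain to its category label."""
    return {
        row["domain"].strip().lower(): row["category"].strip()
        for row in csv.DictReader(handle)
    }


categories = load_domain_categories(io.StringIO(SAMPLE))
```

Keeping the metadata in a single file like this would let the notebooks share one lookup instead of each carrying its own copy.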

sbenthall, Dec 07 '21 15:12