handling for 'unknown' gendered names

Open sbenthall opened this issue 3 years ago • 2 comments

The current gender detector script will label many names as being of 'unknown' gender.

[ ] @nllz has requested a feature that would, when using the detector over a large set of names, export the gendered lists of names, and especially the 'unknown' list.

That would open the list to inspection to see if any names could be labeled, or the gender detector otherwise improved.
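A minimal sketch of what such an export could look like, using only the standard library. The function names and the `toy_detect` stand-in are hypothetical and not part of the existing script; the sketch only assumes the detector can be called on a name and returns a label such as 'male', 'female', or 'unknown'.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_names_by_gender(names, detect):
    """Group names by the label that `detect` returns for each one."""
    groups = defaultdict(list)
    for name in names:
        groups[detect(name)].append(name)
    return dict(groups)


def export_groups(groups, out_dir):
    """Write one single-column CSV per label, e.g. unknown.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for label, names in groups.items():
        path = out_dir / f"{label}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["name"])
            writer.writerows([n] for n in sorted(names))
        paths[label] = path
    return paths


# Hypothetical stand-in for the real detector, for illustration only.
def toy_detect(name):
    return {"alice": "female", "bob": "male"}.get(name.lower(), "unknown")
```

Calling `export_groups(group_names_by_gender(names, toy_detect), "out")` would leave an `unknown.csv` that can be opened and inspected by hand.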

[ ] A related feature might support the use of an additional, custom data file assigning names to (known) genders.
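One way the custom data file could work, sketched under assumptions: the file is a CSV with `name` and `gender` columns (an assumed layout, not an existing format), and `load_overrides` and `detect_with_overrides` are hypothetical helpers. A label from the custom file wins; otherwise the detector is consulted.

```python
import csv
import io


def load_overrides(csv_file):
    """Read rows with 'name' and 'gender' columns into a lookup keyed by lowercased name."""
    return {
        row["name"].strip().lower(): row["gender"].strip()
        for row in csv.DictReader(csv_file)
    }


def detect_with_overrides(name, overrides, detect):
    """Prefer the researcher-supplied label; fall back to the detector."""
    return overrides.get(name.strip().lower()) or detect(name)


# Illustrative data; a real file would be opened with open(path, newline="").
custom = io.StringIO("name,gender\nSam,male\nKai,female\n")
overrides = load_overrides(custom)


def fallback(name):
    # Hypothetical stand-in for the detector: knows nothing.
    return "unknown"
```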

sbenthall, Sep 03 '20 13:09

I've been doing this by outputting a CSV file (after entity resolution and gender estimation), manually making changes (to add/correct known gender, or to manually consolidate names/addresses that I know to be the same person) and then importing that CSV back into a dataframe.

I do think it would be possible, and potentially useful to researchers studying overlapping communities (like IETF and W3C groups), to import a CSV file of email addresses known to the researcher along with information the researcher has manually annotated (gender, nation of origin, etc.).
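As a rough illustration of that import step, here is a standard-library sketch. It assumes an `email` column as the key and treats every other column as an annotation; the column names, helper names, and sample rows are all made up for the example. Empty cells are skipped so that they do not overwrite values already present.

```python
import csv
import io


def load_annotations(csv_file, key="email"):
    """Map each normalized email address to its non-empty annotated columns."""
    annotations = {}
    for row in csv.DictReader(csv_file):
        email = row.pop(key).strip().lower()
        # Skip empty cells so they do not overwrite existing values.
        annotations[email] = {k: v.strip() for k, v in row.items() if v and v.strip()}
    return annotations


def apply_annotations(records, annotations, key="email"):
    """Return copies of records with manual annotations taking precedence."""
    return [
        {**rec, **annotations.get(rec[key].strip().lower(), {})}
        for rec in records
    ]


# Made-up annotation sheet and records, for illustration only.
sheet = io.StringIO(
    "email,gender,country\n"
    "a@example.org,female,\n"
    "b@example.org,,DE\n"
)
records = [
    {"email": "A@example.org", "gender": "unknown"},
    {"email": "c@example.org", "gender": "male"},
]
annotated = apply_annotations(records, load_annotations(sheet))
```

The same merge could be done on a dataframe instead; plain dictionaries are used here only to keep the sketch dependency-free.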

npdoty, Sep 03 '20 19:09

Connects to #509

sbenthall, Dec 08 '21 17:12