zengm
apostrophes in names are ignored by tools/names.js parser
A commit from my complexNames branch (6bdbdb4) tried to manually edit these names, but it's been reverted so that the data/names.js file will always be in agreement with the output of tools/names.js.
The parser has a bug and I have a burning passion to find it. I don't know if it's a data source issue or a parsing issue yet.
I'd like to see if there's a programmatic fix somewhere in the parser code, but worst-case scenario I think we just add some more apostrophe exceptions to the nameFixes.
It's not a bug in the parser. The issue is that I'm using data from DraftExpress, which has names like http://www.draftexpress.com/profile/ATorri-Shine-78199/
If RealGM data is more comprehensive and cleaner, maybe it'd be worth switching.
Thanks for confirming it's a data source issue.
May I start writing some exceptions into nameFixes?
ATorri and NDoye are the two giant sore thumbs to me. At least I can almost believe DAndre and DMarius.
I added the apostrophe'd first names to the nameFixes because it looks like there's only one of each. I could move them down to the unused fnFixes if you think they're common enough but my search of DraftExpress didn't bring up more than one.
Last names happened a lot so I wrote a completely new lnFixes section, based on your fnFixes.
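A minimal sketch of how exception tables like these might be applied, assuming each table maps a scraped spelling to a corrected one. The table shapes, the `fixName` helper, and the corrected spellings shown are illustrative assumptions, not the actual contents of tools/names.js:

```javascript
// Hypothetical tables: scraped spelling -> corrected spelling.
const nameFixes = new Map([["ATorri Shine", "A'Torri Shine"]]); // one-off full names
const fnFixes = new Map(); // first-name fixes (unused here, as in the thread)
const lnFixes = new Map([
    ["OBrien", "O'Brien"],
    ["NDiaye", "N'Diaye"],
]);

// Return the corrected name: full-name exceptions win, then per-part tables.
function fixName(fullName) {
    if (nameFixes.has(fullName)) {
        return nameFixes.get(fullName);
    }
    // Assumption: the first token is the first name, the rest is the last name.
    const [first, ...rest] = fullName.split(" ");
    const last = rest.join(" ");
    const fixedFirst = fnFixes.has(first) ? fnFixes.get(first) : first;
    const fixedLast = lnFixes.has(last) ? lnFixes.get(last) : last;
    return last === "" ? fixedFirst : `${fixedFirst} ${fixedLast}`;
}
```

Keying one-off cases by full name keeps them from rewriting anyone else, while a last-name table covers spellings that recur across many players.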
Thanks! That part of your PR looks good. Not sure about the rest :)
Are these all the names you think need to be altered?
As of right now, yes. They're exceptions, and it seems all of the other Irish names came through correctly. When new names get added to DraftExpress (and if you ever decide to re-run your crawler, say, every 18 months), I would want to scan the output to see if there are new "error types".
Proof:
~/basketball/src/js/data $ egrep [A-Z]{2} names.js
"USA": [
["ATorri", 65],
["DAndre", 5058],
["DMarius", 5059],
["DMarr", 5060],
["MBaye", 168],
["NDoye", 201],
["NDiaye", 76],
["NDoye", 77],
"USA": [
["OBrien", 15659],
first = {USA: g.names.first};
last = {USA: g.names.last};
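The egrep check above could also be run as a small script after each crawl. This is only a sketch: it assumes the name lists are arrays of [name, number] pairs like the ones in the listing, and `findSuspiciousNames` is a hypothetical helper, not something in the repo. Unlike the line-based egrep, it tests only the name strings, so keys such as "USA" are not reported.

```javascript
// Two consecutive capital letters usually means a dropped apostrophe
// (e.g. "ATorri"), the same pattern the egrep command looks for.
const SUSPICIOUS = /[A-Z]{2}/;

// `entries` is assumed to be an array of [name, number] pairs.
function findSuspiciousNames(entries) {
    return entries
        .map(([name]) => name)
        .filter((name) => SUSPICIOUS.test(name));
}

// Sample input: two entries from the listing plus one made-up clean entry.
const sample = [["ATorri", 65], ["Aaron", 120], ["DAndre", 5058]];
console.log(findSuspiciousNames(sample)); // [ 'ATorri', 'DAndre' ]
```

Anything the scan reports that is not already covered by an exception would be a candidate for a new fix entry.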