Broken entries in CSV file

Open Mo-Gul opened this issue 4 years ago • 1 comments

As the title already states there seems to be something "broken" in the CSV file. For example almost at the end of the CSV file (see screenshot). There are two empty names followed by the name "nas" which most likely is also wrong/not complete. Do you know what (full) names were listed there or shall these entries be deleted?

image showing the last few entries of the CSV file

There are more of these entries, see lines 6325, 9458-9459, 9733, 9735, 13957, 34840.

Aug 17 '21 11:08 Mo-Gul

There were some issues with encoding in the file I started with, as it was created in pre-Unicode times and used different encodings for each line. I believe I got a lot of it fixed at some point, but it’s likely I missed some, especially for character sets with few names.

I‘ll try to do some archeology. As the dataset is seeing some use, I might even try to find more recent data.

Sep 03 '21 19:09 KarlAmort