gnparser

If subgenus = Incertae sedis, the name string doesn't parse; also strange-looking Quality values

Open · debpaul opened this issue 11 months ago · 3 comments

Raw data (unparsed): beulah-first-5000-name-strings-unparsed.csv

Modified GNParsed Data Set: beulath-taxonnames-gnparsed-first-5000-rows.txt

  • added family column, value = Carabidae
  • opened the file in Notepad++
  • changed CRLF line endings to UNIX (LF), because TW batch upload requires this
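For anyone who would rather script the line-ending step than do it in an editor, here is a minimal Python sketch. The helper name `crlf_to_lf` is hypothetical and not part of gnparser or TW; it works on raw bytes so the file's encoding is left untouched.

```python
from pathlib import Path


def crlf_to_lf(path: str) -> int:
    """Rewrite a text file in place, replacing Windows CRLF line endings with Unix LF.

    Returns the number of CRLF sequences that were replaced.
    """
    p = Path(path)
    data = p.read_bytes()  # raw bytes: no decoding, so the encoding is preserved
    count = data.count(b"\r\n")
    p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Calling `crlf_to_lf("beulath-taxonnames-gnparsed-first-5000-rows.txt")` would convert that file in place and report how many line endings changed.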

Noticed

  • The Quality values look strange. Maybe on import into Excel I need to select a certain data type for this field? [screenshot]

  • See also line 11 above, where the value pseudoflavipes appears changed to pseudoflavipe0s in the CanonicalFull column (also lines 116 and 117).

    • I don't know where that 0 comes from.
  • See also the leading and trailing 0s in Author Year. Not sure where they are coming from either. [screenshot]

  • More 0 issues (and a delimiter issue?), origin uncertain. [screenshot]

  • Some names did not parse (not sure why); see the next screenshot. Maybe this is because all of these names have subgenus = (Incertae sedis) and GN doesn't recognize this value at this rank?

[screenshot]

  • In general, subgenus is missing from all parsed values.
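A quick way to spot impossible Quality values without going through a spreadsheet is to read the file with Python's `csv` module, which keeps every value as a plain string. This sketch assumes a header column named `Quality` and a valid range of 0 (not parsed) to 4 (worst parse), which is my understanding of gnparser's quality scale; adjust `VALID_QUALITY` if that is wrong. The function name is hypothetical.

```python
import csv
import io

# Assumed valid gnparser Quality values: 0 (not parsed) through 4 (worst parse).
VALID_QUALITY = {"0", "1", "2", "3", "4"}


def bad_quality_rows(text: str, delimiter: str = ",") -> list[tuple[int, str]]:
    """Return (data_row_number, value) pairs whose Quality is outside the expected range.

    The csv module returns strings only, so nothing is reinterpreted as a number
    or date the way a spreadsheet import might do.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        (i, row["Quality"])
        for i, row in enumerate(reader, start=1)
        if row["Quality"].strip() not in VALID_QUALITY
    ]
```

Use `delimiter="\t"` if the output file is tab-separated rather than comma-separated.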

Maybe in the future?

  • an option to parse (further atomize) down to the lowest rank provided

debpaul · Jan 22 '25 20:01

Thanks @debpaul, interesting.

  1. Looks like I am missing the case where the subgenus is Incertae sedis. I agree that names like these should be parsed. I will make a separate issue about it.

  2. The strange Quality results are an artefact of postprocessing; it is impossible to get a quality of 10. The '0' in the middle of the Canonical also seems to be a postprocessing problem. Try running this name by itself in the parser.

  3. Subgenus is provided, just not in the CSV format. If you pick the JSON format in the web UI, you will see the subgenus results.
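If you want to pull subgenus values out of JSON output with a script, a schema-agnostic sketch is to walk the parsed JSON and collect every value stored under a key named `subgenus`. That key name is an assumption about gnparser's JSON, not something confirmed in this thread, and the helper names `find_key` and `subgenera` are hypothetical.

```python
import json
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth, in parsed JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)  # keep searching inside nested objects
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def subgenera(json_text: str) -> list[Any]:
    """Collect all values under the (assumed) `subgenus` key from one JSON document."""
    return list(find_key(json.loads(json_text), "subgenus"))
```

Walking the whole structure avoids hard-coding a path to the field, at the cost of also picking up any other key that happens to be named `subgenus`.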

dimus · Jan 22 '25 21:01

@dimus thanks! I did notice that on import into Excel, it asks about modifying or removing leading zeroes. Not sure why. I told it not to modify the data. I'll test again as you suggest.
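One way to find where the stray 0s come from is to compare the untouched parser output with the file saved from Excel, cell by cell. This is a minimal Python sketch with a hypothetical helper `diff_cells`. Compare against a file saved before adding the family column; otherwise that new column will also show up as differences.

```python
import csv
import io


def diff_cells(original: str, edited: str, delimiter: str = ",") -> list[tuple[int, int, str, str]]:
    """Compare two delimited texts cell by cell.

    Returns (row, column, original_value, edited_value) for every cell that differs,
    with 1-based row and column numbers. A cell missing on one side is reported
    as an empty string on that side.
    """
    a = list(csv.reader(io.StringIO(original), delimiter=delimiter))
    b = list(csv.reader(io.StringIO(edited), delimiter=delimiter))
    diffs = []
    for r in range(max(len(a), len(b))):
        row_a = a[r] if r < len(a) else []
        row_b = b[r] if r < len(b) else []
        for c in range(max(len(row_a), len(row_b))):
            va = row_a[c] if c < len(row_a) else ""
            vb = row_b[c] if c < len(row_b) else ""
            if va != vb:
                diffs.append((r + 1, c + 1, va, vb))
    return diffs
```

A changed cell such as `pseudoflavipes` becoming `pseudoflavipe0s` would appear as one tuple with its row and column, which points straight at the step that altered it.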

debpaul · Jan 22 '25 21:01

This is what I get without preprocessing:

beulah-parsed.txt

@debpaul can you also try LibreOffice? It consistently gives me better results than Excel.

dimus · Jan 22 '25 21:01