gnparser

Authors are interpreted as subgenera

Open KatjaSchulz opened this issue 1 year ago • 6 comments

These are all valid/accepted names from the current version of the Catalogue of Life

Plant genera:
Nassella (Trin.) É.Desv. – simple: Trin. – full: Nassella subgen. Trin.
Dacrycarpus (Endl.) de Laub. – simple: Endl. – full: Dacrycarpus subgen. Endl.
Lysiphyllum (Benth.) de Wit – simple: Benth. – full: Lysiphyllum subgen. Benth.
Tricholemma (Röser) Röser – simple: Roeser – full: Tricholemma subgen. Roeser
Isogonium (Kützing) de Bary – simple: Kuetzing – full: Isogonium subgen. Kuetzing
Euptilota (Kützing) Kützing, 1849 – simple: Kuetzing – full: Euptilota subgen. Kuetzing
Setiechinopsis (Backeb.) de Haas – simple: Backeb. – full: Setiechinopsis subgen. Backeb.

Chromista genera:
Cyclotella (Kützing) de Brebisson – simple: Kuetzing – full: Cyclotella subgen. Kuetzing
Tabularia (Kützing) Williams & Round – simple: Kuetzing – full: Tabularia subgen. Kuetzing
Cyrtolophosis (Schew.) – simple: Schew. – full: Cyrtolophosis subgen. Schew.
Pyrocystis (Schütt) Lemmermann, 1899 – simple: Schuett – full: Pyrocystis subgen. Schuett

Chromista families:
Anaulaceae (Schütt) Lemmermann – simple: Schuett – full: Anaulaceae subgen. Schuett
Triceratiaceae (Schütt) Lemmermann – simple: Schuett – full: Triceratiaceae subgen. Schuett
Pyxillaceae (Schütt) Simonsen – simple: Schuett – full: Pyxillaceae subgen. Schuett
Pyrocystaceae (Schütt) Lemmermann, 1899 – simple: Schuett – full: Pyrocystaceae subgen. Schuett
Aulacodiscaceae (Schütt) Lemmermann – simple: Schuett – full: Aulacodiscaceae subgen. Schuett
Stictodiscaceae (Schütt) Simonsen – simple: Schuett – full: Stictodiscaceae subgen. Schuett
Lauderiaceae (Schütt) Lemmermann – simple: Schuett – full: Lauderiaceae subgen. Schuett

Protozoa family:
Cyrtolophosidiidae (Schew.) – simple: Schew. – full: Cyrtolophosidiidae subgen. Schew.

KatjaSchulz, Jun 19 '24 16:06

Thank you, @KatjaSchulz, for catching this. I am not sure yet how to fix it, because in many cases Aus (Bus) does mean Aus subgen. Bus.

Do names like this occur specifically among botanical names?

dimus, Jul 30 '24 19:07

Yes, this is a tricky one. All the examples I found were taxa under the botanical code, except for the Cyrtolophosidiidae (Schew.) example, which is a really weird one that has since been removed from COL.

One approach to fix this could be a blacklist of strings that can never be interpreted as subgenus names. I think it's pretty safe to put the author strings above on that list. But after digging some more, I also found this name: Sigmoidotropis (Piper) A.Delgado. I don't think there are any subgenera named Piper, but I don't know if I would be comfortable putting that name on the blacklist.

Another approach would be to add processing of rank information to gnparser. I usually have that information for most names I am trying to parse, and I use it to double-check the gnparser results. I realize that would probably be quite a bit of work to implement.
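To illustrate the kind of rank-based double-check described above, here is a minimal sketch. It is not part of gnparser: the function name, the set of ranks, and the input shape (a known rank paired with the "full" canonical form) are assumptions made for this example only.

```python
# Ranks at which a "subgen." element in the full canonical form is unexpected.
# This set is an illustrative assumption, not something gnparser defines.
RANKS_WITHOUT_SUBGENUS = {"genus", "family"}


def is_suspect_parse(known_rank: str, full_canonical: str) -> bool:
    """Return True when the known rank contradicts a parsed subgenus element."""
    has_subgenus = " subgen. " in full_canonical
    return known_rank.lower() in RANKS_WITHOUT_SUBGENUS and has_subgenus


# (known rank, full canonical form); the first two come from this issue,
# the third uses the generic Aus/Bus placeholder for a genuine subgenus.
rows = [
    ("genus", "Nassella subgen. Trin."),
    ("family", "Anaulaceae subgen. Schuett"),
    ("subgenus", "Aus subgen. Bus"),
]

suspects = [full for rank, full in rows if is_suspect_parse(rank, full)]
print(suspects)
```

A check like this only flags suspicious results for review; it does not recover the correct authorship.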

Anyway, here are a few more names I found in the COL 2024 annual archive:

Plant genera:
Hexaphylla (Klokov) P.Caputo & Del Guacchio – simple: Klokov – full: Hexaphylla subgen. Klokov
Parogonum (Haraldson) Desjardins & J. P. Bailey – simple: Haraldson – full: Parogonum subgen. Haraldson
Ericetorum (Jermy) Li Bing Zhang & X. M. Zhou – simple: Jermy – full: Ericetorum subgen. Jermy
Archidasyphyllum (Cabrera) P. L. Ferreira, Saavedra & Groppo – simple: Cabrera – full: Archidasyphyllum subgen. Cabrera
Lamyropsis (Kharadze) Dittrich – simple: Kharadze – full: Lamyropsis subgen. Kharadze
Sigmoidotropis (Piper) A.Delgado – simple: Piper – full: Sigmoidotropis subgen. Piper
Moquiniastrum (Cabrera) G. Sancho – simple: Cabrera – full: Moquiniastrum subgen. Cabrera

Chromista genera:
Hormosira (Endlichter) Meneghini, 1838 – simple: Endlichter – full: Hormosira subgen. Endlichter
Syracolithus (Kamptner) Deflandre in Grassé, 1952 – simple: Kamptner – full: Syracolithus subgen. Kamptner

KatjaSchulz, Jul 31 '24 17:07

I do have a list of botanical genus authors (https://github.com/gnames/gnparser/blob/master/io/dict/data/genera_auth_icn.txt), and, if they are not ambiguous, I treat author-matching text in parentheses after the genus in binomials and trinomials as authorship. I can extend this rule to uninomials as well.
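As a rough illustration of how such a rule might look when applied to uninomials, here is a small Python sketch. It is not gnparser's implementation (gnparser is written in Go). The two sets are tiny stand-ins invented for this example; they are not the contents of genera_auth_icn.txt, and treating "Piper" as ambiguous is an assumption based on the discussion above.

```python
import re

# Illustrative stand-in for a dictionary of botanical genus authors.
KNOWN_GENUS_AUTHORS = {"Trin.", "Endl.", "Benth.", "Cabrera", "Piper"}
# Author strings that could also be read as taxon names (assumed for this sketch).
AMBIGUOUS = {"Piper"}

# A capitalized uninomial followed by one parenthesized element.
UNINOMIAL_WITH_PARENS = re.compile(r"^(?P<name>[A-Z][a-z]+) \((?P<inner>[^()]+)\)")


def classify_parenthesized(name_string: str) -> str:
    """Classify the parenthesized element that follows a uninomial."""
    match = UNINOMIAL_WITH_PARENS.match(name_string)
    if match is None:
        return "no-parentheses"
    inner = match.group("inner")
    if inner in KNOWN_GENUS_AUTHORS and inner not in AMBIGUOUS:
        return "authorship"
    # Unknown or ambiguous strings are left for other rules to decide.
    return "undecided"


for example in ["Nassella (Trin.) É.Desv.", "Sigmoidotropis (Piper) A.Delgado", "Aus (Bus)"]:
    print(example, "->", classify_parenthesized(example))
```

Leaving ambiguous and unknown strings undecided keeps the existing Aus (Bus) subgenus reading available for zoological names.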

This is pretty close to your suggestion, @KatjaSchulz, as I understood it.

dimus, Jul 31 '24 21:07

@KatjaSchulz, would implementing #267 help with your use case? If all names are botanical, there would be no ambiguity in parsing such names.

dimus, Oct 24 '24 09:10

Yes, I think so. Since I am usually running comprehensive data sets through gnparser, it would be a little bit more work to separate names by code, but it would be feasible. There may be lingering problems with some microorganisms, but I think those would be negligible. Thanks!

KatjaSchulz, Oct 24 '24 19:10

Oops, I did not mean to close this one; reopening.

Some plant names are now recognized, some still have problems, and Chromista authors are not recognized yet.

There is a new option, code, which can be used to force names to be parsed by ICN rules:

https://parser.globalnames.org/api/Hormosira%20(Endlichter)%20Meneghini,%201838?code=bot

https://parser.globalnames.org/?code=botanical&format=html&names=Syracolithus+%28Kamptner%29+Deflandre+in+Grass%C3%A9%2C+1952&with_details=on

Supported values: bact, bacterial, ICNP, bot, botanical, ICN, cult, cultivar, ICNCP, zoo, zoological, ICZN.
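For scripted use, a query URL of the second form above can be assembled as in the following sketch. The helper name is hypothetical, the query parameters are copied from the example URL in this comment, and the exact, case-sensitive check against the supported values is an assumption of the sketch. It only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Values listed above as supported by the `code` option.
SUPPORTED_CODES = {
    "bact", "bacterial", "ICNP",
    "bot", "botanical", "ICN",
    "cult", "cultivar", "ICNCP",
    "zoo", "zoological", "ICZN",
}

BASE_URL = "https://parser.globalnames.org/"


def build_parser_url(name: str, code: str) -> str:
    """Build a web-form URL that asks the parser to apply a nomenclatural code."""
    if code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported code: {code!r}")
    # urlencode percent-encodes parentheses, commas and non-ASCII characters
    # and turns spaces into "+", as in the example URL above.
    query = urlencode({
        "code": code,
        "format": "html",
        "names": name,
        "with_details": "on",
    })
    return f"{BASE_URL}?{query}"


url = build_parser_url("Syracolithus (Kamptner) Deflandre in Grassé, 1952", "botanical")
print(url)
```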

dimus, Nov 11 '24 16:11