Document false positive Pithopus inermis

Open Archilegt opened this issue 3 years ago • 5 comments

Document the false positive Pithopus inermis on page https://www.biodiversitylibrary.org/page/663902 (the name does not occur on that page). If we figure out what went wrong, maybe we can fix it.

Archilegt avatar Aug 29 '22 10:08 Archilegt

Maybe "petiolis inermibus" or a spelling variant is producing the false positive.

Archilegt avatar Aug 29 '22 10:08 Archilegt

I think the related output from gnfinder is this one:

    {
      "cardinality": 2,
      "verbatim": "Petrolus inermis,",
      "name": "Petrolus inermis",
      "oddsLog10": 11.983664170973137,
      "oddsDetails": [
        {
          "feature": "spDict: inSpecies",
          "odds": 8904.045433955427
        },
        {
          "feature": "uniDict: inGenus",
          "odds": 2976.794090112943
        },
        {
          "feature": "uniEnd3: lus",
          "odds": 570.6314549737272
        },
        {
          "feature": "spEnd3: mis",
          "odds": 210.6946910672223
        },
        {
          "feature": "spLen: 7",
          "odds": 3.6025724692203513
        },
        {
          "feature": "uniLen: 8",
          "odds": 0.9606164921956841
        },
        {
          "feature": "abbr: false",
          "odds": 0.8732848865715452
        },
        {
          "feature": "priorOdds: true",
          "odds": 0.1
        }
      ],
      "start": 143,
      "end": 160,
      "annotationNomenType": "NO_ANNOT",
      "verification": {
        "id": "0dbc49e2-b393-5d52-a0be-2b09ce6231fa",
        "name": "Petrolus inermis",
        "cardinality": 2,
        "matchType": "PartialExact",
        "bestResult": {
          "dataSourceId": 181,
          "dataSourceTitleShort": "IRMNG",
          "curation": "Curated",
          "recordId": "urn:lsid:irmng.org:taxname:1391559",
          "entryDate": "2022-06-10",
          "sortScore": 8.67908829458864,
          "matchedName": "Petrolus Rafinesque, 1815",
          "matchedCardinality": 1,
          "matchedCanonicalSimple": "Petrolus",
          "matchedCanonicalFull": "Petrolus",
          "currentRecordId": "urn:lsid:irmng.org:taxname:1391559",
          "currentName": "Petrolus Rafinesque, 1815",
          "currentCardinality": 1,
          "currentCanonicalSimple": "Petrolus",
          "currentCanonicalFull": "Petrolus",
          "isSynonym": false,
          "classificationPath": "Biota|Animalia|Chordata|Vertebrata|Reptilia|Reptilia|Reptilia|Petrolus",
          "classificationRanks": "|Kingdom|Phylum|Subphylum|Class|Order|Family|Genus",
          "classificationIds": "urn:lsid:irmng.org:taxname:1|urn:lsid:irmng.org:taxname:2|urn:lsid:irmng.org:taxname:148|urn:lsid:irmng.org:taxname:11905117|urn:lsid:irmng.org:taxname:1448|urn:lsid:irmng.org:taxname:10544|urn:lsid:irmng.org:taxname:100138|urn:lsid:irmng.org:taxname:1391559",
          "editDistance": 0,
          "stemEditDistance": 0,
          "matchType": "PartialExact",
          "scoreDetails": {
            "cardinalityScore": 0,
            "infraSpecificRankScore": 0,
            "fuzzyLessScore": 1,
            "curatedDataScore": 0.6666667,
            "authorMatchScore": 0.14285715,
            "acceptedNameScore": 1,
            "parsingQualityScore": 1
          }
        },

So it looks like Pithopus inermis is not returned by gnfinder.
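
As a side note on reading the output above: the `oddsLog10` value appears to be the base-10 log of the product of the `odds` entries in `oddsDetails`. The sketch below only re-does that arithmetic with the numbers copied from the JSON; the function name `odds_log10` is made up for illustration and is not part of gnfinder.

```python
import math

# Odds copied verbatim from the "oddsDetails" list in the gnfinder output above.
odds_details = {
    "spDict: inSpecies": 8904.045433955427,
    "uniDict: inGenus": 2976.794090112943,
    "uniEnd3: lus": 570.6314549737272,
    "spEnd3: mis": 210.6946910672223,
    "spLen: 7": 3.6025724692203513,
    "uniLen: 8": 0.9606164921956841,
    "abbr: false": 0.8732848865715452,
    "priorOdds: true": 0.1,
}

# Value reported in the same output, used here only for comparison.
reported_odds_log10 = 11.983664170973137


def odds_log10(odds):
    """Sum the base-10 logs of the odds, i.e. log10 of their product."""
    return sum(math.log10(value) for value in odds)


total = odds_log10(odds_details.values())
print(f"recomputed: {total:.6f}, reported: {reported_odds_log10:.6f}")
```

Read this way, the two dictionary features (`spDict: inSpecies` and `uniDict: inGenus`) contribute most of the score, which fits "Petrolus inermis" being accepted with high odds even though it comes from an OCR error.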

@mlichtenberg and @cajunjoel, can you help find out how this false positive appeared in BHL?

dimus avatar Aug 29 '22 16:08 dimus

It was old data left over from a previous name-finding algorithm. I re-ran that page through the latest version of GNFinder (1.0.0) and the data now reflects the GNFinder output shown in the previous comment (https://www.biodiversitylibrary.org/page/663902).

mlichtenberg avatar Aug 29 '22 20:08 mlichtenberg

@mlichtenberg, @cajunjoel, given that bhlindex v1.0.0 is imminent, maybe we should plan to run it against the whole of BHL in October and get rid of the outdated inaccuracies left by the old algorithms?

dimus avatar Aug 30 '22 15:08 dimus

The recognition of Petrolus is as expected for the "Petiolus inermis" sentence in line 5, whose underlying uncorrected OCR reads "Petrolus inermis". That is one less false positive for a centipede name! ;) I will leave the issue open in case you wish to continue working on it.
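
For illustration, here is a minimal sketch of how close the OCR reading is to the printed word. It uses a plain Levenshtein distance written from scratch; this is an assumption chosen for the example and not necessarily how gnfinder or its verifier compare strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


printed_word = "Petiolus"  # word as quoted from the page
ocr_word = "Petrolus"      # uncorrected OCR reading quoted above
distance = levenshtein(printed_word, ocr_word)
print(distance)
```

A single substituted letter ("i" read as "r") is enough to turn a descriptive Latin word into a string that matches a genus name exactly.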

Archilegt avatar Sep 01 '22 10:09 Archilegt