"category" field in Token should be "categories"

Open falquaddoomi opened this issue 2 years ago • 0 comments

In ontobio.model.nlp (https://github.com/biolink/ontobio/blob/master/ontobio/model/nlp.py#L19) the field category is always empty, since SciGraph appears to return the field as categories. I can't find precisely where or when in the SciGraph commit history the field was named categories, but you can see from this query that the returned field name is currently categories:

curl -X POST "https://scigraph-ontology.monarchinitiative.org/scigraph/annotations/entities" -H  "accept: application/json" -H  "content-type: application/x-www-form-urlencoded" -d "content=male&minLength=4&longestOnly=false&includeAbbrev=false&includeAcronym=false&includeNumbers=false"

The result being:

[
  {
    "token": {
      "id": "UBERON:0003101",
      "categories": [
        "anatomical entity"
      ],
      "terms": [
        "male organism"
      ]
    },
    "start": 0,
    "end": 4
  },
  {
    "token": {
      "id": "PATO:0000384",
      "categories": [
        "quality"
      ],
      "terms": [
        "male"
      ]
    },
    "start": 0,
    "end": 4
  },
  {
    "token": {
      "id": "WBbt:0007850",
      "categories": [
        "anatomical entity"
      ],
      "terms": [
        "male"
      ]
    },
    "start": 0,
    "end": 4
  }
]

If this is in fact a mislabeled field, both the field in ontobio.model.nlp.Token and the field in biolink.datamodel.serializers (https://github.com/biolink/biolink-api/blob/master/biolink/datamodel/serializers.py#L336) will need to be corrected.
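
For illustration, here is a minimal sketch of what a tolerant reader could look like. The `Token` dataclass and the `token_from_json` helper are hypothetical stand-ins written for this example, not the actual ontobio or biolink-api code, and the fallback to the singular `category` key is only an assumption in case some server still returns it. The sample payload is trimmed from the response shown earlier in this issue.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Trimmed copy of the SciGraph response from this issue.
RESPONSE = """
[
  {"token": {"id": "UBERON:0003101",
             "categories": ["anatomical entity"],
             "terms": ["male organism"]},
   "start": 0, "end": 4},
  {"token": {"id": "PATO:0000384",
             "categories": ["quality"],
             "terms": ["male"]},
   "start": 0, "end": 4}
]
"""


@dataclass
class Token:
    """Hypothetical stand-in for ontobio.model.nlp.Token (not the real class)."""
    id: str
    categories: List[str] = field(default_factory=list)
    terms: List[str] = field(default_factory=list)


def token_from_json(obj: dict) -> Token:
    """Build a Token from one "token" object of the annotations response."""
    # Prefer the plural key that SciGraph currently returns; fall back to the
    # singular key (assumption: an older server might still use it).
    cats = obj.get("categories") or obj.get("category") or []
    return Token(id=obj["id"],
                 categories=list(cats),
                 terms=list(obj.get("terms", [])))


tokens = [token_from_json(span["token"]) for span in json.loads(RESPONSE)]
for t in tokens:
    print(t.id, t.categories)
```

Reading only the singular key from the same payload would yield an empty list for every token, which matches the behaviour described above.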

(This issue comes from investigating https://github.com/biolink/biolink-api/issues/387.)

falquaddoomi, Mar 01 '22 17:03