Add lexical validation (spellchecking, punctuation, casing) to OAK

Open • cmungall opened this issue 2 years ago • 1 comment

Follows on from #305

  • spellchecking (e.g. https://github.com/FlyBase/flybase-ontology-scripts/blob/master/misc/obo_spellchecker.py)
  • punctuation
  • correct casing (e.g. https://github.com/cmungall/capital-offence)
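As a rough illustration of what a casing check could look like, here is a minimal sketch. It assumes a convention that labels are lowercase except for acronyms and proper nouns; `casing_issues` and `ALLOWED_CAPITALIZED` are hypothetical names, not part of OAK or of the linked repository.

```python
import re

# Hypothetical allow-list of proper nouns; a real check would need a curated vocabulary.
ALLOWED_CAPITALIZED = {"Golgi", "Purkinje"}

def casing_issues(label, allowed=ALLOWED_CAPITALIZED):
    """Return the words in `label` that look wrongly capitalized."""
    issues = []
    for word in re.findall(r"[A-Za-z]+", label):
        # Flag words like "Nucleus" (capital first letter, lowercase rest).
        # All-caps acronyms ("DNA") and single letters ("T") are not flagged.
        if word[0].isupper() and word[1:].islower() and word not in allowed:
            issues.append(word)
    return issues
```

For example, `casing_issues("Golgi Apparatus membrane")` would flag only `Apparatus`, since `Golgi` is on the allow-list.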

From @gouttegd

At FlyBase we have a small pronto-based script that does some minimal validation of the text in definitions and comments: https://github.com/FlyBase/flybase-ontology-scripts/blob/master/misc/obo_spellchecker.py. It checks spelling and obvious typos such as missing or duplicated punctuation (it started as a spell-checking script only, hence the name).
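The kind of checks described here might be sketched as follows. This is not the FlyBase script, only an illustration under stated assumptions: `punctuation_issues` and `unknown_words` are hypothetical helpers, the rule that definitions end in a period is an assumed convention, and the vocabulary is supplied by the caller rather than taken from a real dictionary.

```python
import re

def punctuation_issues(text):
    """Return messages describing obvious punctuation problems in `text`."""
    issues = []
    # The same punctuation mark twice in a row, e.g. ",," or "..".
    if re.search(r"([,;:.!?])\1", text):
        issues.append("duplicated punctuation")
    # Whitespace directly before a punctuation mark, e.g. "wing .".
    if re.search(r"\s[,;:.!?]", text):
        issues.append("space before punctuation")
    # Assumed convention: a definition is a sentence ending in a period.
    if text and not text.rstrip().endswith("."):
        issues.append("missing final period")
    return issues

def unknown_words(text, vocabulary):
    """Return the lowercased words of `text` that are not in `vocabulary`."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return [w for w in words if w not in vocabulary]
```

Note that a rule this simple also flags legitimate ellipses, so a real checker would need exceptions.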

cmungall • Oct 10 '22 15:10

We have some of these right now:

➜   ✗ cat tests/input/lint-test.obo
format-version: 1.2
ontology: lint-test

[Term]
id: X:1
name: test  1 ! double whitespace
def: " foo    bar  " [PMID:1] ! training spaces
➜   ✗ runoak -i tests/input/lint-test.obo lint
[
{
  "id": "x",
  "old_value": "test  1 ",
  "new_value": "test 1",
  "about_node": "X:1",
  "@type": "NodeRename"
}
,
{
  "id": "x",
  "old_value": " foo    bar  ",
  "new_value": "foo bar",
  "about_node": "X:1",
  "@type": "NodeTextDefinitionChange"
}
]
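The whitespace rule behind the two records above can be sketched in a few lines. This is an illustration only, not OAK's implementation: `normalize_whitespace` and `lint_whitespace` are hypothetical names, and the `"id": "x"` placeholder is simply copied from the output shown.

```python
def normalize_whitespace(text):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(text.split())

def lint_whitespace(node_id, name, definition):
    """Return change records, shaped like the output above, for fields that need fixing."""
    changes = []
    fields = [(name, "NodeRename"), (definition, "NodeTextDefinitionChange")]
    for value, change_type in fields:
        fixed = normalize_whitespace(value)
        if fixed != value:
            changes.append({
                "id": "x",  # placeholder id, as in the output above
                "old_value": value,
                "new_value": fixed,
                "about_node": node_id,
                "@type": change_type,
            })
    return changes
```

Calling `lint_whitespace("X:1", "test  1 ", " foo    bar  ")` yields one record per field, with `new_value` set to `test 1` and `foo bar` respectively.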

You can then save this output to a file (changes.json in this example) and apply it:

$ runoak -i tests/input/lint-test.obo apply --changes-format json --changes-input changes.json -o fixed.obo -O obo
$ diff fixed.obo tests/input/lint-test.obo
6,7c6,7
< name: test 1
< def: "foo bar" [PMID:1]
---
> name: test  1 ! double whitespace
> def: " foo    bar  " [PMID:1] ! training spaces
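To make the effect of the apply step concrete, here is a minimal sketch that applies the same kind of change records to an in-memory dictionary. It is not how `runoak apply` is implemented; `apply_changes` and the `terms` layout are hypothetical, and only the two change types shown in this thread are handled.

```python
import json

def apply_changes(terms, changes_json):
    """Apply NodeRename / NodeTextDefinitionChange records to a dict of terms.

    `terms` maps a node id to {"name": ..., "definition": ...}.
    The input is left untouched; an updated copy is returned.
    """
    field_for_type = {"NodeRename": "name", "NodeTextDefinitionChange": "definition"}
    updated = {node_id: dict(fields) for node_id, fields in terms.items()}
    for change in json.loads(changes_json):
        field = field_for_type.get(change["@type"])
        if field is None:
            continue  # change types outside this sketch are skipped
        node = updated[change["about_node"]]
        # Only apply when the stored value still matches what the linter saw.
        if node[field] == change["old_value"]:
            node[field] = change["new_value"]
    return updated
```

Checking `old_value` before writing is a design choice in this sketch: it avoids overwriting a field that was edited after the lint report was produced.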

cmungall • Oct 10 '22 20:10