ontology-access-kit
Add lexical validation (spellchecking, punctuation, casing) to OAK
Follows on from #305.
- spellchecking (e.g. https://github.com/FlyBase/flybase-ontology-scripts/blob/master/misc/obo_spellchecker.py)
- punctuation
- correct casing (e.g. https://github.com/cmungall/capital-offence)
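As a rough illustration of the first and third items, a spelling check can compare each word against a vocabulary, and a casing check can flag labels whose first word is capitalized. The sketch below is a minimal Python version under stated assumptions: `KNOWN_WORDS`, `misspelled_words` and `casing_issue` are hypothetical names, the word list is a toy stand-in for a real dictionary, and the casing heuristic (assuming the common OBO convention of lower-case labels) is not taken from capital-offence.

```python
import re

# Toy vocabulary (assumption): a real checker would combine a general English
# dictionary with ontology-specific terms such as gene and anatomy names.
KNOWN_WORDS = {"a", "brain", "cell", "in", "is", "neuron", "of", "part", "that", "the"}

# Treat runs of ASCII letters as words; digits and punctuation are ignored.
WORD_RE = re.compile(r"[A-Za-z]+")


def misspelled_words(text: str, vocabulary: set[str]) -> list[str]:
    """Return the words in `text` that are not in `vocabulary` (case-insensitive)."""
    return [word for word in WORD_RE.findall(text) if word.lower() not in vocabulary]


def casing_issue(label: str) -> bool:
    """Flag a label whose first word is capitalized but not written in all caps.

    Heuristic only (assumes lower-case labels are the norm): acronyms such as
    "DNA" pass, but proper nouns such as "Golgi apparatus" would be flagged.
    """
    words = label.split()
    if not words:
        return False
    first = words[0]
    return first[0].isupper() and not first.isupper()
```

For example, `misspelled_words("a neuron in teh brain", KNOWN_WORDS)` reports only `teh`, and `casing_issue("Neuron of brain")` is true while `casing_issue("DNA binding")` is false. An allow-list of proper nouns would be the obvious next refinement.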
From @gouttegd:
At FlyBase we have a small pronto-based script that does some minimal validation of the text in definitions and comments (https://github.com/FlyBase/flybase-ontology-scripts/blob/master/misc/obo_spellchecker.py). It checks spelling and obvious typos such as missing or duplicated punctuation (it started as a spell-checking script only, hence the name).
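Punctuation problems of the kind described here can be caught with a few regular expressions. This is a hypothetical sketch (`PUNCTUATION_RULES` and `punctuation_issues` are illustrative names), not a port of `obo_spellchecker.py`; the rule set, including the expectation that a definition ends with a period, is an assumption.

```python
import re

# Illustrative rules (assumptions), each a (description, pattern) pair.
PUNCTUATION_RULES = [
    ("duplicated punctuation", re.compile(r"([.,;:])\1")),
    ("space before punctuation", re.compile(r"\s[.,;:]")),
    ("double space", re.compile(r" {2,}")),
]


def punctuation_issues(text: str) -> list[str]:
    """Return descriptions of the punctuation problems found in a definition."""
    issues = [name for name, pattern in PUNCTUATION_RULES if pattern.search(text)]
    # Assumed convention: a non-empty definition should end with a period.
    if text.strip() and not text.rstrip().endswith("."):
        issues.append("missing final period")
    return issues
```

For instance, `"A cell,, that is part of the brain"` yields `["duplicated punctuation", "missing final period"]`. Keeping the rules as data makes it easy to add project-specific checks without touching the loop.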
OAK's `lint` command already covers some of these lexical checks (whitespace problems in labels and definitions):
```
$ cat tests/input/lint-test.obo
format-version: 1.2
ontology: lint-test

[Term]
id: X:1
name: test 1 ! double whitespace
def: " foo bar " [PMID:1] ! training spaces
$ runoak -i tests/input/lint-test.obo lint
[
{
"id": "x",
"old_value": "test 1 ",
"new_value": "test 1",
"about_node": "X:1",
"@type": "NodeRename"
}
,
{
"id": "x",
"old_value": " foo bar ",
"new_value": "foo bar",
"about_node": "X:1",
"@type": "NodeTextDefinitionChange"
}
]
```
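To make the shape of these records concrete, here is a minimal Python sketch that emits dictionaries with the same keys as the `lint` output shown. It is not OAK's implementation: the `terms` input format and the `whitespace_changes` function are hypothetical, and collapsing internal runs of whitespace (not just trimming the ends) is an assumption suggested by the `! double whitespace` comment in the test file.

```python
def whitespace_changes(terms: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Emit change records for labels/definitions with stray whitespace.

    `terms` maps a term ID to {"name": ..., "def": ...}; this input shape is a
    hypothetical stand-in for reading the ontology through OAK.
    """
    # Map each field to the change type used in the lint output.
    change_types = {"name": "NodeRename", "def": "NodeTextDefinitionChange"}
    changes = []
    for term_id, fields in terms.items():
        for field, change_type in change_types.items():
            old = fields.get(field)
            if old is None:
                continue
            # Strip leading/trailing whitespace and collapse internal runs.
            new = " ".join(old.split())
            if new != old:
                changes.append({
                    "id": "x",  # placeholder id, as in the lint output
                    "old_value": old,
                    "new_value": new,
                    "about_node": term_id,
                    "@type": change_type,
                })
    return changes
```

Called with `{"X:1": {"name": "test 1 ", "def": " foo bar "}}`, it returns two records whose values match the two in the output above.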
You can then save the `lint` output to a file (e.g. `changes.json`, by redirecting stdout) and apply it with the `apply` command:
```
$ runoak -i tests/input/lint-test.obo apply --changes-format json --changes-input changes.json -o fixed.obo -O obo
$ diff fixed.obo tests/input/lint-test.obo
6,7c6,7
< name: test 1
< def: "foo bar" [PMID:1]
---
> name: test 1 ! double whitespace
> def: " foo bar " [PMID:1] ! training spaces
```
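Conceptually, applying a changes file means looking up each record's `about_node` and replacing `old_value` with `new_value` in the field its `@type` targets. The following sketch works on a plain dictionary rather than an OBO file; `FIELD_FOR_TYPE`, `load_changes` and `apply_changes` are hypothetical names, and skipping records whose `old_value` no longer matches is a design choice of this sketch, not documented `apply` behavior.

```python
import json

# Assumed mapping from change type to the field it rewrites, covering only
# the two types that appear in the lint output.
FIELD_FOR_TYPE = {"NodeRename": "name", "NodeTextDefinitionChange": "def"}


def load_changes(path: str) -> list[dict[str, str]]:
    """Read a JSON changes file such as the changes.json passed to `apply`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def apply_changes(
    terms: dict[str, dict[str, str]], changes: list[dict[str, str]]
) -> dict[str, dict[str, str]]:
    """Return a copy of `terms` with each applicable change applied."""
    # Copy each term's fields so the caller's data is left untouched.
    updated = {term_id: dict(fields) for term_id, fields in terms.items()}
    for change in changes:
        field = FIELD_FOR_TYPE.get(change["@type"])
        fields = updated.get(change["about_node"])
        if field is None or fields is None:
            continue  # unknown change type or unknown term: skip it
        # Only apply if the current value is still the one the change expects.
        if fields.get(field) == change["old_value"]:
            fields[field] = change["new_value"]
    return updated
```

Checking `old_value` before writing means a stale changes file cannot silently overwrite a label or definition that was edited after `lint` ran.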