ontology-access-kit
Add optional preprocessing pipelines
Reference: https://github.com/monarch-initiative/mondo-ingest/issues/112#issuecomment-1329314132
- Roman numerals: a function call to convert Arabic numerals to Roman numerals (or maybe vice versa?)
- stop word removal: similar to `--exclude-tokens`, implemented in PR here
- plural: I think the `Lemmatization` pipeline already takes care of this?
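
To make the first two items concrete, here is a minimal sketch of what such preprocessing steps might look like as plain functions. All names here (`to_roman`, `arabic_to_roman_in_label`, `remove_stop_words`) are hypothetical and not part of the existing OAK API; how they would be wired into a pipeline is left open.

```python
import re

# Value/symbol pairs in descending order, including subtractive forms.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer in the range 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError(f"cannot represent {n} as a Roman numeral")
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def arabic_to_roman_in_label(label: str) -> str:
    """Replace standalone Arabic numbers in a label with Roman numerals.

    Numbers outside 1..3999 are left unchanged.
    """
    def _convert(match: re.Match) -> str:
        value = int(match.group())
        return to_roman(value) if 0 < value < 4000 else match.group()

    return re.sub(r"\b\d+\b", _convert, label)


def remove_stop_words(label: str, stop_words) -> str:
    """Drop whitespace-separated tokens that match a stop word (case-insensitive)."""
    stop = {word.lower() for word in stop_words}
    return " ".join(tok for tok in label.split() if tok.lower() not in stop)
```

For example, `arabic_to_roman_in_label("type 2 diabetes")` gives `"type II diabetes"`, so a label written with an Arabic numeral can match one written with a Roman numeral.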
Roman numerals can be covered by the synonymizer?
Yes, but not easily: you would need one rule per numeral? Or are you thinking of an extension to the synonymiser?
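
To illustrate the "one rule per numeral" point, here is a rough sketch that generates such rules programmatically instead of writing them by hand. The `(pattern, replacement)` tuples are a stand-in for illustration only and are not the synonymizer's actual rule format; `numeral_rules` and `apply_rules` are hypothetical helpers.

```python
import re

_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer in the range 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError(f"cannot represent {n} as a Roman numeral")
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def numeral_rules(max_value: int = 20):
    """Build one (regex pattern, replacement) pair per numeral from 1 to max_value."""
    return [(rf"\b{n}\b", to_roman(n)) for n in range(1, max_value + 1)]


def apply_rules(label: str, rules) -> str:
    """Apply each regex rule to the label in turn."""
    for pattern, replacement in rules:
        label = re.sub(pattern, replacement, label)
    return label
```

With the default of 20 numerals this already means 20 separate rules, which is why a single conversion function may be the simpler option.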