Add optional preprocessing pipelines

Open hrshdhgd opened this issue 2 years ago • 2 comments

Reference: https://github.com/monarch-initiative/mondo-ingest/issues/112#issuecomment-1329314132

  • Roman numerals: a function call to convert Arabic numerals to Roman numerals (or maybe vice versa?)
  • stop word removal: similar to --exclude-tokens implemented in PR here
  • plural: I think the lemmatization pipeline already takes care of this?
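
To make the first two bullets concrete, here is a minimal sketch of what optional preprocessing steps could look like. All function names (`arabic_to_roman`, `romanize_numerals`, `remove_stop_words`, `preprocess`) are hypothetical and are not part of the existing OAK API; the sketch only assumes that each step is a plain string-to-string function applied to a label before matching.

```python
import re
from typing import Callable, Iterable

# Hypothetical helpers for illustration only; not part of the OAK API.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

# A run of digits that is not part of a word or a decimal such as "1.5".
_STANDALONE_INT = re.compile(r"(?<![\w.,])\d+(?![.,]?\d)(?!\w)")


def arabic_to_roman(n: int) -> str:
    """Convert an integer in the range 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError(f"cannot represent {n} as a Roman numeral")
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def romanize_numerals(label: str) -> str:
    """Replace standalone integers in a label with Roman numerals."""
    def repl(match: re.Match) -> str:
        n = int(match.group())
        # Leave out-of-range values (0, or 4000 and above) untouched.
        return arabic_to_roman(n) if 0 < n < 4000 else match.group()
    return _STANDALONE_INT.sub(repl, label)


def remove_stop_words(label: str, stop_words: Iterable[str]) -> str:
    """Drop whitespace-separated tokens that match a stop word, ignoring case."""
    excluded = {w.lower() for w in stop_words}
    return " ".join(t for t in label.split() if t.lower() not in excluded)


def preprocess(label: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each optional preprocessing step in order."""
    for step in steps:
        label = step(label)
    return label


steps = [lambda s: remove_stop_words(s, {"the", "of"}), romanize_numerals]
print(preprocess("type 2 diabetes", steps))
```

Keeping each step as an independent function would let callers opt in to only the steps they want. Converting in the other direction (Roman to Arabic) is harder to do safely, since single letters such as "I" or "V" are ambiguous in free-text labels.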

hrshdhgd, Nov 28 '22 16:11

Roman numerals can be covered by the synonymizer?

cmungall, Nov 28 '22 17:11

Yes, but not easily: you would need one rule per numeral? Or are you thinking of an extension to the synonymizer?

matentzn, Nov 28 '22 17:11