ehr_deidentification
Custom token_text_key in dataset_creator.create()
There are a few bugs that break the token_text_key parameter in dataset_creator.create(). Namely, at the following line, get_tokens() needs to take in the token_text_key; otherwise it expects the default key, "text".
https://github.com/obi-ml-public/ehr_deidentification/blob/88751ab1f95d23d54ded39385adb8a27f57a6f72/src/robust_deid/ner_datasets/dataset_creator.py#L139-L143
This has downstream effects on a few tokenizers as well. Will make a pull request shortly.
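
To illustrate the failure mode, here is a minimal sketch of the pattern, not the actual robust_deid code. The functions create_buggy and create_fixed, the simplified get_tokens signature, and the note layout are all hypothetical stand-ins; only the parameter name token_text_key and the default key "text" come from the report above.

```python
# Hypothetical stand-ins for illustration; not the real robust_deid code.

def get_tokens(note, token_text_key="text"):
    """Return the token strings stored under token_text_key."""
    return [token[token_text_key] for token in note["tokens"]]


def create_buggy(note, token_text_key="text"):
    # Bug pattern: the caller's key is accepted but never forwarded,
    # so get_tokens() falls back to its default "text" key.
    return get_tokens(note)


def create_fixed(note, token_text_key="text"):
    # Fix pattern: forward the caller's key to the helper.
    return get_tokens(note, token_text_key=token_text_key)


# A note whose tokens store their text under a custom key, "word".
note = {"tokens": [{"word": "Patient"}, {"word": "John"}]}

print(create_fixed(note, token_text_key="word"))

try:
    create_buggy(note, token_text_key="word")
except KeyError as err:
    # The lookup fails because the tokens have no "text" key.
    print("KeyError:", err)
```

In this sketch the buggy variant fails only when the dataset uses a non-default key, which would explain why the problem goes unnoticed with the default.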