ehr_deidentification

Custom token_text_key in dataset_creator.create()

Open • vsocrates opened this issue 10 months ago • 0 comments

There are a few bugs that break the token_text_key parameter in dataset_creator.create(). In particular, at the lines linked below, get_tokens() must be passed token_text_key. Otherwise it ignores the caller's value and falls back to its default key for the token text.

https://github.com/obi-ml-public/ehr_deidentification/blob/88751ab1f95d23d54ded39385adb8a27f57a6f72/src/robust_deid/ner_datasets/dataset_creator.py#L139-L143
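To illustrate the failure mode, here is a minimal sketch. The names `get_tokens`, `create_buggy`, and `create_fixed` and the sentence/token dict layout are hypothetical stand-ins, not the actual robust_deid code at the linked lines. The point is only that a key parameter accepted by the outer function has to be forwarded to every helper that reads token text.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-ins for illustration only; NOT the real robust_deid signatures.


def get_tokens(sentence: Dict, token_text_key: str = "text") -> List[str]:
    """Return each token's text, read from the field named by token_text_key."""
    return [token[token_text_key] for token in sentence["tokens"]]


def create_buggy(sentences: Iterable[Dict], token_text_key: str = "text") -> Iterator[List[str]]:
    # Bug pattern: token_text_key is accepted but never forwarded,
    # so get_tokens() silently uses its own default key.
    for sentence in sentences:
        yield get_tokens(sentence)


def create_fixed(sentences: Iterable[Dict], token_text_key: str = "text") -> Iterator[List[str]]:
    # Fix: pass the caller's key through to the helper that reads token text.
    for sentence in sentences:
        yield get_tokens(sentence, token_text_key=token_text_key)


if __name__ == "__main__":
    data = [{"tokens": [{"word": "Patient"}, {"word": "seen"}]}]
    print(list(create_fixed(data, token_text_key="word")))  # [['Patient', 'seen']]
    try:
        list(create_buggy(data, token_text_key="word"))
    except KeyError as err:
        print("buggy version failed with KeyError:", err)
```

With the buggy version, a custom key is silently ignored: the lookup uses the default key instead, which raises a KeyError when the tokens do not carry that default field (or, worse, quietly reads the wrong field when they do).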

This also has downstream effects on a few of the tokenizers. I will open a pull request shortly.

vsocrates, Apr 02 '24 07:04