ehr_deidentification

Shifting dates in the notes

Open bwang482 opened this issue 1 year ago • 1 comment

Hi! Thanks for open-sourcing the code and your models! It's very useful!

I want to automatically replace the dates found in the notes with shifted dates (e.g. shifted by 1 day). I tried adding a custom deid_strategy that returns the original dates and nothing for other entity types. Then, in TextDeid.__get_deid_text, I convert each date string to a datetime, add 1 day, and convert it back to a string, using the code below. The model seems to find all types of dates, including days of the week, e.g. "Friday".

Is there a better way to replace the dates automatically using your code?

from datetime import datetime, timedelta

def string2dates(date_text):
    # Try each known format in turn; return None if none of them match
    # (e.g. for day names such as "Friday").
    formats = ['%m/%d/%Y', '%m.%d.%Y', '%m/%d/%y', '%m.%d.%y',
               '%Y-%m-%d', '%y-%m-%d', '%m-%d-%Y', '%m-%d-%y',
               '%b %d %Y', '%B %d, %Y', '%b %d %y', '%B %d, %y']

    for fmt in formats:
        try:
            return datetime.strptime(date_text, fmt)
        except ValueError:
            continue
    return None

# Fragment from inside TextDeid.__get_deid_text.
if deid_strategy == 'replace_informative' and not age_unchanged:
    deid_text = deid_text[:start_pos] + deid_tag.format(note_text[start_pos:end_pos]) + deid_text[end_pos:]
elif deid_strategy == 'shift_dates':
    # Only DATE spans are parsed; other tags, and dates that cannot be parsed,
    # fall back to the formatted tag.
    date_dt = string2dates(deid_tag.format(note_text[start_pos:end_pos])) if tag == "DATE" else None
    if date_dt:
        new_date_dt = date_dt + timedelta(days=1)
        # str(date) always yields ISO YYYY-MM-DD, whatever the input format was.
        deid_text = deid_text[:start_pos] + str(new_date_dt.date()) + deid_text[end_pos:]
    else:
        deid_text = deid_text[:start_pos] + deid_tag.format(note_text[start_pos:end_pos]) + deid_text[end_pos:]
else:
    deid_text = deid_text[:start_pos] + deid_tag + deid_text[end_pos:]
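
One possible variation on the approach above is sketched below. It is not part of the library: `shift_date_string`, `shift_dates_in_note`, and the span dictionaries with `start`, `end`, and `tag` keys are hypothetical names chosen for illustration. The sketch writes the shifted date back in the same format it was parsed with, instead of always emitting ISO dates. It also applies the replacements from the last span to the first, so that a replacement of a different length does not invalidate the offsets of the spans that come before it. Dates that match none of the formats (such as "Friday") are left unchanged.

```python
from datetime import datetime, timedelta

# Same formats as in string2dates above.
DATE_FORMATS = ['%m/%d/%Y', '%m.%d.%Y', '%m/%d/%y', '%m.%d.%y',
                '%Y-%m-%d', '%y-%m-%d', '%m-%d-%Y', '%m-%d-%y',
                '%b %d %Y', '%B %d, %Y', '%b %d %y', '%B %d, %y']


def shift_date_string(date_text, days=1):
    """Shift a date string by `days`, keeping the format it was parsed with.

    Returns None when no format matches (e.g. "Friday").
    """
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(date_text, fmt)
        except ValueError:
            continue
        # strftime zero-pads day and month, so "1/5/2023" comes back as "01/06/2023".
        return (parsed + timedelta(days=days)).strftime(fmt)
    return None


def shift_dates_in_note(note_text, spans, days=1):
    """Replace every parseable DATE span in note_text with its shifted date.

    `spans` is assumed to be a list of dicts with 'start', 'end' and 'tag'
    keys (a hypothetical shape, not the library's own data structure).
    """
    shifted_text = note_text
    # Work from the last span to the first so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s['start'], reverse=True):
        if span['tag'] != 'DATE':
            continue
        original = note_text[span['start']:span['end']]
        replacement = shift_date_string(original, days)
        if replacement is None:
            continue  # unparseable date: leave the original text in place
        shifted_text = (shifted_text[:span['start']] + replacement
                        + shifted_text[span['end']:])
    return shifted_text
```

Leaving unparseable spans untouched means weekday names and partial dates stay in the output, so a real pipeline would still need a fallback for them, such as the tag replacement used in the original code.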

bwang482, Dec 19 '23 03:12