
Errors when running `introduction.ipynb`

Open paul-bssr opened this issue 5 months ago • 0 comments

When running the code from the "A gentle demo" section of the documentation with version 0.1.6, some commands raise errors or return unexpected results (probably because of small syntax changes).

Description

  1. In section "Extracting diabetes status", the following command does not output the same result as in the documentation:
diabetes.concept.value_counts()

In my case, replacing the concept column with the value column resolved the discrepancy.

  2. In section "Extracting covid status", the code cell below raises KeyError: 'code_list', arising from line 81 of the event_from_code function:
codes = dict(
    COVID=dict(
        code_list=r"U071[0145]", 
        code_type="regex",
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

Changing the dictionary as follows solved the issue in my case:

codes = dict(
    COVID=dict(
        regex=r"U071[0145]", 
    )
)
  3. In section "Adding patient age", the following error is raised when trying to compute patient age:
TypeError: One of the provided Serie isn't a datetime Serie

A solution in my case was to convert birth_datetime to datetime format using the following command:

visit_detail_covid["birth_datetime"] = visit_detail_covid["birth_datetime"].apply(lambda x: pd.to_datetime(x))

I guess the issue might come from the i2b2 connector.
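
The conversion described in item 3 can be tried on a toy frame. This is a minimal sketch, assuming plain pandas rather than the frames returned by HiveData, and assuming the birth dates arrive as strings (a guess, not something confirmed above). The helper name `ensure_datetime` and the sample data are hypothetical and not part of eds-scikit.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def ensure_datetime(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df in which each listed column has a datetime dtype.

    Hypothetical helper for illustration; it is not part of eds-scikit.
    """
    out = df.copy()
    for col in columns:
        if not is_datetime64_any_dtype(out[col]):
            # Non-datetime values such as "1980-05-17" are parsed into Timestamps.
            out[col] = pd.to_datetime(out[col])
    return out


# Toy data (invented): birth_datetime is stored as strings, which is an
# assumption about what the connector returns, not a confirmed diagnosis.
toy = pd.DataFrame(
    {
        "person_id": [1, 2],
        "birth_datetime": ["1980-05-17", "1955-11-02"],
        "visit_detail_start_datetime": pd.to_datetime(["2019-01-10", "2018-03-04"]),
    }
)

fixed = ensure_datetime(toy, ["birth_datetime", "visit_detail_start_datetime"])
print(fixed.dtypes)
```

Checking the dtype before converting leaves columns that are already datetimes untouched, so the helper can be applied to both arguments of the age computation.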

How to reproduce the bug

Code to load an i2b2 database (common to the 3 bugs):

import eds_scikit
import datetime
from eds_scikit.io import HiveData

database_name = "cse_**" 

data = HiveData(
    database_name=database_name,
    database_type="I2B2"
)


DATE_MIN = datetime.datetime(2018, 1, 1)
DATE_MAX = datetime.datetime(2019, 6, 1)

Minimal code for bug 1:

from eds_scikit.event.diabetes import diabetes_from_icd10

diabetes = diabetes_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

diabetes.concept.value_counts()
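
Because the labels appear under `concept` in the documentation but under `value` in the workaround above, a small guard can count whichever column is present. This is a sketch on a toy pandas frame: the helper name `label_counts` is hypothetical, and the labels are invented placeholders rather than actual eds-scikit output.

```python
import pandas as pd


def label_counts(events: pd.DataFrame, candidates=("value", "concept")) -> pd.Series:
    """Count labels in the first candidate column that exists in `events`.

    Hypothetical helper for illustration; it is not part of eds-scikit.
    """
    for col in candidates:
        if col in events.columns:
            return events[col].value_counts()
    raise KeyError(f"None of {candidates} found in columns {list(events.columns)}")


# Toy frame with placeholder labels (invented for this example).
toy_events = pd.DataFrame(
    {
        "person_id": [1, 2, 3],
        "concept": ["LABEL_GROUP", "LABEL_GROUP", "LABEL_GROUP"],
        "value": ["LABEL_A", "LABEL_B", "LABEL_A"],
    }
)

print(label_counts(toy_events))
```

Listing `value` first mirrors the workaround reported here; swapping the order of `candidates` would prefer `concept` instead.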

Minimal code for bug 2:

from eds_scikit.event import conditions_from_icd10

codes = dict(
    COVID=dict(
        code_list=r"U071[0145]", 
        code_type="regex",
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)
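
The failing dictionary and the working one differ only in shape: the first stores the pattern under `code_list` and the match type under `code_type`, while the second uses the match type itself as the key. The sketch below rewrites one shape into the other. The helper name `migrate_codes` is hypothetical, and whether the same mapping holds for match types other than `regex` is an assumption not verified here.

```python
def migrate_codes(codes: dict) -> dict:
    """Rewrite {"code_list": ..., "code_type": ...} entries as {code_type: code_list}.

    Hypothetical helper for illustration; it is not part of eds-scikit.
    Entries that do not have both keys are copied unchanged.
    """
    migrated = {}
    for concept, spec in codes.items():
        if "code_list" in spec and "code_type" in spec:
            migrated[concept] = {spec["code_type"]: spec["code_list"]}
        else:
            migrated[concept] = dict(spec)
    return migrated


old_codes = dict(
    COVID=dict(
        code_list=r"U071[0145]",
        code_type="regex",
    )
)

new_codes = migrate_codes(old_codes)
print(new_codes)  # {'COVID': {'regex': 'U071[0145]'}}
```

Applying the helper to a dictionary that is already in the second shape returns an equal dictionary, so it is safe to call more than once.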

Minimal code for bug 3:

from eds_scikit.event import conditions_from_icd10
from eds_scikit.utils import datetime_helpers

codes = dict(
    COVID=dict(
        regex=r"U071[0145]", 
    )
)

covid = conditions_from_icd10(
    condition_occurrence=data.condition_occurrence,
    visit_occurrence=data.visit_occurrence,
    codes=codes,
    date_min=DATE_MIN,
    date_max=DATE_MAX,
)

visit_detail_covid = data.visit_detail.merge(
    covid[["visit_occurrence_id"]],
    on="visit_occurrence_id",
    how="inner",
)

visit_detail_covid = visit_detail_covid.merge(
    data.person[["person_id", "birth_datetime"]],
    on="person_id",
    how="inner",
)

visit_detail_covid["age"] = (
    datetime_helpers.substract_datetime(
        visit_detail_covid["visit_detail_start_datetime"],
        visit_detail_covid["birth_datetime"],
        out="hours",
    )
    / (24 * 365.25)
)

paul-bssr, Jan 18 '24 13:01