
Incorrect LOINC codes in d_labitems

Open Mauvila opened this issue 3 years ago • 12 comments

Is there a plan to correct the possibly incorrect LOINC codes in d_labitems (by either replacing the incorrect LOINC codes or creating a new column in the table)? The itemids of MIMIC-IV appear to be the same as those from MIMIC-III, and (by random sampling) most of the LOINC codes haven't changed.

I first want to say that correct LOINC codes are important in this project. A hospital system's EMR probably doesn't care much about LOINC codes because they don't really affect the EMR's internal operation, but the codes become more important when they are used for database analysis--they allow researchers to use lab-based algorithms on multiple databases.

For MIMIC-III, I looked at the NLM LOINC code determinations (which were provided separately from the MIMIC-III database), and then ran the LOINC codes through my own validator. Overall, I found that about 75 of the 753 rows in d_labitems had likely incorrect LOINC codes. For about 62 of those 75 (about 80%), the NLM-specified code was correct. Most of these are instances where the EMR LOINC code used the wrong LOINC property. For example, itemid 50960 has a LOINC code of 2601-3. This is the LOINC code for serum magnesium in units of moles/volume (property=substance concentration). The correct code should be 19123-9, which is serum magnesium in units of mass/volume (property=mass concentration). That matches the only unit reported in labevents for this itemid (mg/dL). A few of the incorrect codes are WAY off for other reasons. Many of the other likely-incorrect codes were not as clear (based on the MIMIC-III data available).
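To make the property-versus-unit check concrete, here is a minimal sketch of that kind of test. It is not the validator mentioned above: the function names and the tiny unit tables are hypothetical, and the only LOINC facts it relies on are the two magnesium codes described in this comment (MCnc and SCnc are the LOINC property abbreviations for mass concentration and substance concentration).

```python
# Illustrative sketch only; not the validator referred to in the thread.
# Properties for the two codes are taken from the issue text:
#   SCnc = substance concentration, MCnc = mass concentration.
LOINC_PROPERTY = {
    "2601-3": "SCnc",   # serum magnesium, moles/volume
    "19123-9": "MCnc",  # serum magnesium, mass/volume
}

# Hypothetical, deliberately small unit tables (numerators only).
MASS_UNITS = {"g", "mg", "ug", "ng", "pg"}
SUBSTANCE_UNITS = {"mol", "mmol", "umol", "nmol", "pmol"}


def property_implied_by_unit(unit):
    """Guess the LOINC property from a 'numerator/denominator' unit string."""
    if "/" not in unit:
        return None  # not a concentration-style unit; no guess
    numerator = unit.strip().lower().split("/")[0]
    if numerator in MASS_UNITS:
        return "MCnc"
    if numerator in SUBSTANCE_UNITS:
        return "SCnc"
    return None


def is_consistent(loinc_code, unit):
    """True/False when both sides are known, None when either is unknown."""
    expected = LOINC_PROPERTY.get(loinc_code)
    implied = property_implied_by_unit(unit)
    if expected is None or implied is None:
        return None
    return expected == implied
```

With these assumptions, `is_consistent("2601-3", "mg/dL")` returns `False` and `is_consistent("19123-9", "mg/dL")` returns `True`, which mirrors the itemid 50960 example. Returning `None` for unknown codes or units keeps "cannot tell" separate from "wrong".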

I think it would be useful to create a new column in d_labitems that represents the revised LOINC code. This way, the original LOINC code would still be preserved (useful in the cases where the most correct LOINC code is unclear).

Mauvila avatar Aug 28 '20 02:08 Mauvila

Totally agree that we should improve them. But why not correct the current LOINC code where it is wrong, rather than having 2 columns? If it's unclear, we could NULL it. We can push an updated release with the improved LOINC codes, if you can help with a CSV mapping itemid to correct LOINC.

Normally "cleaning" the data is not something we do, but many LOINC codes were assigned post-hoc, so improving it makes sense.

alistairewj avatar Aug 28 '20 12:08 alistairewj

If you are fine with changing the current LOINC code, then I am as well. I can work on the mapping, but would prefer a second set of eyes to confirm any changes.

Mauvila avatar Sep 01 '20 02:09 Mauvila

Sure! If you make the mapping CSV I'm happy to look it over.

alistairewj avatar Sep 01 '20 15:09 alistairewj

Piggy-backing on this issue. I am also interested in LOINC-based applications. Thanks for catching the mapping errors, @Mauvila. Any updates on correcting the mappings for MIMIC III? It seems that only 267/1625 labs from MIMIC IV's d_labitems table have LOINC codes available, while the documentation says 'most concepts in this table have been mapped to LOINC codes'. @alistairewj, are you planning on mapping more codes to LOINC in future releases? Thank you!

tianranzhang avatar Jan 09 '21 17:01 tianranzhang

Hi all, @a-chahin is a clinician and informatician who has been working with us on mapping MIMIC to common terminologies (partly for an OMOP project with OHDSI: https://github.com/OHDSI/MIMIC). He also picked up some major issues with LOINC mappings and we have been trying to decide on the best way of integrating everyone's efforts.

After discussion with @alistairewj, we agreed it would make sense to drop the LOINC column from the MIMIC data and include the LOINC mapping as a derived table in this repo. This allows us to continue to improve the mapping over time, and avoids the suggestion that the mappings are ground truth.

I think the easiest way to combine our work would now be for you @mauvila to open a pull request to this repo with your mapping of item_id to loinc. Once this pull request is reviewed and merged, @a-chahin can follow up with another pull request with his fixes if there are more to be made.

Does this sound good (@mauvila @a-chahin @alistairewj @danamouk)? If so, @mauvila please could you open a pull request with a CSV file of your mapping? Please choose whichever location for the file you think best. Perhaps in a subfolder ("concept_maps", "mapping", "terminologies", "crosswalks"...) of https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/concepts?

tompollard avatar Jan 20 '22 20:01 tompollard

Hi all,

@tompollard Sounds good.

a-chahin avatar Jan 20 '22 20:01 a-chahin

To move things along, we could begin with a pull request from @a-chahin, and then merge in @Mauvila's work later? A simple CSV with two columns (item_id, loinc) in a subfolder (called something like "concept_maps", "mapping", "terminologies", or "crosswalks") would be a great start. @a-chahin give me a shout if you would like help opening the pull request.
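As a rough illustration of how a two-column CSV like that could be used as a derived table, the sketch below loads the mapping and attaches it to d_labitems rows. The `itemid,loinc` header, the helper names, and the sample rows are assumptions for the example; the one mapped row reuses the magnesium code discussed earlier in the thread.

```python
import csv
import io

# Hypothetical file contents; header spelling is an assumption.
MAPPING_CSV = "itemid,loinc\n50960,19123-9\n"

# Hypothetical excerpt of d_labitems as (itemid, label) pairs.
D_LABITEMS_ROWS = [
    (50960, "Magnesium"),
    (51560, "Anti-SARS-CoV-2 IgA"),
]


def load_mapping(text):
    """Parse the two-column CSV into {itemid: loinc}."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["itemid"]): row["loinc"].strip() for row in reader}


def attach_loinc(d_labitems, mapping):
    """Left-join the mapping onto d_labitems; unmapped items get None."""
    return [(itemid, label, mapping.get(itemid)) for itemid, label in d_labitems]


def coverage(rows):
    """Return (number of mapped items, total items)."""
    mapped = sum(1 for _, _, loinc in rows if loinc is not None)
    return mapped, len(rows)
```

Keeping unmapped items as `None` rather than dropping them makes it easy to report coverage alongside the mapping itself.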

tompollard avatar Feb 01 '22 17:02 tompollard

Sorry for the delayed response. I like the idea of separating the mapping of itemid values to LOINC from the main database--I think this will make the mapping more "agile"/responsive, and it stresses that these mappings are "external", separate from the original BIDMC data set.

I have a few comments/questions:

  1. Most of my LOINC mapping has been for MIMIC-III. Is it assured that MIMIC-IV lab itemid values (where they co-exist) are the same as those in MIMIC-III?
  2. I think the actual mapping file (which I would personally convert to an SQL table when I implement it) should probably be two columns: itemid and LOINC code. This file should probably be separate from a collaborative file (see below).
  3. A lot of the mappings are not clear cut, and there is room for interpretation. There should perhaps be a "collaborative" file that is used in the development of the itemid to LOINC map and that includes various additional columns (eg "certainty", "notes", "initial mapping", "previous NLM analysis result"). If the mapping file were the "build", this file would be the "source code". I don't have a lot of experience with online collaboration, and I'm not sure if a Git CSV is the best method for this (although it might be). At the very least, this would need to have a dedicated discussion thread.
  4. Is it possible, since the advent of "meaningful use" and similar initiatives, that BIDMC has updated its LOINC mappings? If so, it could possibly make this whole effort redundant.
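The "build" versus "source code" idea in point 3 could look something like the sketch below: a wide collaborative table is reduced to the two-column mapping file. Everything here is hypothetical, including the sample rows, the "high"/"low" certainty values, and the rule that only high-certainty rows are published; the column names are the ones floated in the list above.

```python
import csv
import io

# Hypothetical collaborative rows; the second row is an invented placeholder.
COLLABORATIVE = [
    {"itemid": 50960, "initial mapping": "2601-3", "loinc": "19123-9",
     "certainty": "high", "notes": "only reported unit is mg/dL"},
    {"itemid": 51560, "initial mapping": "", "loinc": "",
     "certainty": "low", "notes": "placeholder row with no agreed code"},
]


def build_published(rows, accepted=("high",)):
    """Emit the two-column 'build' CSV from the wide 'source' rows.

    Rows without a code, or with a certainty outside `accepted`, are left out.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["itemid", "loinc"])
    for row in sorted(rows, key=lambda r: r["itemid"]):
        if row["loinc"] and row["certainty"] in accepted:
            writer.writerow([row["itemid"], row["loinc"]])
    return out.getvalue()
```

Generating the published file from the collaborative one means the notes and history stay in one place, and the two-column file can always be rebuilt.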

Mauvila avatar Feb 05 '22 04:02 Mauvila

  1. Most of my LOINC mapping has been for MIMIC-III. Is it assured that MIMIC-IV lab itemid values (where they co-exist) are the same as those in MIMIC-III?

Yes, to the extent we could. They were matched on the textual label. So, I suppose it is possible that a label swap would result in a mismatch, but I highly doubt that would happen.

  2. I think the actual mapping file (which I would personally convert to an SQL table when I implement it) should probably be two columns: itemid and LOINC code. This file should probably be separate from a collaborative file (see below).

It's worth looking at #1245, where a PR has been started. I mentioned there that we should probably add the LOINC version to help future-proof the mapping.

  3. A lot of the mappings are not clear cut, and there is room for interpretation. There should perhaps be a "collaborative" file that is used in the development of the itemid to LOINC map and that includes various additional columns (eg "certainty", "notes", "initial mapping", "previous NLM analysis result"). If the mapping file were the "build", this file would be the "source code". I don't have a lot of experience with online collaboration, and I'm not sure if a Git CSV is the best method for this (although it might be). At the very least, this would need to have a dedicated discussion thread.

There will need to be a fixed static CSV in the repo (the "published" mapping). As for a collaborative version, we don't really have a wiki set up which would facilitate discussion, so probably a shared Google sheet is the easiest way to do it.

  4. Is it possible, since the advent of "meaningful use" and similar initiatives, that BIDMC has updated its LOINC mappings? If so, it could possibly make this whole effort redundant.

Not sure, haven't been checking for any changes.

alistairewj avatar Feb 08 '22 02:02 alistairewj

With regard to a collaborative document: I looked into using a GitHub wiki page, which could work, but the test table I made with the mapping data is fairly wide and is not practical to view on a GitHub wiki page. To make it feasible for this format, we would have to remove a lot of columns (which may be a good idea anyway).

In terms of Google documents, I created a Google Sheets document--this venue might work well. Assuming private access, I would need a way to get email addresses from interested contributors, but it seems GitHub doesn't have the ability to send private messages anymore. I am very spam averse, and I am trying not to expose my Gmail address to the bots. It might be better if one of you guys (Alistair/Tom) created the Google spreadsheet based on the file I have, especially if you already have public/exposed email addresses that contributors could use to send you their email addresses. That way, you could control access, etc.

Mauvila avatar Feb 12 '22 00:02 Mauvila

Hello @alistairewj

I wanted to share with you a couple of things I found in the LOINC mapping collaborative table.

While running queries on the d_labitems table in MIMIC IV, I noticed that some itemids in the mapping table do not match the itemids in the d_labitems table. After some investigation, I found that there is some sort of itemid shift that starts at itemid 51560. The itemid 51560 in the mapping table is for "Anti-Skin Antibody", while in the d_labitems table itemid 51560 is for "Anti-SARS-CoV-2 IgA" (a label that is not found in the mapping table). The correct itemid for "Anti-Skin Antibody" in the d_labitems table is 51562. The same is true for itemid 51561: in the mapping table it is for "Anti-sm", while in the d_labitems table it is "Anti-SARS-CoV-2 IgG" (also a label not found in the mapping table).

This issue recurs 5 times in total. As a result, all itemids after 51560 are incorrect. I was able to correct the itemids in the mapping table using the d_labitems table in MIMIC IV as a reference. I just wanted to bring this to your attention because we were not able to figure out the cause of this itemid discrepancy.
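For anyone wanting to reproduce this kind of correction, a minimal sketch of re-keying the mapping by label is below. The itemids and labels are only the ones quoted in this comment; the function name and the case-insensitive exact-match rule are assumptions, and this is not necessarily how the correction was actually done.

```python
# Excerpts built only from the itemids and labels quoted in this comment.
D_LABITEMS = {
    51560: "Anti-SARS-CoV-2 IgA",
    51561: "Anti-SARS-CoV-2 IgG",
    51562: "Anti-Skin Antibody",
}
MAPPING_LABELS = {
    51560: "Anti-Skin Antibody",
    51561: "Anti-sm",
}


def rekey_by_label(mapping_labels, d_labitems):
    """Find the d_labitems itemid for each mapping row by matching labels.

    Returns ({old_itemid: corrected_itemid}, [old itemids left unresolved]).
    Labels with no match, or more than one match, are left unresolved.
    """
    label_to_itemids = {}
    for itemid, label in d_labitems.items():
        label_to_itemids.setdefault(label.strip().lower(), []).append(itemid)

    corrected, unresolved = {}, []
    for old_itemid, label in mapping_labels.items():
        candidates = label_to_itemids.get(label.strip().lower(), [])
        if len(candidates) == 1:
            corrected[old_itemid] = candidates[0]
        else:
            unresolved.append(old_itemid)
    return corrected, unresolved
```

On this excerpt the function moves 51560 to 51562 and leaves 51561 unresolved, because "Anti-sm" is not among the three d_labitems rows quoted here. In the full table, labels that repeat across rows would also land in the unresolved list and need manual review.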

We thought that the issue might be because of different itemid values between MIMIC III and MIMIC IV, but that does not appear to be the cause of the massive "shift" in the mapping table. However, we noticed some discrepancies when comparing the d_labitems table from MIMIC III with the d_labitems table from MIMIC IV, starting after itemid 51520. Attached is the comparison table: d_labitems comparison.xlsx

a-chahin avatar Feb 23 '22 02:02 a-chahin

Thanks for the note. Because the underlying lab data changes, we have to do some tricks to make sure itemid values are deterministically generated. It may be that we did the mapping with an internal set of identifiers that isn't consistent with the public version. Will double check this.

alistairewj avatar Feb 23 '22 18:02 alistairewj