pydicom icon indicating copy to clipboard operation
pydicom copied to clipboard

pydicom.from_json breaks with incorrect VR types

Open fakyras opened this issue 3 years ago • 10 comments

I have been working with BIMCV-COVID19 dataset, which for some reason had split original DICOM file into 16-bit .png and DICOM metadata .json file using pydicom.to_json function.

Everything would be ok with this approach, however reverting .json to pydicom Dataset causes problems (using this snippet below):

g = json.load(json_file)
ds = pydicom.Dataset()
ds = ds.from_json(g)     

It does not work when there are VR DICOM non conformities, i.e. VR = "DS" and Value is "US". Attaching an example sub-S03138_ses-E16883_acq-2_run-1_bp-chest_vp-ap_dx.zip (00281054 and 20500020 are causing problems as pydicom is trying to convert strings to float based on VR values).

This DICOM does not conform the standard, but we do not live in a perfect world :( Could this be handled for checking if the Value to be converted is of valid python data type?

fakyras avatar Mar 10 '21 09:03 fakyras

semi-related issue - from_json does not handle base64 strings as well

sub-S11600_ses-E21430_run-1_bp-chest_vp-pa_dx.zip

error:

ValueError: invalid literal for int() with base 10: b'\x00\x00\x06\x00\x0c\x00\x13\x00\x19\x00 \x00&\x00-\x003\x00:\x00@\x00G\x00M\x00T\x00Z\x00a\x00g\x00n\x00t\x00{\x00\x81\x00\x88\x00\x8e\x00\x95\x00\x9b\x00\xa1\x00\xa8\x00\xae\x00\xb5\x00\xbb\x00\xc

fakyras avatar Mar 10 '21 09:03 fakyras

Thanks for reporting - perhaps these should be two issues? IIUC, the first is non-conforming data, always a problem, that should maybe be handled in the json before converting (or something). The second is about base64 strings which is maybe just a bug that should be fixed in pydicom.

pieper avatar Mar 10 '21 14:03 pieper

@fakyras : I just had a look at your second example, and for me it gives the same exception like the first one. It also contains 2 DS tags with a non-numerical content. If I replace these with valid values, I get no exception. Maybe you posted the wrong example?

mrbean-bremen avatar Mar 14 '21 12:03 mrbean-bremen

@darcymason - I'm not sure about this. A possibility would be to just return the string value for DS and IS if that are not valid numbers, and log a warning if config.enforce_valid_values is not set (raise otherwise). Or we add a new config setting for this... which I don't really like. Note that this is not about json (though the conversion needs also to be adapted), but about handling invalid DS and IS values.

Currently we handle the case of float values in IS, which we only accept if the float can be converted to an int without loss, otherwise we raise. Non-numbers in DS and IS are not valid, of course, but they are benign in the sense that we cannot falsify information that wasn't there. so maybe we can just handle this case gracefully (with a warning). What do you think?

mrbean-bremen avatar Mar 14 '21 13:03 mrbean-bremen

After having a second glance at the dataset I noticed that the values in the examples are actually correct, but the VR is wrong. These are actually RescaleType (VR LO) and PresentationLUTShape (VR CS). In both cases DS is given as the VR, which is clearly wrong.

So I take it all back - as long as we don't see real data with non-numbers encoded inDS and IS, we shall not change the behavior. Maybe we can just ignore the given VR for non-private tags (except maybe for ambiguous VRs) and use the value looked up in the dictionary. I don't know what performance impact that would have - depending on that, we might add a setting for this that is off by default.

mrbean-bremen avatar Mar 14 '21 13:03 mrbean-bremen

@mrbean-bremen thank you for spending time on this and very informative response!

please ignore the second case - I messed up while reading the file, and base64 string was corrupted.

fakyras avatar Mar 14 '21 14:03 fakyras

@fakyras The VR in your json file doesn't match the VR in https://www.dicomlibrary.com/dicom/dicom-tags/.

tag num sub-S03138_ses-E16883_acq-2_run-1_bp-chest_vp-ap_dx.json https://www.dicomlibrary.com/dicom/dicom-tags/
0028,1054 DS LO
2050,0020 DS LO

youngkiu avatar Mar 28 '21 14:03 youngkiu

@youngkiu - correct, see my comment above. @darcymason - do you think we need an option for looking up known tags? I'm not sure about this...

mrbean-bremen avatar Mar 28 '21 18:03 mrbean-bremen

@darcymason - do you think we need an option for looking up known tags? I'm not sure about this...

That has crossed my mind, but I think I'd rather not - we have so many options already and this has not been that common a problem. And it seems it is not necessarily stable - it could be the value that is wrong, or the VR that is wrong, and it is likely quite complicated to handle all the possibilities well. I'd be more in favor of it being handled in user code according to the specifics of the situation.

darcymason avatar Mar 28 '21 18:03 darcymason

Thanks - I was also reluctant to add this, and it seems to be a really rare case.

mrbean-bremen avatar Mar 28 '21 19:03 mrbean-bremen

As has been discussed, there is no good solution to handle this in pydicom - it shall be handled in user code instead. Closing.

mrbean-bremen avatar Oct 08 '22 17:10 mrbean-bremen