pydicom
pydicom copied to clipboard
pydicom.from_json breaks with incorrect VR types
I have been working with BIMCV-COVID19 dataset, which for some reason had split original DICOM file into 16-bit .png and DICOM metadata .json file using pydicom.to_json
function.
Everything would be ok with this approach, however reverting .json to pydicom Dataset causes problems (using this snippet below):
g = json.load(json_file)
ds = pydicom.Dataset()
ds = ds.from_json(g)
It does not work when there are VR DICOM non conformities, i.e. VR = "DS" and Value is "US". Attaching an example sub-S03138_ses-E16883_acq-2_run-1_bp-chest_vp-ap_dx.zip (00281054 and 20500020 are causing problems as pydicom is trying to convert strings to float based on VR values).
This DICOM does not conform the standard, but we do not live in a perfect world :( Could this be handled for checking if the Value to be converted is of valid python data type?
semi-related issue - from_json
does not handle base64 strings as well
sub-S11600_ses-E21430_run-1_bp-chest_vp-pa_dx.zip
error:
ValueError: invalid literal for int() with base 10: b'\x00\x00\x06\x00\x0c\x00\x13\x00\x19\x00 \x00&\x00-\x003\x00:\x00@\x00G\x00M\x00T\x00Z\x00a\x00g\x00n\x00t\x00{\x00\x81\x00\x88\x00\x8e\x00\x95\x00\x9b\x00\xa1\x00\xa8\x00\xae\x00\xb5\x00\xbb\x00\xc
Thanks for reporting - perhaps these should be two issues? IIUC, the first is non-conforming data, always a problem, that should maybe be handled in the json before converting (or something). The second is about base64 strings which is maybe just a bug that should be fixed in pydicom.
@fakyras : I just had a look at your second example, and for me it gives the same exception like the first one. It also contains 2 DS tags with a non-numerical content. If I replace these with valid values, I get no exception. Maybe you posted the wrong example?
@darcymason - I'm not sure about this. A possibility would be to just return the string value for DS
and IS
if that are not valid numbers, and log a warning if config.enforce_valid_values
is not set (raise otherwise).
Or we add a new config setting for this... which I don't really like.
Note that this is not about json (though the conversion needs also to be adapted), but about handling invalid DS
and IS
values.
Currently we handle the case of float values in IS
, which we only accept if the float
can be converted to an int
without loss, otherwise we raise. Non-numbers in DS
and IS
are not valid, of course, but they are benign in the sense that we cannot falsify information that wasn't there. so maybe we can just handle this case gracefully (with a warning).
What do you think?
After having a second glance at the dataset I noticed that the values in the examples are actually correct, but the VR is wrong. These are actually RescaleType
(VR LO
) and PresentationLUTShape
(VR CS
). In both cases DS
is given as the VR, which is clearly wrong.
So I take it all back - as long as we don't see real data with non-numbers encoded inDS
and IS
, we shall not change the behavior.
Maybe we can just ignore the given VR for non-private tags (except maybe for ambiguous VRs) and use the value looked up in the dictionary. I don't know what performance impact that would have - depending on that, we might add a setting for this that is off by default.
@mrbean-bremen thank you for spending time on this and very informative response!
please ignore the second case - I messed up while reading the file, and base64 string was corrupted.
@fakyras The VR in your json file doesn't match the VR in https://www.dicomlibrary.com/dicom/dicom-tags/.
tag num | sub-S03138_ses-E16883_acq-2_run-1_bp-chest_vp-ap_dx.json | https://www.dicomlibrary.com/dicom/dicom-tags/ |
---|---|---|
0028,1054 | DS | LO |
2050,0020 | DS | LO |
@youngkiu - correct, see my comment above. @darcymason - do you think we need an option for looking up known tags? I'm not sure about this...
@darcymason - do you think we need an option for looking up known tags? I'm not sure about this...
That has crossed my mind, but I think I'd rather not - we have so many options already and this has not been that common a problem. And it seems it is not necessarily stable - it could be the value that is wrong, or the VR that is wrong, and it is likely quite complicated to handle all the possibilities well. I'd be more in favor of it being handled in user code according to the specifics of the situation.
Thanks - I was also reluctant to add this, and it seems to be a really rare case.
As has been discussed, there is no good solution to handle this in pydicom - it shall be handled in user code instead. Closing.