Unexpected EOF

Open wkoszek opened this issue 5 years ago • 5 comments

I don't have more details right now (I'm just pasting some error messages from our last run over millions of DICOMs), but essentially some of our DICOMs can't be decoded because of the "EOF" message. I'll dig deeper into the source code later.

wkoszek • Apr 21 '20 19:04

Thank you for raising this issue! Interesting. I wonder if the DICOM is malformed in some way, so that we're hitting an EOF inside the DICOM itself? Would love more info on the DICOM binary blob if possible!

suyashkumar • Apr 26 '20 03:04

@suyashkumar Took a stab at this today. I used codify.py from pydicom in the hope of getting the exact same DICOM parameters. The idea was that I'd modify things by hand as a de-identification step and give you a way to replicate the DICOM. That was a fiasco: the script that codify generates doesn't seem to be runnable. I'm starting to understand why creating a testing library will be hard.

There are two options here. Either I give you access to a sample file "offline", where you maybe SSH into a protected AWS instance, or (maybe better) I try to clone The Cancer Imaging Archive and we run my code on their collection of images, hoping we hit the same issue there. In that case, since it's all public data, we could debug all the issues without worrying about the information in the files.

wkoszek • Apr 26 '20 20:04

Thank you for your efforts! Option 2 sounds best, though I'm not sure whether the diversity of DICOMs in The Cancer Imaging Archive matches what you have. Hopefully some of these issues are reproducible there!

For DICOMs like this, perhaps I should add debug instrumentation that reports which element the parser was in when it encountered the error, along with other contextual information helpful for debugging. Of course, we can always throw a couple of debug logs in there to investigate this further.

One thought that came to mind: some functions in Go's io package return io.ErrUnexpectedEOF, assuming this is the exact error you are getting. It is returned when code tries to read exactly N bytes but hits EOF after reading some, but not all, of them. For example, io.ReadFull returns io.EOF only if no bytes were read, and io.ErrUnexpectedEOF if the EOF happens partway through.

suyashkumar • Apr 27 '20 03:04

By the way, while searching for sample data I came across http://gdcm.sourceforge.net/wiki/index.php/Sample_DataSet, which appears to have several datasets, though for most of them it's unclear whether any data use policy or license applies.

suyashkumar • Apr 27 '20 04:04