pylidc icon indicating copy to clipboard operation
pylidc copied to clipboard

parameterize populate.py script and fix pydicom import

Open fedorov opened this issue 7 years ago • 7 comments
trafficstars

Instead of using hard-coded paths locations, reuse those from settings. While doing that, added XML annotations location to the configuration. It might make sense to move the getSettings() function to the util class. It might also be good to add the path to the sqlite database to the configuration, using the installed one by default.

Fixed import dicom, which was removed in pydicom 1.0 - now this is handeled more gracefully by catching ImportError.

fedorov avatar Jul 03 '18 20:07 fedorov

As another motivation for revisiting the process of generating the database and providing documentation, note that the TCIA content is not frozen and can be updated. If the content on TCIA has been updated, but the database in pypi or in the repo has not, for one reason or another, it makes sense to me to help the user of the code to re-generate the database.

Note XML content of LIDC-IDRI has been updated in May and June 2018.

image

fedorov avatar Jul 04 '18 17:07 fedorov

Excellent point @fedorov possibly ever-changing XML. I will use this branch for implementing the capability to reliably regenerate the sqlite database (i.e., I will add it to my long todo list and eventually get it hopefully :smile: ).

notmatthancock avatar Jul 05 '18 01:07 notmatthancock

Thanks - that helped a lot

fedorov avatar Jul 05 '18 01:07 fedorov

@notmatthancock lacking the capability to re-generate the database - any chance you could re-generate it yourself to account for the most recent changes in the XML content?

fedorov avatar Aug 10 '18 19:08 fedorov

@fedorov issues 1-5 were known and accounted for at the time of populating the database. Issue 6 was also already taken into account in populate.py as you can see in these lines of code despite the fact that I wrote that code before the issue was reported. Retrospectively, I should have reported my observation that lead to my writing that piece of workaround-code to the LIDC maintainers at the time. This leaves issue 7, the incorrect SOP id, as the only inconsistency in the pylidc database (at least with respect to those enumerated on the TCIA page).

The populate.py needs a big refactor before I believe it can be trusted to regenerate the database from scratch while still playing nice with the corresponding sqlalchemy classes. This isn't too difficult, but I'm lacking the time to invest in this currently. In the mean-time, we can add a "band-aid" fix by updating the sqlite database directly to account for the SOP id change, rather than regenerating the entire sqlite database. Unfortunately, these "hot-fix" updates are precisely the reason why a refactor is necessary :weary: but at least it will allow the database to be consistent until such an effort can be made. I'll look into it.

notmatthancock avatar Aug 11 '18 21:08 notmatthancock

I've looked again at issue 7. This issue doesn't affect the pylidc database since the image sop instance uid is not currently recorded in it.

As far as I can tell, the current pylidc database is up to date with the known issues. Of course, the ability to regenerate the database from scratch is still desired.

notmatthancock avatar Aug 12 '18 23:08 notmatthancock

@notmatthancock thank you for those details! Since the database is up to date, I agree this definitely makes the issue non-urgent.

fedorov avatar Aug 13 '18 15:08 fedorov

Closing as this is quite old now

notmatthancock avatar Dec 01 '23 03:12 notmatthancock