IDC Slim integration regression testing
We need to define what manual regression testing should be performed for Slim integrated with IDC, and which studies/series should be confirmed to work on each IDC release.
This is a document we've been using to drive OHIF Viewer testing: https://docs.google.com/document/d/1l0RP3H6D9OCI3J2YubzFidBszu1NvB9lhUcFflnWNLc/edit. We should have something similar for Slim.
cc: @pgundluru @dclunie
We should select a diverse set of images with different transfer syntaxes. Even images with the same transfer syntax may be more or less complex due to different codec parameters such as color space, channel subsampling ratios, etc. @dclunie could you help us put together a list of "tricky" images?
@hackermd @dclunie if you can put together the list of DICOM attributes we should "sample", I can do the queries to come up with the specific representative studies/series
@fedorov I suggest the following criteria:
- Transfer Syntax UID: I am not sure you have that indexed in the database, since it is not part of a Data Set (but rather the File Meta Information). In addition, we may want to consider coding parameters that are not captured by DICOM attributes, but can be found in the header of the JPEG or JPEG 2000 bitstreams. Since @dclunie has performed the conversions, he may have insight into this information.
- Number of Study-related Series: Note that this is an attribute included in DICOMweb search results. Not sure how you would query for that in the database. It would be useful to use studies for testing that contain more than one series (i.e., more than one digital slide) so that we can test the ability of Slim to switch between slides and update the UI accordingly.
- Manufacturer Model Name and Software Versions: If possible, we should cover a range of different scanners with different software versions.
- Clinical Trial Protocol ID: We probably want images from different collections in the test set.
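For the Number of Study-related Series criterion, here is a minimal sketch of how multi-series studies could be picked out of a DICOMweb study search result, assuming the response is in the DICOM JSON model and includes Number of Study Related Series (0020,1206). The function names and the example UIDs are hypothetical and only illustrate the idea; this is not how IDC or Slim does it.

```python
from __future__ import annotations

from typing import Any

# DICOM JSON keys are the eight-hex-digit tag (group, then element).
STUDY_INSTANCE_UID = "0020000D"
NUMBER_OF_STUDY_RELATED_SERIES = "00201206"


def first_value(dataset: dict[str, Any], tag: str) -> Any:
    """Return the first Value of a DICOM JSON attribute, or None if absent or empty."""
    values = dataset.get(tag, {}).get("Value") or []
    return values[0] if values else None


def multi_series_studies(search_results: list[dict[str, Any]]) -> list[str]:
    """Return Study Instance UIDs of studies that report more than one series."""
    selected = []
    for study in search_results:
        uid = first_value(study, STUDY_INSTANCE_UID)
        count = first_value(study, NUMBER_OF_STUDY_RELATED_SERIES)
        # The count may arrive as a number or a numeric string, so normalize with int().
        if uid is not None and count is not None and int(count) > 1:
            selected.append(uid)
    return selected


# Hypothetical search result with made-up UIDs, for illustration only.
example_results = [
    {"0020000D": {"vr": "UI", "Value": ["1.2.3.1"]}, "00201206": {"vr": "IS", "Value": [1]}},
    {"0020000D": {"vr": "UI", "Value": ["1.2.3.2"]}, "00201206": {"vr": "IS", "Value": [3]}},
    {"0020000D": {"vr": "UI", "Value": ["1.2.3.3"]}},  # count not returned by the server
]
print(multi_series_studies(example_results))
```

Studies for which the server does not return the count are skipped rather than guessed at, so the resulting list may be incomplete.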
@hackermd @dclunie I made a dashboard to explore those aspects, and also include links to the current selection at the study and series level: https://datastudio.google.com/reporting/9c65802e-979b-4965-8b90-3bf4e2bcc32e
@hackermd thank you for the explanation today that some of the metadata that might be important for defining test instances is hiding in the JPEG header, and is not available in DICOM metadata.
Should we consider extracting those relevant attributes as part of our ETL process, and including them in some auxiliary table to facilitate that aspect of data exploration? We may not even expose those to the users, but at least have them handy to help with testing.
That would probably be a good idea. We could extract the information from the header of the first frame and include it in the table as a JSON string.
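As a rough illustration of that idea, the sketch below reads the start-of-frame (SOF) segment of a JPEG bitstream and serializes the sample precision, frame size, and per-component sampling factors as a JSON string. It assumes the frame bytes have already been pulled out of the encapsulated Pixel Data, covers only JPEG (not JPEG 2000), and uses a hypothetical function name and JSON layout. Which parameters actually matter for testing is still open.

```python
import json
import struct

# All SOF marker codes (0xC0-0xCF without DHT 0xC4, JPG 0xC8, DAC 0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def describe_jpeg_frame(frame: bytes) -> str:
    """Return a JSON string describing the SOF segment of a JPEG bitstream."""
    if frame[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG bitstream: missing SOI marker")
    pos = 2
    while pos + 4 <= len(frame):
        if frame[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = frame[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image: no SOF found
            break
        # The segment length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", frame[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            precision, height, width, count = struct.unpack(">BHHB", frame[pos + 4:pos + 10])
            components = []
            for i in range(count):
                start = pos + 10 + 3 * i
                cid, sampling, _table = frame[start:start + 3]
                # High nibble is the horizontal factor, low nibble the vertical one.
                components.append({"id": cid, "h": sampling >> 4, "v": sampling & 0x0F})
            return json.dumps({"sof_marker": f"0xFF{marker:02X}", "precision": precision,
                               "rows": height, "columns": width,
                               "components": components}, sort_keys=True)
        pos += 2 + length
    raise ValueError("no SOF segment found before the scan data")


# Synthetic header (SOI, APP0, SOF0 with 4:2:0 subsampling), not taken from an IDC image.
example_frame = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xc0" + struct.pack(">HBHHB", 17, 8, 240, 256, 3)
    + bytes([1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1])
)
print(describe_jpeg_frame(example_frame))
```

Storing the result as one JSON string per series would keep the auxiliary table schema stable while the set of extracted parameters is still being worked out.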
Do you have tools or instructions for how to do this? How do we proceed?
Looping in Bill Clifford @bcli4d, since he is handling IDC ETL.
@bcli4d how have you implemented the IDC ETL? What programming language do you use?
Markus, we can probably experiment with extraction in Google Colab using the tools you have right now, use the result to define the initial regression testing samples, and based on that experience decide how to integrate this into ETL. I don't think we need to modify the ETL process yet. I added Bill to get his thoughts and help with planning.
Since we don't actually know what the problem is, we don't know what J2K bitstream metadata we need. I suggest you hold off until we find a signature for one of the problem cases. It may be sufficient to use the information from the SVS TIFF ImageDescription tag, which describes some aspects of the codec used and which I have copied into ImageComments.
I added ImageComments to the dashboard.
@hackermd: ETL is Python (plus some SQL).
@hackermd - @pgundluru is in the process of testing the upcoming IDC release. Since we do not have the regression steps, please let us know if we should do anything beyond what we did so far to debug the JPEG2000 issue with 0.4.5.
@hackermd, following up on Thursday's discussion, here's the list of all combinations of SoftwareVersions and TransferSyntaxUID for the SM series we have in IDC right now. Let me know if this is what you had in mind, or if you want me to give you a list that samples some other attribute.
TransferSyntaxUID,SoftwareVersions_str,slim_url
1.2.840.10008.1.2.4.50,v12.0.15/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.193279499701610990504788547870106775285/series/1.3.6.1.4.1.5962.99.1.215389419.1022426870.1640892929259.2.0
1.2.840.10008.1.2.1,vFS90 01/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.74148830892982664081128985444965701745/series/1.3.6.1.4.1.5962.99.1.267838992.1699372767.1640945378832.2.0
1.2.840.10008.1.2.1,v12.0.15/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.193279499701610990504788547870106775285/series/1.3.6.1.4.1.5962.99.1.215389419.1022426870.1640892929259.2.0
1.2.840.10008.1.2.4.91,v12.4.0/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.241119208575412892095941144558857582708/series/1.3.6.1.4.1.5962.99.1.241357189.1829878970.1640918897029.2.0
1.2.840.10008.1.2.4.50,vFS90 01/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.52692399237314253327736100986240928861/series/1.3.6.1.4.1.5962.99.1.155609303.645601545.1640833149143.2.0
1.2.840.10008.1.2.1,v12.0.11/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.16181252012499544165879445836446987048/series/1.3.6.1.4.1.5962.99.1.208290886.1784869798.1640885830726.2.0
1.2.840.10008.1.2.4.91,vFS90 01/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.74148830892982664081128985444965701745/series/1.3.6.1.4.1.5962.99.1.267838992.1699372767.1640945378832.2.0
1.2.840.10008.1.2.4.91,v12.0.15/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.62785078102556377485575783168690471946/series/1.3.6.1.4.1.5962.99.1.263683866.789827578.1640941223706.2.0
1.2.840.10008.1.2.4.50,v12.0.11/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.16181252012499544165879445836446987048/series/1.3.6.1.4.1.5962.99.1.208290886.1784869798.1640885830726.2.0
1.2.840.10008.1.2.1,v12.4.0/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.241119208575412892095941144558857582708/series/1.3.6.1.4.1.5962.99.1.241357189.1829878970.1640918897029.2.0
1.2.840.10008.1.2.4.50,Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.64952420005001016100439665313692637663/series/1.3.6.1.4.1.5962.99.1.237069029.1022471484.1640914608869.2.0
This is the query I used to get the above:
SELECT
  TransferSyntaxUID,
  ARRAY_TO_STRING(SoftwareVersions,'/') AS SoftwareVersions_str,
  ANY_VALUE(CONCAT("https://dev-viewer.canceridc.dev/slim/studies/",StudyInstanceUID,'/series/',SeriesInstanceUID)) AS slim_url
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality="SM"
GROUP BY
  TransferSyntaxUID,
  SoftwareVersions_str
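To make the CSV listing above easier to walk through during manual testing, it could be grouped by transfer syntax. The sketch below is one possible way to do that, assuming the query result is saved as CSV text with one row per line. The function name is hypothetical, the sample rows are copied from the listing, and the name table covers only the three transfer syntaxes that appear in it.

```python
import csv
import io
from collections import defaultdict

# Human-readable names for the transfer syntaxes present in the listing.
TRANSFER_SYNTAX_NAMES = {
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
    "1.2.840.10008.1.2.4.91": "JPEG 2000 Image Compression",
}


def build_checklist(csv_text: str) -> dict:
    """Group (SoftwareVersions_str, slim_url) pairs by transfer syntax name."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        uid = row["TransferSyntaxUID"]
        # Fall back to the raw UID for transfer syntaxes not in the table.
        name = TRANSFER_SYNTAX_NAMES.get(uid, uid)
        grouped[name].append((row["SoftwareVersions_str"], row["slim_url"]))
    return dict(grouped)


# Three rows copied from the listing, one per transfer syntax.
sample_csv = """\
TransferSyntaxUID,SoftwareVersions_str,slim_url
1.2.840.10008.1.2.4.50,v12.0.15/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.193279499701610990504788547870106775285/series/1.3.6.1.4.1.5962.99.1.215389419.1022426870.1640892929259.2.0
1.2.840.10008.1.2.1,vFS90 01/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.74148830892982664081128985444965701745/series/1.3.6.1.4.1.5962.99.1.267838992.1699372767.1640945378832.2.0
1.2.840.10008.1.2.4.91,v12.4.0/Sat Nov 20 10:02:55 EST 2021,https://dev-viewer.canceridc.dev/slim/studies/2.25.241119208575412892095941144558857582708/series/1.3.6.1.4.1.5962.99.1.241357189.1829878970.1640918897029.2.0
"""

for name, entries in build_checklist(sample_csv).items():
    print(name)
    for versions, url in entries:
        print(f"  [ ] {versions}: {url}")
```

Note that the query picks the URL with ANY_VALUE, so each combination is represented by one arbitrary series and the same series can show up under more than one transfer syntax.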
@hackermd other than checking the URLs above, is there anything else we should do for regression testing of the IDC Slim 0.5.0 instance?
I would just make sure that we also assert that the metadata is displayed correctly.
I need to reconcile the list above with the test inventory in https://docs.google.com/spreadsheets/d/12n1AWUynEFPatmNUphyW9IhOHaePN2jNUrzzjLunJtg/edit#gid=0, and coordinate with Poojitha to incorporate this into her testing process.