dicomweb-client icon indicating copy to clipboard operation
dicomweb-client copied to clipboard

fix(dicomweb-client):Make the retrieveBulkData handle either single or multipart responses.

Open wayfarer3130 opened this issue 4 years ago • 9 comments

The DICOMweb standard can be interpreted in a few ways, one of which is that BulkDataURI always returns single part bulk data when requested as a full URL. This interpretation doesn't work for the existing retrieveBulkDataURI, so the library has been updated to allow retrieving as either single or multipart, letting the server decide how to return the data.

wayfarer3130 avatar Nov 17 '21 17:11 wayfarer3130

The "Value" field of a DICOM JSON representation of a Data Element is an array (see F.2.2 DICOM JSON Model Object Structure and F.2.4 DICOM JSON Value Multiplicity). If the object contains the "BulkDataURI" instead of the "Value" field, then it would also make sense for the response message to the bulkdata retrieve request to be multipart.

hackermd avatar Mar 10 '22 13:03 hackermd

While the standard may also support singlepart messages, an origin server would still have to support multipart messages. Therefore, I don't see why the behavior of the client has to change.

hackermd avatar Mar 10 '22 13:03 hackermd

CP 2204 available at https://docs.google.com/document/d/1UbAZFAyTF8p7Vf_q_lR9mex82Zkr7zB-/edit?usp=sharing&ouid=106394421000961430700&rtpof=true&sd=true specifies the single part retrieve that is setup here.

wayfarer3130 avatar Apr 07 '22 12:04 wayfarer3130

CP 2204 available at https://docs.google.com/document/d/1UbAZFAyTF8p7Vf_q_lR9mex82Zkr7zB-/edit?usp=sharing&ouid=106394421000961430700&rtpof=true&sd=true specifies the single part retrieve that is setup here.

Thanks @wayfarer3130 for sharing the draft of the CP.

Section 10.4.3.3.5 states:

The payload for a Bulkdata Resource (see Section 10.4.1.1.5) shall contain all the bulkdata for the resource. When the resource is a single Bulkdata URI, the payload will contain the single corresponding element. When the resource is a Study, Series or Instance Bulkdata resource, the payload will contain all the bulkdata of the corresponding instance(s). Bulkdata in a multipart response shall have a Content-Location header field that corresponds to the URI contained in the corresponding Element in the Metadata.

The wording is indeed confusing and should be improved in my opinion. I think the confusion is primarily due to the fact that the term "bulkdata" is used to refer to two different things:

  1. Bulkdata resources that can be accessed via the DICOMweb API at the /bulkdata endpoints. In this case, a resource represents a collection of DICOM Data Elements.
  2. Bulkdata values that can be accessed via a Bulkdata URI that's included in a JSON metadata resource. In this case, a resource represents the value of an individual DICOM Data Element.

Table 10.4.4-1 explicitly requires the media type of any bulkdata resource to be multipart/related; type="application/octet-stream". For (1), it has to be multi-part because a study, series, and instance may contain more than one bulkdata element. Since an element may have a value multiplicity greater than one and values in JSON metadata are arrays, I tend argue that the payload of the response messages shall also be multi-part for (2). In general, the client may not know whether the value is single or multi-part.

hackermd avatar Apr 11 '22 21:04 hackermd

@hackermd - the problem with things like: Table 10.4.4-1 explicitly requires the media type of any bulkdata resource to be multipart/related; type="application/octet-stream". is that they aren't definitive because there are other tables that allow it like: https://dicom.nema.org/medical/dicom/current/output/html/part18.html#table_8.7.3-4 as well as other places that suggest that it is required to support single part bulkdatauri requests.

The general approach of the DICOM standards committee to things that have been allowed by the standard, such as both the relative URI and single part bulkdata is to try to clarify how it is used, but not restrict things that were previously allowed under a reasonable reading of the standard.

Returning single part is extremely useful for anything that does a direct render like PDF, video, audio etc, and having the dcmjs library able to deal with either type is useful for when one sometimes wants to retrieve things for direct render and sometimes does background retrieves. In theory, the implementation on the DICOMweb side has to handle both, but implementations such as the static wado that stores the data in the expected final format, compressed, are far faster when they don't need to wrap and uncompress the data.

wayfarer3130 avatar Apr 12 '22 12:04 wayfarer3130

Table 10.4.4-1 explicitly requires the media type of any bulkdata resource to be multipart/related; type="application/octet-stream". is that they aren't definitive because there are other tables that allow it like: https://dicom.nema.org/medical/dicom/current/output/html/part18.html#table_8.7.3-4

I think there is a misunderstanding. The "Media Type" and "Transfer Syntax" columns of the tables in this section of the standard may refer to individual message parts, i.e., multipart/related; type=${MediaType}; transfer-syntax=${TransferSyntax}.

Section 8.7.3 DICOM Media Type Sets states:

Depending on the service, the media types may be single part or multipart, and may have required or optional Transfer Syntax and/or character set parameters.

hackermd avatar Apr 12 '22 12:04 hackermd

Returning single part is extremely useful for anything that does a direct render like PDF, video, audio etc

Rendered resources (endpoint /rendered) have always been single-part!

hackermd avatar Apr 12 '22 12:04 hackermd

The "Value" field of a DICOM JSON representation of a Data Element is an array (see F.2.2 DICOM JSON Model Object Structure and F.2.4 DICOM JSON Value Multiplicity). If the object contains the "BulkDataURI" instead of the "Value" field, then it would also make sense for the response message to the bulkdata retrieve request to be multipart.

That is a misunderstanding of the multiplicity and the binary representation - the BulkDataURI refers to a single object that contains the entire Value, all of the various parts in however many bytes plus multiplicity of values it contains. That is, if one has the value: CT\MR Then the literal bulk data URI value would be "CT\MR " (zero character at end for padding), without multiple values etc.

wayfarer3130 avatar Jun 20 '22 16:06 wayfarer3130

That is a misunderstanding of the multiplicity and the binary representation - the BulkDataURI refers to a single object that contains the entire Value, all of the various parts in however many bytes plus multiplicity of values it contains.

The BulkDataURI does not point to a single object, but to a single resource, which is defined to be a multi-part message.

I think you are right about the fact that values of a data element with value multiplicity greater than one would in principle be encoded in one message part: https://dicom.nema.org/medical/dicom/current/output/chtml/part19/chapter_A.html#para_2b996d78-0885-405e-a398-275654546c4b

However, data elements with a binary value representation (or large text) generally have a value multiplicity of 1 anyways.

The main reason the messages are multi-part is probably because the Pixel Data element may have multiple frames (and frame fragments). Each frame would need to be encoded as a separate message part.

hackermd avatar Jun 20 '22 17:06 hackermd

That is a misunderstanding of the multiplicity and the binary representation - the BulkDataURI refers to a single object that contains the entire Value, all of the various parts in however many bytes plus multiplicity of values it contains.

The BulkDataURI does not point to a single object, but to a single resource, which is defined to be a multi-part message.

I think you are right about the fact that values of a data element with value multiplicity greater than one would in principle be encoded in one message part: https://dicom.nema.org/medical/dicom/current/output/chtml/part19/chapter_A.html#para_2b996d78-0885-405e-a398-275654546c4b

However, data elements with a binary value representation (or large text) generally have a value multiplicity of 1 anyways.

The main reason the messages are multi-part is probably because the Pixel Data element may have multiple frames (and frame fragments). Each frame would need to be encoded as a separate message part.

@hackermd Now that there is an actual change to the DICOM standard to support being able to fetch using either method, what's left to merge this change? I'll also need this change for a different project and I want to help to get this merged.

igoroctaviano avatar May 31 '23 01:05 igoroctaviano

That is a misunderstanding of the multiplicity and the binary representation - the BulkDataURI refers to a single object that contains the entire Value, all of the various parts in however many bytes plus multiplicity of values it contains.

The BulkDataURI does not point to a single object, but to a single resource, which is defined to be a multi-part message. I think you are right about the fact that values of a data element with value multiplicity greater than one would in principle be encoded in one message part: https://dicom.nema.org/medical/dicom/current/output/chtml/part19/chapter_A.html#para_2b996d78-0885-405e-a398-275654546c4b However, data elements with a binary value representation (or large text) generally have a value multiplicity of 1 anyways. The main reason the messages are multi-part is probably because the Pixel Data element may have multiple frames (and frame fragments). Each frame would need to be encoded as a separate message part.

@hackermd Now that there is an actual change to the DICOM standard to support being able to fetch using either method, what's left to merge this change? I'll also need this change for a different project and I want to help to get this merged.

I updated the build to current. At this point it is mostly cleanup so that the handling of multipart and singlepart is based on the actual response type rather than having a decode step multiple places.

wayfarer3130 avatar May 31 '23 11:05 wayfarer3130

@pieper @swederik are you guys able to help review/approve this PR? not sure about Markus's availability nowadays.

igoroctaviano avatar Jun 08 '23 20:06 igoroctaviano

@pieper @swederik are you guys able to help review/approve this PR? not sure about Markus's availability nowadays.

I read through this but did not test it. If you are okay with the change and nobody else objects I'm fine with merging.

pieper avatar Jun 08 '23 21:06 pieper

LGTM 🏅 If possible I would like to know if @dclunie has any objections based on previous discussions otherwise I would like to have this merged.

igoroctaviano avatar Jul 25 '23 11:07 igoroctaviano

I have no objections.

The final text of CP 2204 which amended the section describing the Media Types, including those for the Bulk Data Resources, should now make it clear that the origin server can return either therefore the client has to support both possibilities, making this PR necessary.

Having said that, I would have been OK with making the client more robust anyway, in the sense of the "Postel principle" (write perfect, read anything).

dclunie avatar Jul 25 '23 14:07 dclunie

I have no objections.

The final text of CP 2204 which amended the section describing the Media Types, including those for the Bulk Data Resources, should now make it clear that the origin server can return either therefore the client has to support both possibilities, making this PR necessary.

Having said that, I would have been OK with making the client more robust anyway, in the sense of the "Postel principle" (write perfect, read anything).

@pieper based on this reply + my testing I think we can merge this one.

igoroctaviano avatar Jul 25 '23 21:07 igoroctaviano

@hackermd still has one pending request that's possibly been addressed. Since it sounds like this is an improvement I'll merge it and we can open new issues for anything not yet resolved.

pieper avatar Jul 27 '23 17:07 pieper

@pieper do we have to manually publish the new version or it's should be triggered by a semantic commit?

igoroctaviano avatar Jul 27 '23 17:07 igoroctaviano

I think @dannyrb only set up the automatic publishing for dcmjs but not this repo. Maybe somebody who is good with these things could help automate?

pieper avatar Jul 28 '23 01:07 pieper

I think @dannyrb only set up the automatic publishing for dcmjs but not this repo. Maybe somebody who is good with these things could help automate?

Hi Steve :wave: looks like that dicomweb PR failed to publish a new npm version: https://github.com/dcmjs-org/dicomweb-client/actions/runs/5683626006

[5:28:47 PM] [semantic-release] [@semantic-release/commit-analyzer] › ℹ  The commit should not trigger a release
[5:28:47 PM] [semantic-release] [@semantic-release/commit-analyzer] › ℹ  Analysis of 1 commits complete: no release
[5:28:47 PM] [semantic-release] › ✔  Completed step "analyzeCommits" of plugin "@semantic-release/commit-analyzer"
[5:28:47 PM] [semantic-release] › ℹ  There are no relevant changes, so no new version is released.
There are no relevant changes, so no new version is released.

I just noticed that the last commits from that PR were not in the correct format, that might be the cause

igoroctaviano avatar Jul 28 '23 12:07 igoroctaviano

Hmm, I'm not sure either. To my quick review all the commit messages look good: https://github.com/dcmjs-org/dicomweb-client/commits/master

pieper avatar Jul 28 '23 14:07 pieper

Hmm, I'm not sure either. To my quick review all the commit messages look good: https://github.com/dcmjs-org/dicomweb-client/commits/master

You are right, the squashed commit looks fine. The last published package is from April 24 though. I don't have enough permissions to take a look myself but maybe there's something wrong with the npm account? or the commit-analyzer? (the publish last action said that there were no relevant changes, skipping the version bump)

igoroctaviano avatar Jul 29 '23 18:07 igoroctaviano

:tada: This PR is included in version 0.10.3 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

github-actions[bot] avatar Aug 10 '23 19:08 github-actions[bot]

@igoroctaviano Hmm, I guess whatever it was got fixed since that doc ts change resulted in a new release. 🤷

pieper avatar Aug 10 '23 19:08 pieper