highdicom
                                
                                
                                
                                    highdicom copied to clipboard
                            
                            
                            
                        Multiframe Conversion
Dear Markus, I've developed a new extensible conversion engine for HighDicom based David Clunie's PixelMed. This piece of code will be our meeting's main topic on July 9th. Although it is not necessary, it would be wonderful if you take a look at the structure of the new conversion. This file also contains the UML class diagram of the new code to get a broader focus on it.
Thanks @afshinmessiah! I had a quick look and will review more carefully before we meet.
In the meantime, as a first step I would suggest you look into the coding style section of the developer guide and make sure the code passes flake8 and mypy checks.
@hackermd the PR is not complete, and is known not to pass the checks. I suggested Afshin shares this incomplete PR with you so you can (if you like) look at the overall approach before the meeting.
@afshinmessiah please fix typing errors locally before pushing to avoid test pipeline failures such as this one https://travis-ci.com/github/MGHComputationalPathology/highdicom/jobs/430572557
@afshinmessiah please fix typing errors locally before pushing to avoid test pipeline failures such as this one https://travis-ci.com/github/MGHComputationalPathology/highdicom/jobs/430572557
Sorry, these days I'm doing so many tests/debug to convert single frame data collections and push changes in a rush. For now, I corrected the typing errors.
@hackermd, I applied all modifications you wanted. Please take a look and let me know if anything is left.
Just finished the second round of code modification and tried to apply the required changes and corrections.
Just pushed updates for the recent comments. All recent comments were applied except making the FrameSet and FrameSetCollection classes private which was not possible due to their external usage.
Thank you @CPBridge for all your wonderful comments. I did my best to apply all changes required by the comments and not to leave out any. Except few comments to which I provided some explanation, I've applied all comments. As I explained before @fedorov and I do not concur with @hackermd and @CPBridge on making the FrameSet and FrameSetCollection classes private. I also currently use those classes externally to prepare framesets prior to conversion of the TCIA collection data.
I don't think that "Swapped voctors [sic] in normal calculation" is correct - instead you might want to just remove the negation of Z, which was copied from pixelmed, which I think is wrong about this (and I will fix); this would reverse the sort order of transverse (perpendicular to Z) slices (not that there is any "right" sort order).
@hackermd @CPBridge the last day for @afshinmessiah in his current BWH position is effectively May 17 (he will then take the last 2 weeks off as vacation, before moving to his new job). Is there anything I can do to help with this PR?
Hi @afshinmessiah ,
Thank you @CPBridge for all your wonderful comments.
You are most welcome. Thank you for working through them. I hope you were able to just click "Commit suggestion" on most of them rather than reproduce by hand!
As I explained before @fedorov and I do not concur with @hackermd and @CPBridge on making the FrameSet and FrameSetCollection classes private.
This appears to be a particular point of disagreement/confusion holding this all back, so I want to try and sort this out. My position, and I think this mostly reflects @hackermd's also, is that I am not opposed outright to making FrameSet and FrameSetCollection public if they serve a purpose to the end suer, but I am just confused by why you want to/need to given the rest of the API. If we consider the "main entry points" of the code in this module to be the constructors of the three LegacyConvertedEnhancedXImage classes then these constructors cannot accept arguments of type FrameSet or FrameSetCollection - they just pass in a plain python list of legacy instances. Therefore, what would a user of highdicom do with FrameSet or FrameSetCollection if these classes were made public? Maybe you could provide a code snippet of a toy example of what user code using these classes to construct a Legacy Converted Enhanced image would actually look like. That would really help me understand your proposal.
This appears to be a particular point of disagreement/confusion holding this all back, so I want to try and sort this out. My position, and I think this mostly reflects @hackermd's also, is that I am not opposed outright to making
FrameSetandFrameSetCollectionpublic if they serve a purpose to the end suer, but I am just confused by why you want to/need to given the rest of the API. If we consider the "main entry points" of the code in this module to be the constructors of the threeLegacyConvertedEnhancedXImageclasses then these constructors cannot accept arguments of typeFrameSetorFrameSetCollection- they just pass in a plain python list of legacy instances. Therefore, what would a user ofhighdicomdo withFrameSetorFrameSetCollectionif these classes were made public? Maybe you could provide a code snippet of a toy example of what user code using these classes to construct a Legacy Converted Enhanced image would actually look like. That would really help me understand your proposal.
FrameSet class gives the list of matched Dicom datasets which will be fed into LegacyConvertedEnhancedXImage. I've attached an image of a simple conversion function (sorry I couldn't/didn't know how to/ copy the multiline code into this comment)

What if we make the FrameSet class publicly available to the user as a utility to identify legacy frames that belong to a single frame set. The user would then apply this class to select frame sets that would be passed to the converter class. Inside the converter class there could be another check, or it could just throw an exception when any assumption is violated. Afshin, I am sure we discussed this before, but I do not recall why this would not be an option. @CPBridge are you concerned about changing the API for the converter, or you are opposed to make the FrameSet class public in any form, even as a utility class?
@fedorov, after all,  within LegacyConvertedEnhancedXImage class we need to use the FrameSet class logic to identify if the single-frame list is a mixed one or not. Whatever check we use within the LegacyConvertedEnhancedXImage class is going to duplicate that logic which is not a clean thing to do.
For our purpose, the most favorite would have been taking FrameSet instead of a list of single frames as an input argument for the LegacyConvertedEnhancedXImage class constructor. Due to backward compatibility, this was not applicable, so I guess the only option would be having FrameSetCollection and FrameSet classes both accessible to the user.
having
FrameSetCollectionandFrameSetclasses both accessible to the user
@CPBridge and @hackermd would this be acceptable to you?
Hi @afshinmessiah thank you very much, after seeing your snippet I now understand the situation much better. @fedorov my primary concerns are:
- Backwards compatibility of the LegacyConvertedXImage constructors, which form part of highdicom public API
 - (to a lesser extent) consistency of that API between the various modules within highdicom, which all accept lists of source images as the first input
 
Both these points require that the constructors continue to accept a plain list of legacy datasets. However I think there is a relatively easy way forward that achieves this as well as giving the flexibility you want (which I now understand the motivation for). I propose:
- Making 
FrameSetandFrameSetCollectionpublic - Changing the type of the 
legacy_datasetsparameter of the constructors fromSequence[Dataset]toUnion[Sequence[Dataset], FrameSet]. - Implementing simple control flow that checks whether the passed 
legacy_datasetsparameter is an instance ofFrameSetand if so, skips the redundant checks. If it is not an instance ofFrameSet, it continues by constructing the frameset as the first step, as before, throwing meaningful errors if assumptions are broken. 
It seems to me that this is a nice solution that maintains backwards compatibility, gives users the option of solving the simple case (which I imagine will be quite common) using a single constructor call, but gives them the flexibility to use the frame set mechanism if they realise that they need it.
If @hackermd agrees, would you be willing to implement this proposal and contribute a brief explanation of the two cases along with simple snippets to the documentation by editing the usage.rst file that creates the User Guide? You can build the documentation locally to check what it will look like by pip installing the contents of the requirements_docs.txt file, moving to the docs directory, running the make html command, and then opening the files in the docs/_build/html in your web browser.
Though I need to review the code again carefully after the recent changes, there are a few other outstanding issues that are immediately apparent to me.
- You need to pass the 
**kwargsin theLegacyConvertedXImageclasses to the base class constructor when it is called. I will make suggestions for these - Can you please go through the code and remove any commented out code, and either rephrase or remove any comments that are informal (e.g. "this is so ridiculous..."), since this will form part of 
highdicom's public codebase. 
@CPBridge @afshinmessiah @fedorov I would still prefer the constructor to just accept Sequence[pydicom.dataset.Dataset] and avoid introducing a new type. Conceptually, these classes convert a list (not set!) of single-frame image instances (not frames!) into a multi-frame image instance.
We designed these classes for the following use case:
datasets = [dcmread(f) for f in filenames]
image = LegacyConvertedEnhancedCTImage(datasets, ...)
print(image.SeriesInstanceUID)
print(image.pixel_array)
If one wants to write a program that needs to sort frames and figure out which single-frame instances to pass to the constructor, that's a separate problem to me. I am fine with providing FrameSet and FrameSetCollection as utilities for that purpose (although they would need to be renamed, because "frame" has a specific meaning in DICOM and is very misleading in this context).
It's true that this would mean that some checks may need to be repeated, but that is not of major concern to me. The utilities could be reused under the hood so that we wouldn't have to duplicate source code.
How intensive are these checks? If performance is indeed a bottleneck, we can re-consider and look into options how we could mitigate it.
FYI, by way of background ...
I introduced the term "FrameSet" in the IHE Basic Image Review (BIR) profile in order to treat similarly (for display purposes) a bunch of related single frame image instances and a single multi-frame image instance, so that the arbitrary difference in encoding did not manifest itself to the user of the application, as well as to deal with some specific partitioning issues (like dual-echo MR series). Nobody suggested a better term at the time, or was too concerned about this.
DICOM doesn't actually formally define a frame, but from the definition in PS3.3 of Multi-frame Image, it can probably be taken to mean a "two-dimensional pixel plane".
I grant you that in this context the "Frame" does have accompanying metadata, and is not just the array of pixel data, which one could argue is misleading, but it never occurred to me it would be misconstrued in this highly specific context of use.
So I think @afshinmessiah is using the terms "Frame" and "FrameSet" in a reasonable manner, and certainly the same way as I re-used them in PixelMed for doing set of single frame to multi-frame conversions (inspired by the BIR profile), so he inherited this (possibly poor choice of) terminology from me.
And I meant them to be a "set" not a "list" because members are not intended to be duplicated, which I believe is the distinction between the two. Ideally the code would check that there are no duplicates, if the input set is actually provided as a list.
PS. For our purposes of conversion, we further narrow what constitutes a FrameSet to have more commonality that merely being part of the same Series and indeed need not constrain Instances to be from the same Series in the general case. The BIR constraints were simplified for a specific purpose that may not always align with our legacy conversion use cases.
Thanks @dclunie for the additional information and background.
Within the highdicom library we generally refer to a "frame" as an Item of an encapsulated Pixel Data element of an Image instance. This is in line with the usage with the term in PS 3.3 (e.g., Per-Frame Functional Groups Sequence) and PS 3.5 (e.g., Transfer Syntaxes For Encapsulation of Encoded Pixel Data). We sometimes also use it more broadly to refer the decoded pixel matrix (NumPy array), i.e., the pixels of an individual image slice/plane/tile. I would avoid the term here in this context when referring to single-frame image instances.
Since the order of single-frame image instances is significant in this context, I would further avoid the term "set", which could easily be misunderstood in Python given the built-in type set.
Irrespective of the terminology, I would prefer if this logic (figuring out which frames to use, sorting them, removing duplicates, performing runtime checks, etc.) would be applied within the constructor of the LegacyConvertedEnhancedXImage classes rather than in a separate class. We can factor that logic out into separate utilities (preferably using different names as elaborated above) and provide them as part of the library (e.g., in highdicom.legacy.utils) so that they can be re-used for managing collections of single-frame image instances beyond the legacy conversion use case.
A "sorted set" is still a "set", not a "list".
Further, the order may need to be determined by sorting based on metadata (even if it is only by InstanceNumber within Series), not the the order in which the instances are supplied in the input, which may depend on how they are listed in the filesystem, which may vary between platforms.
A "sorted set" is still a "set", not a "list".
Python doesn't have an ordered set and the interface of the built-in set differs significantly from that of a sequence, for example it cannot be indexed using the [] operator and contained elements must be hashable.
I am not saying the term "set" is wrong from an abstract type theory perspective, but potentially misleading to Python developers. Therefore, I would avoid the term "set" and Union[Sequence[Dataset], FrameSet].
We could instead use "unique" to emphasize that items of the sequence are not duplicated (see also numpy.unique).
Further, the order may need to be determined by sorting based on metadata (even if it is only by InstanceNumber within Series), not the the order in which the instances are supplied in the input, which may depend on how they are listed in the filesystem, which may vary between platforms.
Python provides several mechanisms for sorting sequences (lists), for example using sorted, which accepts a function as an argument to obtain an attribute for comparing items and determining their order. This could be readily applied to Sequence[Dataset].
I would still prefer the constructor to just accept
Sequence[pydicom.dataset.Dataset]and avoid introducing a new type.
@afshinmessiah, can you implement this change, making the facilitating classes available as utility classes?
although they would need to be renamed, because "frame" has a specific meaning in DICOM and is very misleading in this context
Afshin was using those terms since he did not want to change the conventions used in Pixelmed. But I don't think this is worth an argument - can you suggest an alternative name that you like Markus, so Afshin can fix this?
Afshin was using those terms since he did not want to change the conventions used in Pixelmed. But I don't think this is worth an argument - can you suggest an alternative name that you like Markus, so Afshin can fix this?
The main logic seems to be implemented in the FrameSet._find_per_frame_and_shared_tags() method, which updates the FrameSet._shared_tags and FrameSet._perframe_tags attributes. Could we  just create a utility function (in highdicom.sr.utils) instead with the following signature?
def group_per_frame_and_shared_attributes(
    instances: Sequence[Dataset]
) -> Tuple[Set[BaseTag], Set[BaseTag]]:
    """Assign attributes to per-frame or shared functional groups.
    Parameters
    ----------
    instances: Sequence[pydicom.dataset.Dataset]
        DICOM SOP instances
    Returns
    -------
    Tuple[Set[pydicom.tag.BaseTag], Set[pydicom.tag.BaseTag]]
        Tags of attributes assigned to per-frame and shared functional groups
    """
I would prefer the use of keywords instead of tags, but I assume that this function should be capable of dealing with private attributes as well. Should it?
I would prefer the use of keywords instead of tags, but I assume that this function should be capable of dealing with private attributes as well. Should it?
It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code. @hackermd, just pushed the new changes to code, moving the FrameSet and FrameSetCollection to highdicom.utils :)
I assume that this function should be capable of dealing with private attributes as well. Should it?
It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code.
@afshinmessiah why should that function NOT be capable to operate on private attributes, can you explain?
I assume that this function should be capable of dealing with private attributes as well. Should it?
It shouldn't I guess. I just preferred to have a list of tags instead of kws to speed up the code.
@afshinmessiah why should that function NOT be capable to operate on private attributes, can you explain?
Well, let me rephrase: 'it is not necessary', according to the current logic of the code, both per-frame and shared attributes are not allowed to be private.
In that case, I would prefer the function to have the following interface:
def group_per_frame_and_shared_attributes(
    instances: Sequence[Dataset]
) -> Tuple[Set[str], Set[str]]:
    """Assign attributes to per-frame or shared functional groups.
    Parameters
    ----------
    instances: Sequence[pydicom.dataset.Dataset]
        DICOM SOP instances
    Returns
    -------
    Tuple[Set[str], Set[str]]
        Keywords of attributes assigned to per-frame and shared functional groups
    """
The function be able to replace the FrameSet class.
The main logic seems to be implemented in the
FrameSet._find_per_frame_and_shared_tags()method, which updates theFrameSet._shared_tagsandFrameSet._perframe_tagsattributes. Could we just create a utility function (inhighdicom.sr.utils) instead with the following signature?
@hackermd, I am more of a fan of objects and classes than of functions since they are easy to extend and well organized. Of course, we can rewrite any class and object in function form but I don't get this preference of function over the class. In this specific case, there are other functionalities of FrameSet class that I use throughout the conversion process. I believe this is too much of a change for this single PR. I suggest we settle for the current FrameSet and FrameSetCollection classes since they are optional and users don't have to use them. I the future versions you can refactor the structure to better deal with future usecases.
I suggest we settle for the current FrameSet and FrameSetCollection classes since they are optional and users don't have to use them. I the future versions you can refactor the structure to better deal with future use cases.
That will not be that easy if we make them part of the public API. That's why I suggested keeping them private and only using them internally in the constructors for now. We can later revisit if additional use cases come up.
@afshinmessiah I don't necessarily agree with Markus, but let's just go ahead with this suggestion - can you make those private and use them only internally in the constructors?