[MISC] BEP003: Anatomical segmentation lookup table
Relevant to #265, but listing as a separate issue since the length of discussion may interfere with the smaller inline reviews there.
If there are any relevant prior discussions on this subject that I'm not aware of, please throw me a link.
One of the strengths of BIDS is that it is generally possible to infer the properties of compliant data simply from the file naming convention and the contents of sidecar files, without referring to the specification document itself. Someone not familiar with BIDS should not need their data open in one window and the specification document open in another in order to figure out the nature of the data they have; there's an intrinsic degree of self-evidence.
From this standpoint, consider the following table proposed as part of BEP003 (#265), which provides standardised labels for anatomical segmentation images:
| Integer value | Description | Abbreviation (label) |
| ------------- | ----------------------- | -------------------- |
| 0 | Background | BG |
| 1 | Grey Matter | GM |
| 2 | White Matter | WM |
| 3 | Cerebrospinal Fluid | CSF |
| 4 | Grey and White Matter | GWM |
| 5 | Bone | B |
| 6 | Soft Tissue | ST |
| 7 | Non-brain | NB |
| 8 | Lesion | L |
| 9 | Cortical Grey Matter | CGM |
| 10 | Subcortical Grey Matter | SCGM |
| 11 | Brainstem | BS |
| 12 | Cerebellum | CBM |
Having the integer values in this table would introduce "magic numbers" to BIDS: anyone in possession of such image data would not be capable of interpreting it correctly without explicitly referring to the BIDS specification document to identify the mapping between integer values and biological descriptions.
Obviously it is important to have standardised abbreviations and descriptions for common biological tissue nomenclature:
- The former ensures that 3D segmentations containing `_label-<label>` in the filename have a predictable name regardless of the software tool used to generate them;
- The latter enables conversion between different lookup tables. This is perhaps best demonstrated in the documentation for my MRtrix3 command `labelconvert`: as long as the text string corresponding to the tissue / structure is identical between two lookup table files, it is possible in a fully automated fashion to convert the integer values in an image from one lookup table to another.
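
To make the name-matching idea concrete, here is a minimal sketch of converting integer values between two lookup tables by matching their text descriptions. It is not the `labelconvert` implementation: the function name `convert_labels`, the second lookup table `lut_b`, and the use of a flat list of integers in place of real image data are all assumptions made purely for illustration.

```python
def convert_labels(voxels, lut_in, lut_out, missing=0):
    """Remap integer labels from one lookup table to another by name.

    lut_in and lut_out map integer value -> text description.
    Values whose description has no match in lut_out become `missing`.
    """
    # Invert the output table so descriptions can be looked up directly.
    name_to_out = {name: value for value, name in lut_out.items()}
    # Build the integer -> integer mapping through the shared descriptions.
    mapping = {
        value: name_to_out[name]
        for value, name in lut_in.items()
        if name in name_to_out
    }
    return [mapping.get(v, missing) for v in voxels]


# First table follows the proposed BEP003 values; the second is hypothetical.
lut_a = {0: "Background", 1: "Grey Matter", 2: "White Matter", 3: "Cerebrospinal Fluid"}
lut_b = {0: "Background", 10: "Cerebrospinal Fluid", 20: "Grey Matter", 30: "White Matter"}

converted = convert_labels([0, 1, 2, 3, 1], lut_a, lut_b)
```

Note that nothing in this conversion depends on which integers either table happens to use; only the descriptions need to be standardised.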
Having pre-defined integer values as part of the specification is however trickier:
- A data-driven conversion of integer values would prefer these data to be stored in a text file, such that it could be read and parsed just like any other text file lookup table. Would such a file be provided with the spec, or would it need to be explicitly generated by any software that requires this information to be contained in a file?
- It becomes acceptable for segmentation images to not provide any sidecar lookup table, in which case this "default" lookup table would always be assumed. This may however not be ideal:
  - From the perspective of the validator, it would be preferable if any segmentation image "REQUIRED" a corresponding lookup table.
  - From the perspective of a dataset being "self-reliant" / "self-consistent" / "self-evidenced" (not quite sure what word to use here but I hope the concept I'm gesturing at is evident), it would be preferable for the lookup table to accompany the data, rather than requiring reference to the specification to gain any understanding of the data.
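
As a rough sketch of what a required sidecar lookup table would allow, the snippet below parses a tab-separated table and reports any integer values in a segmentation that the table does not describe. The column names (`index`, `name`, `abbreviation`), the function names, and the flat list standing in for image data are assumptions for illustration only; the actual sidecar format would be whatever the specification settles on.

```python
import csv
import io


def parse_lookup_table(tsv_text):
    """Parse a tab-separated lookup table into {index: (name, abbreviation)}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {int(row["index"]): (row["name"], row["abbreviation"]) for row in reader}


def undescribed_values(voxels, lut):
    """Return the sorted integer values present in the data but absent from the table."""
    return sorted(set(voxels) - set(lut))


# Hypothetical sidecar content stored alongside a segmentation image.
sidecar = (
    "index\tname\tabbreviation\n"
    "0\tBackground\tBG\n"
    "1\tGrey Matter\tGM\n"
    "2\tWhite Matter\tWM\n"
)

lut = parse_lookup_table(sidecar)
problems = undescribed_values([0, 1, 2, 2, 3], lut)
```

With the table shipped next to the image, both a validator and a human reader can interpret every value without consulting the specification.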
So I would actually advocate removing the "Integer value" column from this table.
I suspect this proposal would have instinctively seemed self-defeating to many had I proposed it up-front. But we shall see whether or not I've succeeded in dragging anybody over to my side.