Tractography Data Format
It would be terrific to start a conversation about an agreed-upon data format for tractography. @arokem @Garyfallidis
Hi-
I am interested in this. I have worked on the tractography tools in AFNI (with Ziad Saad).
I imagine FSL developers would be interested, as would Frank Yeh @frankyeh of DSI-Studio and the developers of MRtrix3.
Thanks, Paul Taylor
Yep! We should have this as an open discussion.
Love to see a new format standard. The TRK file has given me a lot of headaches and limited a lot of possible extensions. DSI Studio will surely support any open standard for tractography.
While I don't necessarily think that a new tractography data format must be constrained by the NiBabel API, @MarcCote did put a fair bit of time and thought into the streamlines API. It might be worth thinking through whether this API is sufficient or if it's missing something.
Part of my thinking is that once a sufficient API is settled on, we can turn that around to quickly prototype more-or-less efficient implementations.
For what it's worth, I recently explored the API a bit and tried to summarize it in the following:
https://github.com/effigies/nh2020-nibabel/blob/ef3addf947004ca8f5610f34e767a578c4934c09/NiBabel.py#L821-L911
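For concreteness, here is a minimal sketch of that API as it stands (file names below are placeholders; `lazy_load` defers reading the streamline data until it is iterated):

```python
import numpy as np
import nibabel as nib

# Load a tractogram without pulling all streamlines into memory up front.
trk = nib.streamlines.load("example.trk", lazy_load=True)
print(trk.header)                     # format-specific header fields

# Iterate streamlines; each one is an (N, 3) array of points in RAS+mm.
for streamline in trk.tractogram.streamlines:
    pass  # process one streamline at a time

# Build and save a new tractogram; affine_to_rasmm maps the given points
# to RAS+mm (identity here because the points are already in that space).
points = [np.random.rand(10, 3).astype(np.float32) for _ in range(5)]
tractogram = nib.streamlines.Tractogram(points, affine_to_rasmm=np.eye(4))
nib.streamlines.save(tractogram, "example_out.tck")
```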
I totally agree with @effigies. Adding more people to the discussion. @frheault @jchoude
@MarcCote @frheault @jchoude @frankyeh great! @effigies loading (in particular, partial data loading) and efficiency will be critical for the format.
Trying to include a few other potentially interested people: @bjeurissen @jdtournier @neurolabusc (Not finding an obvious FSL-tracking contact via github ID-- could someone else please help with that?)
@effigies -- would be great to try that API with a demo.
Some functionality we value is keeping track of tracts as bundles-- if we put in some N>2 targets, we often care about any pairwise connections amongst those as separate bundles, because we are basically using tractography to parcellate the WM skeleton. Does that labeling/identifying of groups of tracts exist there?
The FMRIB contacts I know are @eduff and @pauldmccarthy. They might be able to point us to the right people...
Before creating a new format, it might be worth considering the existing formats and see if any could be enhanced or improved, similar to the way that NIfTI maintained Analyze compatibility while directly addressing the weaknesses. The popular formats seem to be:
- Bfloat and Bfloat.gz formats used by Camino.
- DAT format used by MRI Studio.
- PDB format used by CINCH and DTI-Query.
- TCK format used by MRtrix.
- TRACT format used by AFNI.
- TRK format used by TrackVis, as well as DSI Studio's compressed TRK.GZ. A nice feature is that this format includes support for bundles as properties (e.g. distinct bundles of tracks) as well as scalars (e.g. different z-scores along the length of a single track).
- Legacy VTK format (which has the filename extension .FIB or .VTK).
Unlike triangulated meshes, tractography cannot benefit from indexing and stripping, so the existing formats all seem to be pretty similar (all describe node-to-node straight lines, not splines).
I concur with @mrneont that it is nice to have the ability to describe tracks as bundles. I think this makes TRK the most attractive choice (though some other formats are not well documented, so they may also support this feature).
Perhaps @frankyeh can expand on his frustrations with TRK. What are the limitations? Likewise, perhaps everyone (including @francopestilli who started this thread) can provide a wish list for desired features.
Hi all,
And thank you for bringing this up. I have to say this is a recurrent topic; every one or two years it re-emerges.
I suggest that before you do anything else, you study what is already developed in the existing API in nibabel.
Marc-Alex and I worked quite a bit to support all the basic needs: accessing bundles fast, adding properties, etc. We have actually already implemented a fast version that can load/save tracks to .npz (a NumPy format), which you can use if you have big data.
For me, the main decision that requires feedback from the community is the storage technology. Do you want to save the end result using JSON, HDF5, glTF, or something else? If we can decide on that, then we are set. The work to study previous formats is already mostly done, at least on my side.
Nonetheless, see also a recent paper for a new format called Trako https://arxiv.org/pdf/2004.13630.pdf
It is important to mention that no matter the file format, the main problems when it comes to a standard will remain. Most people have a lot of trouble with TRK because you can mess up the space or the header or both. But the same remains true for vtk/tck: one could always write the TCK wrong (data) and/or provide the wrong NIfTI as a reference for the transformation.
No matter the new format, the same difficulties will remain. There are a thousand ways to write a TRK wrong, but many write it wrong and read it wrong too, and it can still work in their software. I think I was added due to my contribution to DIPY (StatefulTractogram). No matter the new format, as long as people can have header attributes such as:
- Space (VOX, VOXMM, RASMM)
- Origin (of voxel: CORNER, CENTER)
- Affine (vox2rasmm or its inverse)
- Dimensions (shape of the diffusion volume from which the tractogram was computed)
- Voxel size (for verification against the affine)
- Voxel order (for verification against the affine)
I think I will be happy. For example, in my own code I used @MarcCote's API to write an HDF5 format in which the length, offset, and data of one or multiple 'tractograms' are saved, so I can easily read any of these tractograms (I use it for connectomics; it could also be used for bundles), and one could achieve the same to read any particular streamlines. As long as the attributes listed earlier are available, anything can be done after that.
Also, if a new format is added to DIPY and it is StatefulTractogram-friendly, it can easily be converted back to other commonly supported formats (TCK, TRK, VTK/FIB, DPY). If these efforts are made to allow more efficient reading for computation, I think there is no problem with supporting more formats. If the goal is to reduce confusion in how to write/read the format, I believe that a new format would never help. The unstructured nature of tractograms (not grid-like) makes it harder, since the header and data are not fixed together when it comes to spatial coherence.
PS: I personally think TRK is fine; everything is in the header. The problem is the variety of ways people can write/read it wrong, making support across tools and labs pretty difficult. However, I think a strict approach to reading/writing in DIPY was beneficial in the long term. Short term, sure, maybe half the users hate me on some level, but I think a strict TRK (or at least a TCK always paired with the NIfTI that generated it) is superior to a lot of formats, just in terms of available info, not for large-scale computation, visualisation, and fancy reading/writing.
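For anyone who has not used it, a rough sketch of how DIPY's StatefulTractogram keeps track of those attributes (file names are placeholders; API names as in recent DIPY releases):

```python
from dipy.io.stateful_tractogram import Space
from dipy.io.streamline import load_tractogram, save_tractogram

# The reference (a NIfTI image) supplies the affine, dimensions, voxel
# size and voxel order; space and origin are tracked explicitly.
sft = load_tractogram("bundle.trk", "dwi.nii.gz", to_space=Space.RASMM)

# Move the data between spaces/origins without touching the file format.
sft.to_vox()      # RASMM -> VOX
sft.to_corner()   # origin at the voxel corner
sft.to_rasmm()    # back to world coordinates
sft.to_center()

# Because the spatial attributes travel with the object, converting
# between supported formats is just a save call.
save_tractogram(sft, "bundle.tck")
```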
At the risk of somewhat rerouting the conversation: since this has come up, this might also be a good time to discuss whether it would be beneficial to "upstream" the StatefulTractogram
that @frheault has implemented in DIPY into nibabel. It's been "incubating" in DIPY since April 2019 (https://github.com/dipy/dipy/pull/1812), with a few fixes and improvements along the way (e.g., https://github.com/dipy/dipy/pull/2013, https://github.com/dipy/dipy/pull/1997) that have followed rather closely the continuous improvements that have happened in nibabel. When Francois originally implemented this, we discussed the possibility of eventually moving the SFT object implementation here (see @effigies comment here about that: https://github.com/dipy/dipy/pull/1812#issuecomment-488664324). As someone who has been using SFT quite extensively in my own work, I find it rather helpful (where previously I was struggling with all the things that @frheault mentioned above). So I think that there could be a broader use for it. Just to echo @frheault's comment: a benefit of that would be that you could move between SFT-compliant formats with less fear of messing things up. I guess the question is one of timing and of potential future evolution of the SFT object. What are your thoughts, @frheault, (and others, of course)?
And to bring the discussion back to its start -- @francopestilli -- I am curious to hear: what needs are not currently addressed and prompted your original post here? Are you looking to store information that is not in the currently-supported formats? Or is there some limitation on performance that needs to be addressed?
Oh - just saw the mailing list messages and now understand where this originated (here: https://mail.python.org/pipermail/neuroimaging/2020-July/002161.html, and in other parts of that thread). Sorry: hard to stay on top of everything...
The 10-million-streamline tractogram mentioned in that original thread is quite a challenge, and I am not sure a new file format would be the answer. Emanuele already achieved a 1-minute loading time with TRK (pretty impressive).
I'm looping in Rob Smith (@Lestropie) into this conversation, this is something we've discussed many times in the past.
I've not had a chance to look into all the details here, but here are just a few of my unsolicited thoughts on the matter, for what they're worth:
- The issue of loading a 10M streamline tractogram into memory is in my opinion independent of the file format - it's about internal in-memory data representation, and as shown by the TRK handling mentioned above, different implementations can handle the same format very differently.
- Simplicity: essential if it's to be accepted as a standard. It should be relatively easy to code up import/export routines in any language, without relying on external tooling. As also mentioned by @frheault, there are lots of ways of storing these data wrong, no matter the format, so it's important to minimise any unnecessary complexities in the format, and be explicit about the conventions used. For example, the tck format used in MRtrix stores vertices in world coordinates and has only 2 required elements in its header: a datatype specifier (almost invariably Float32LE), and an offset to the start of the binary data. I can't think of anything else that would be classed as necessary here (though of course in practice there's lots of additional useful information that we want to store in the header).
- Space efficiency: these are likely to contain very large amounts of data, and these should take no more space than is strictly necessary to store the information. The type of geometry and/or data layout is known in advance, so I don't think it makes sense to try to use more generic container formats like VTK or HDF5 - these will likely require more space to describe the geometry / layout. Text formats are also likely to be inefficient from that point of view.
- Load/store efficiency: loading large amounts of data will take even longer if the data need to be converted, especially to/from text. Ideally it should be possible to read() / write() the data into/out of memory in one go, and even better, memory-map it and access it directly (see the sketch after this list). This implies storing in IEEE floating-point format, most likely little-endian since that's the native format on all CPUs in common use. We could discuss whether to store in single or double precision, but I don't expect there will be many applications where we need to store vertex locations with 64 bits of precision - in fact, I wouldn't be surprised if the discussion goes the other way, with the possibility of using 16 or 24 bit floats instead (though these would require conversion and could potentially slow down the IO).
- Independence: I think it's critical that the format is standalone, and independent of any external reference. Having to supply the coordinate system for the tractogram by way of a user-supplied image would in my opinion massively expand the scope for mistakes. I don't mind so much if the necessary information is encoded in the header, as suggested by @frheault above - but I don't see that it adds a great deal to simply storing the data in world coordinates directly. I do appreciate that it probably matches the way data are processed in many packages, where everything is performed in voxel space. In MRtrix, everything is performed in real space, and the fODFs / eigenvectors are stored relative to world coordinates also, so there's no further conversion necessary - I appreciate not all packages work that way. And for full disclosure: we (MRtrix) would have a vested interest here, since storing in anything other than world coordinates would probably mean more work for our applications.
- Extensibility: we routinely add lots more information in our tractogram headers as the need arises, and I expect there will be many applications where the ability to store additional information more loosely will be useful. A standard format should allow for this, and also allow for additional entries in the header to become part of the official standard if & when their use becomes commonplace.
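To illustrate the memory-mapping point above (not a proposal for the format itself): a sketch in NumPy, assuming a hypothetical header that records a byte offset to a contiguous little-endian float32 vertex block:

```python
import numpy as np

# Hypothetical values that would come from the header of such a format.
data_offset = 1024        # byte offset to the start of the binary block
n_vertices = 10_000_000   # total number of vertices in the file

# Memory-map the vertex block; nothing is read until it is accessed.
vertices = np.memmap("tractogram.dat", dtype="<f4", mode="r",
                     offset=data_offset, shape=(n_vertices, 3))

# Only the pages touched by this slice are actually loaded from disk.
subset = np.asarray(vertices[1_000_000:1_001_000])
```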
So that's my 2 cents on what I would like a standard tractography format to look like. You'll note that I've more or less described the tck format, and yes, there's a fairly obvious conflict of interest here... 😁
However, there are clearly features that the tck format doesn't support that others do, though I've not yet felt the need to use them. The ability to group streamlines within a tractogram is interesting. I would personally find it simpler to define a folder hierarchy to encode this type of information: it uses standard filesystem semantics, allows human-readable names for the different groups, and allows each group to have its own header entries if required. Assuming the header is relatively compact, it also shouldn't take up a lot more storage than otherwise. And it allows applications to use simple load/store routines to handle single tractogram files, and trivially build on them to handle these more complex structures as the need arises. Others may (and no doubt will) disagree with this...
Another issue that hasn't been discussed so far is the possibility of storing additional per-streamline or per-vertex information. That's not currently something that can be done with the tck format, though it may be possible with others. This is actually the main topic of conversation within the MRtrix team. We currently store this type of information using separate files, both because our tck format isn't designed to handle it (probably the main reason, to be fair), but also because it avoids needless duplication in cases where several bits of information need to be stored (this was also one of the motivations for our fixel directory format). For example, if we want to store the sampled FA value for every vertex, we could store it in the file, but what happens if we also want to sample the MD, AD, RD, fODF amplitude, etc.? It's more efficient to store just the sampled values separately alongside the tractogram than to duplicate the entire tractogram for each measure just so it can reside in the same file. Alternatively, we could allow the format to store multiple values per vertex, but then we'd need additional complexity in the header to encode which value corresponds to what - something that's inherently handled by the filesystem if these values are stored separately. And on top of that, a format that allows for this per-streamline and/or per-vertex information would necessarily be more complex, increasing the scope for incorrect implementations, etc. Again, this is likely to be a topic where opinions vary widely (including within our own team), but my preference here is again to keep the file format simple and uncluttered, and rely on the filesystem to store/encode additional information: it's simpler, more flexible, leverages existing tools and concepts that everyone understands, and avoids the need for additional tools to produce, inspect and manipulate these more complex datasets.
OK, that's the end of my mind dump. Sorry if it's a bit long-winded...
Probably the best person to bring in regarding tractography formats from FMRIB would be Saad Jbabdi (or possibly Michiel Cottaar @MichielCottaar).
I think @jdtournier did a nice job of describing the tradeoffs, and I also agree that IEEE-754 float16 (e.g. GLhalf) probably provides more than sufficient precision, which would improve space and load/store efficiency. Thanks @Garyfallidis for noting the Trako format. It clearly achieves good space and load efficiency. It seems weak on the simplicity and store efficiency metrics - on my machine (which has a lot of Python packages installed) the example code required installing an additional 110 MB of Python packages, and the current JavaScript code only decodes data. So, at the moment it is a promising proof of concept, but not without tradeoffs. @jdtournier also makes an interesting case that perhaps scalars should be stored in separate files (e.g. NIfTI volumes) from the tractography file.
@jdtournier the support of tck is already available in nibabel and dipy. If your claim is to just use tck then the answer is that many labs are not satisfied with the tck format. If they were fine then we would just use tck.
The effort here is to find a format that will be useful to most software tools. Nonetheless, if you look into the current implementation you will see that the tractograms are always loaded in world coordinates. But the advantage here is that you could have those stored in a different original space in the format. As for storing other metrics, I think we still need that information because a) many labs use such a feature, and b) if you store the data in other files then you always have to interpolate, and perhaps the interpolation used is not trivial. Also, you could have metrics that are not related to standard maps such as FA, etc. You could, for example, have curvature saved for each point of the streamline. Would you prefer curvature being saved as a NIfTI file? That would not make sense, right?
My suggestion to move forward is that @frheault, who has studied multiple file formats and found similarities and differences, writes down the specification of the new format and sends it over to the different labs and tools for approval and suggestions. It is important to show that in nibabel we have done the required work to study all, or at least most, of what is out there, and that the initial effort comes with some form of consensus. I hope, @frheault, that you will agree to lead such a task. And also thank you for your tremendous effort to make some sense in this world of tracks. Of course we will need the help of all of us, especially @effigies, @matthew-brett and @MarcCote. But I think you are the right person to finally get this done and move on happily as a community.
@Garyfallidis, I agree with your view that formats face a Darwinian selection, and that popular formats are therefore filling a niche. However, your comment that "If your claim is to just use tck then the answer is that many labs are not satisfied with the tck format. If they were fine then we would just use tck" is the fallacy of the converse. Just because popular formats are useful does not mean that unpopular formats are not useful. Consider the as-yet-uncreated format advocated by many in this thread: it is currently used by no one, yet that does not mean it cannot fill a niche. It could be that people simply use an inferior format because their tool does not support a better format, the better format is not well documented, or they are not aware of the advantages of a better format. I think we want a discussion of the technical merits of the available formats and the desired features for a format. @jdtournier provides a nice list of metrics to select between formats.
@jdtournier the challenge I have with tck is that I cannot find any documentation for it. My support for this format was based on porting the MATLAB read and write routines. It is unclear if these fully exploit the format as implemented by mrview and other MRtrix tools.
@jdtournier, per your comment "Another issue that hasn't been discussed so far is the possibility of storing additional per-streamline or per-vertex information": in TRK-speak these are properties (per-streamline) and scalars (per-vertex). Several comments have noted this as a benefit of the TRK format. I take your point that voxelwise images (NIfTI, MIF) can provide an alternative method to compute many per-vertex measures, but also @Garyfallidis's concern that these cannot encode all the measures we might want (e.g. curvature).
Maybe I am naive, but when I explored TrackVis, I thought there was a way to save TRK files that would map to MNI world space without knowing the dimensions of the corresponding voxel grid:
```
vox_to_ras = [1 0 0 0.5; 0 1 0 0.5; 0 0 1 0.5; 0 0 0 1]
voxel_order = 'RAS'
image_orientation_patient = [1 0 0; 0 1 0]
invert_x = 0
invert_y = 0
invert_z = 0
swap_xy = 0
swap_yz = 0
swap_zx = 0
```
As I recall, I tried this with TrackVis with several artificial datasets that tested the alignment, and this seemed like an unambiguous way to map files nicely.
From the discussion so far, I still see TRK as the leading format available using the metrics of @jdtournier. I concur with @frheault that regardless of format, one of the core issues is explicitly defining the spatial transform.
So my question is, what problems do people have with TRK, and what new features does the field need? Can improved compliance, documentation and perhaps tweaks allow TRK to fulfill these needs?
@neurolabusc the main problem with TRK is speed. It takes a long time to load/save big files. But there are also others, for example limitations on what parameters can be saved, etc. @MarcCote and @frheault, can you explain?
Another issue is accessing specific parts of the file. Currently there is no support for fast access to specific bundles or parts of the tractogram. Another issue is memory management. TRK does not have support for memory mapping or similar. Some of these files are getting too large to load fully into memory, and for some applications it is better to keep them in a memory map.
Hi folks. I support the comments @Garyfallidis reported above. As the size of tractograms increases, we need to use a file format that allows partial loading of the data (say, percentages of the streamlines).
> @jdtournier the support of tck is already available in nibabel and dipy. If your claim is to just use tck then the answer is that many labs are not satisfied with the tck format. If they were fine then we would just use tck.
OK, obviously my vague attempts at humour have not gone down well. The main point of my message was to provide a list of the criteria that I would consider important for a standard file format. They happen to be mostly embodied in the tck format, perhaps unsurprisingly, and I'm being upfront about the fact that this is likely to be perceived as a conflict of interest - which clearly it has anyway.
I'm not arguing that tck should become the standard, and clearly the fact that there's a discussion about this means that at least some people don't think it should be either. That's fine, but since I've been invited into the discussion, I thought I'd state my point of view as to what such a format should look like. And yes, I have a problem in articulating that without looking like I'm arguing for the tck format, precisely because the considerations that went into its original design 15 years ago are still, in my opinion, relevant today.
> Nonetheless, if you look into the current implementation you will see that the tractograms are always loaded in world coordinates.
But that's a matter of the software implementation, not the file format, right? Perhaps I'm getting confused here, but if we're discussing a new standard file format for tractography, then it should be independent of any specific software implementation or API. This discussion is taking place on the nibabel repo, which is perhaps why we're getting our wires mixed up. I don't wish to belittle the massive efforts that have gone into this project, but I'd understood this discussion to be project-independent.
> But the advantage here is that you could have those stored in a different original space in the format.
I understand that, and I can see the appeal. I can also see the overhead this imposes on implementations to support multiple ways of storing otherwise equivalent information. This is why I would argue, on the grounds of simplicity, that a standard file format should adopt a single, standard coordinate system. Otherwise we'll most likely end up with fragmentation in what the different packages support: some will only handle one type of coordinate system because they haven't been updated to support the others, and will hence produce files that other packages won't be able to handle because they only support a different coordinate system. We could of course mandate that to be compliant, implementations should support all allowed coordinate systems, but I don't think this is necessarily how things would work out in practice. And we can provide tools to handle conversions between these so that these different tools can interoperate regardless, but I'm not sure this would be a massive step forward compared to the current situation.
On the other hand, I appreciate that different projects use different coordinate systems internally, and that therefore picking any one coordinate system as the standard will necessarily place some projects at a disadvantage. I don't see a way around this, other than by your suggestion of allowing the coordinate system to be specified within the format. I don't like the idea, because this means we'd effectively be specifying multiple formats, albeit within the same container. But maybe there is no other way around this.
> As for storing other metrics, I think we still need that information because a) many labs use such a feature, and b) if you store the data in other files then you always have to interpolate, and perhaps the interpolation used is not trivial.
OK, there's a misunderstanding here as to what I was talking about. First off, no argument: the many labs that need these features include ours, and we routinely make use of such information. But we don't store it as regular 3D images, that would make no sense in anything but the simplest cases. It wouldn't be appropriate for fODF amplitude, or for any other directional measure, or curvature, as you mention.
What I'm suggesting is that the information is stored as separate files that simply contain the associated per-vertex values, with one-to-one correspondence with the vertices in the main tractography file, in the same order. This is what we refer to in MRtrix as a 'track scalar file' - essentially just a long list of numbers, with the same number of entries as there are streamline vertices. We routinely use them to encode per-vertex p-value, effect size, t-value, etc. when displaying the results of our group-wise analyses, for example.
We also use separate files for per-streamline values (used extensively to store the weights for SIFT2), and these are also just a long list of numbers, one per streamline, in the same order as stored in the main file - and in this case, stored simply as ASCII text.
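To make the sidecar idea concrete, a small NumPy sketch (file names and the on-disk layout are illustrative only, not the actual MRtrix track-scalar/weights formats):

```python
import numpy as np

# Suppose the tractogram has these per-streamline point counts.
points_per_streamline = np.array([120, 85, 240])

# Per-vertex values: one number per vertex, in the same order as the
# vertices in the tractogram (e.g. a sampled FA or a p-value).
per_vertex = np.random.rand(points_per_streamline.sum()).astype(np.float32)
per_vertex.tofile("metric_per_vertex.bin")        # simple binary sidecar

# Per-streamline values: one number per streamline (e.g. SIFT2 weights),
# stored as plain ASCII text.
weights = np.random.rand(len(points_per_streamline))
np.savetxt("weights.txt", weights)

# Reading them back and splitting per-vertex values by streamline.
flat = np.fromfile("metric_per_vertex.bin", dtype=np.float32)
per_streamline_chunks = np.split(flat, np.cumsum(points_per_streamline)[:-1])
weights_back = np.loadtxt("weights.txt")
```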
I'm not sure the specific format we've adopted to store these values is necessarily right or optimal in any sense; I'm only talking about the principle of storing these associated data in separate files, for the reasons I've outlined in my previous post: in my opinion, it's more space-efficient, more transparent, and more flexible than trying to store everything in one file.
I should add that storing the data this way does introduce other limitations, notably if the main tractography file needs to be edited in some way (e.g. to extract tracts of interest from a whole-brain tractogram). This then requires special handling to ensure all the relevant associated files are kept consistent with the newly-produced tractography file. This type of consistency issue is a common problem when storing data across separate files, and I'm not sure I've got a good answer here.
In any case, I've set out my point of view, and I look forward to hearing other opinions on the matter.
@neurolabusc I think the problem with TRK is related to efficiency when it comes to large datasets. Selecting only a subset is not optimal (especially if you want a random one), reading is slow, and controlling the size of the float (float16/float32/float64) is not possible. When doing connectomics it is impossible to have a hierarchical file that allows the saving of streamlines connecting pairs of regions (same logic for bundles), and a personal gripe is that the header is too complex for most people.
@Garyfallidis Despite all the flaws of tck/trk/vtk, people have been using them for more than a decade, so I think a first iteration should be as simple as possible: a hierarchical HDF5, readable by chunk using data/offset/length (you read offset/length and then know what data to read, then you reconstruct streamlines as polylines), that can append/delete data in-place, supports data_per_streamline and data_per_point (and data_per_group if it is hierarchical), with a StatefulTractogram-compliant header, and with a strict saving/loading routine to prevent errors.
@jdtournier I don't know if you are familiar with the data/offset/length approach in the ArraySequence of nibabel, but it is a very simple way to store streamlines in 3 arrays of shapes NBR_POINTS x 3, NBR_STREAMLINES, and NBR_STREAMLINES, which I have used in the past with memmap and HDF5 to quickly read specific chunks or do multiprocessing with shared memory. Reconstructing it into streamlines is efficient since the point data is mostly contiguous (depending on the chunk size).
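For readers unfamiliar with that layout, a small sketch of the idea (array names are illustrative; nibabel's ArraySequence keeps equivalent internal arrays):

```python
import numpy as np

# Flat storage: all points concatenated, plus where each streamline
# starts and how many points it has.
data = np.random.rand(1000, 3).astype(np.float32)   # NBR_POINTS x 3
lengths = np.array([300, 500, 200])                  # NBR_STREAMLINES
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))

def get_streamline(i):
    # A view into the flat array; no copy, so chunked/memmapped access
    # and shared-memory multiprocessing stay cheap.
    start = offsets[i]
    return data[start:start + lengths[i]]

second = get_streamline(1)   # shape (500, 3)
```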
Bonus: I think HDF5 lets you specify the datatype for each array, so float16 could be used, reducing the file size. Also, MATLAB and C++ have great HDF5 libraries to help with the reading.
Finally, I agree that storing metrics per point would make an even bigger tractogram, but allowing data per point and per streamline will likely facilitate the lives of a few, while the others can simply do it their way. I also agree that the data written on disk should be in world space (rasmm), as in tck; that should be the default, but with the info to convert to tck/trk easily and so on, leaving compatibility intact for a lot of people.
Your list (Simplicity, Space efficiency, Load/store efficiency, Independence, Extensibility) is crucial to think about. I think the header would be much simpler than trk, but with slightly more info than tck. I would go for the 4-5 attributes I mentioned earlier; that would be a sweet spot between Simplicity and Independence. As for Extensibility, since HDF5 is basically a gigantic hierarchical dictionary, as long as the mandatory keys are there, adding more data could be done easily; more header info, or even keeping track of processing, would be possible (if wanted), as in the .mnc format.
However, except for the switch to float16, I think reading/writing is more or less bound to its current limit. Supporting chunked or on-the-fly read/write is nice, but that would not change the speed of writing/reading a whole tractogram.
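A rough sketch of what such a hierarchical HDF5 layout could look like, using h5py (group, dataset, and attribute names are hypothetical, not an agreed specification):

```python
import h5py
import numpy as np

data = np.random.rand(1000, 3)
lengths = np.array([300, 500, 200], dtype=np.uint32)
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1])).astype(np.uint64)

with h5py.File("tractogram.h5", "w") as f:
    # Header attributes along the lines listed earlier in the thread.
    f.attrs["space"] = "RASMM"
    f.attrs["origin"] = "CENTER"
    f.attrs["vox2rasmm"] = np.eye(4)
    f.attrs["dimensions"] = (96, 96, 60)

    grp = f.create_group("bundles/AF_left")                   # hierarchical groups
    grp.create_dataset("data", data=data, dtype="float16")    # smaller file
    grp.create_dataset("offsets", data=offsets)
    grp.create_dataset("lengths", data=lengths)

# Chunked read: pull out one streamline without loading the whole group.
with h5py.File("tractogram.h5", "r") as f:
    grp = f["bundles/AF_left"]
    i = 1
    start, n = int(grp["offsets"][i]), int(grp["lengths"][i])
    streamline = grp["data"][start:start + n]                 # reads only this slice
```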
> @jdtournier the challenge I have with tck is that I cannot find any documentation for it.
That's unfortunate, it's documented here. If you'd already come across this page but found it insufficient, please let me know what needs fixing!
@Garyfallidis I agree TRK is inherently slow to read. Interleaving the integer "Number of points in this track" in the array of float vertices is a poor design. Much better performance could be achieved if the integers were stored sequentially in one array and the vertices were stored in their own array. One could load the vertices directly to a VBO. This would also allow fast traversal of the file, addressing your second criticism. Both would improve the Load efficiency metric.
@frheault adding support for float16 would improve the Space efficiency and Load efficiency metrics. I am not sure of the use case for float64 for vertices, but it would be nice for scalars. I also take your point that the header and spatial transforms could be simplified, improving the Simplicity metric.
While HDF5 has some nice wrappers for some languages, the format itself rates very poorly on the Simplicity metric. I think there are clear criticisms of the complexity. This would introduce some of the same complications as TRAKO, without the space efficiency benefits of TRAKO. It is odd to criticise the TRK format as complex when it is simply described on a short web page, and then advocate the HDF5 format.
@jdtournier thanks for the documentation. So the MATLAB read/write routines do reveal the full capability. TCK is a minimalistic, elegant format that certainly hits your Simplicity metric, but I can see why some users feel it is too limited for their uses.
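For reference, a tck header looks roughly like this (per the MRtrix documentation; values here are illustrative), followed by raw Float32LE triplets, with a (NaN, NaN, NaN) triplet terminating each streamline and an (Inf, Inf, Inf) triplet terminating the file:

```
mrtrix tracks
datatype: Float32LE
count: 10000
file: . 67
END
```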
This sounds like real progress is being made on the features that are desired, and worth the cost of implementing a new format.
One additional comment here, @MarcCote, @frheault and @jchoude:
what is the status of this work? https://www.sciencedirect.com/science/article/abs/pii/S1053811914010635
Should we discuss that here?