Move creation of tracing dictionary saved to HDF5 from processing to dnatracing
Currently the dictionary of data that relates to the results of processing is cobbled together from the objects that are returned by topostats.dnatracing.trace_image() in the procesing.dnatracing() function (see here.
topostats.dnatracing.trace_image() is itself returning a dictionary, but only a subset of these are being saved to HDF5.
Is everything needed if its not being saved to HDF5?
If not then its likely that we can simplify what is returned by topostats.dnatracing.trace_image() and assign it directly to the grain_trace_data[direction] dictionary.
Line 411 would then read...
grain_trace_data[direction] = trace_image(
image=image,
grains_mask=grain_masks[direction],
filename=filename,
pixel_to_nm_scaling=pixel_to_nm_scaling,
**dnatracing_config,
)
...and topostats.dnatracing would need modifying so that only the following are returned as a dictionary...
all_ordered_traces
all_cropped_images
all_ordered_trace_heights
all_ordered_trace_cumulative_heights
all_splined_traces
The statistics component of the dictionary currently returned by trace_image() should be returned as a separate object as it is not saved to HDF5, rather it is saved to the tracing_stats[direction] dictionary for combining with the grain statistics and writing to CSV.
Want me to try and address this during the nodestats tracing merger, or keep to the small commits in which it might be best to wait until after? It seems pretty big already.
And for clarity are you expecting 1 object for each thing - i.e. return (to_be_merged_with_grainstats_df, for_hdf5_dict, etc...)?
Small atomic commits are my preference.
- Its easier to review.
- The Git history remains cleaner and easier to work out where and when changes are made and by whom.
Looking at what is done in processing.dnatracing() it constructs a dictionary of components that is saved to HDF5.
I'm proposing returning that dictionary by topostats.dnatracing.trace_image() and so yes instead of...
return {
"statistics": results,
"all_ordered_traces": ordered_traces,
"all_splined_traces": splined_traces,
"all_cropped_images": cropped_images,
"image_ordered_trace": image_trace,
"image_spline_trace": image_spline_trace,
"all_ordered_trace_heights": all_ordered_trace_heights,
"all_ordered_trace_cumulative_distances": all_ordered_trace_cumulative_distances,
}
...it would look like this...
return (results, {
"all_ordered_traces": ordered_traces,
"all_splined_traces": splined_traces,
...
})
I believe this is currently done in the new implementation (#932), returning tuples