brainlit
Cannot understand how `swc` files are pre-processed for upload to S3
There are segments stored at `s3://open-neurodata/brainlit/brain1_segments` that have multiple children of node -1, i.e., more than one node whose parent is -1.
Specifically:
- for `seg_id=124`:
  ```
  ,sample,structure,x,y,z,r,parent
  0,1,0,1127638.0,3836589.5,4461103.0,1.0,-1
  ...
  2522,2523,0,3941052.25,4584538.5,7404572.0,1.0,-1
  ...
  ```
- for `seg_id=267`:
  ```
  ,sample,structure,x,y,z,r,parent
  0,1,0,2688458.5,3954619.0,6391072.0,1.0,-1
  ...
  5196,5197,0,3117005.25,4071525.25,6529021.0,1.0,-1
  ...
  5394,5395,0,3479036.5,3077308.25,6823521.0,1.0,-1
  ...
  ```
- for `seg_id=278`:
  ```
  ,sample,structure,x,y,z,r,parent
  0,1,0,3302810.0,3803062.25,9353845.0,1.0,-1
  ...
  719,720,0,5414411.5,3234410.0,7036939.0,1.0,-1
  ...
  ```
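To list every node attached to -1 in a segment, it is enough to filter on the `parent` column. The helper below is a minimal sketch; `find_roots` is a hypothetical name, not a brainlit function. It takes plain sequences, so the arrays from `df["sample"].to_numpy()` and `df["parent"].to_numpy()` should work as input.

```python
def find_roots(samples, parents):
    """Return, in row order, the sample IDs whose parent is -1 (tree roots)."""
    return [s for s, p in zip(samples, parents) if p == -1]


# Toy segment with two roots (samples 1 and 3); not real brainlit data.
print(find_roots([1, 2, 3, 4, 5], [-1, 1, -1, 3, 4]))  # [1, 3]
```

Applied to the full rows of `seg_id=267`, this should report at least samples 1, 5197 and 5395, the rows with parent -1 visible in the excerpt above (the excerpt elides the rest).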
@tathey1 provided the folder with the original swc files, and there are fundamental differences in the way the graphs are represented. Running this code snippet:
```python
import numpy as np

from brainlit.utils import swc  # brainlit/utils/swc, as referenced in this thread

# Segment 2 as stored in the skeletons on S3
df_neuron = swc.read_s3("s3://open-neurodata/brainlit/brain1_segments", seg_id=2, mip=0)
# The same neuron read from the original .swc file
df_swc_offset_neuron, _, _, _ = swc.read_swc_offset("<path_to_swc_folder>/2018-08-01_G-002_consensus.swc")

# Build (sample, parent) edge lists and sort them by sample ID
samples = df_neuron["sample"].to_numpy()
parents = df_neuron["parent"].to_numpy()
edges = np.array([samples, parents]).T
sorted_edges = edges[np.argsort(edges[:, 0])]
# Sanity check: each row pairs a sample with its own parent
assert edges[0][0] == samples[0] and edges[0][1] == parents[0]

swc_samples = df_swc_offset_neuron["sample"].to_numpy()
swc_parents = df_swc_offset_neuron["parent"].to_numpy()
swc_edges = np.array([swc_samples, swc_parents]).T
sorted_swc_edges = swc_edges[np.argsort(swc_edges[:, 0])]
assert swc_edges[0][0] == swc_samples[0] and swc_edges[0][1] == swc_parents[0]

print("Sorted edges from S3\n", sorted_edges[:10])
print("Sorted edges from .swc\n", sorted_swc_edges[:10])
```
yields:
```
Sorted edges from S3
 [[ 1 -1]
 [ 2 10]
 [ 3  5]
 [ 4  1]
 [ 5  7]
 [ 6  4]
 [ 7  4]
 [ 8  7]
 [ 9  2]
 [10  3]]
Sorted edges from .swc
 [[ 1 -1]
 [ 2  1]
 [ 3  2]
 [ 4  3]
 [ 5  4]
 [ 6  5]
 [ 7  6]
 [ 8  7]
 [ 9  8]
 [10  9]]
```
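Sorting by sample ID cannot tell whether the two files describe the same tree under a different numbering. A cheap, relabeling-invariant sanity check is to compare node counts, root counts, and the histogram of children per node: if any of these differ, the graphs cannot be the same. The helpers below are hypothetical (not brainlit functions) and give only a necessary condition, not a proof that the graphs are isomorphic.

```python
from collections import Counter


def child_counts(samples, parents):
    """Map each sample ID to its number of children (leaves get 0)."""
    counts = {s: 0 for s in samples}
    for p in parents:
        if p != -1:
            counts[p] = counts.get(p, 0) + 1
    return counts


def degree_signature(samples, parents):
    """(number of roots, histogram of child counts): unchanged by renumbering."""
    n_roots = sum(1 for p in parents if p == -1)
    return n_roots, Counter(child_counts(samples, parents).values())


def may_be_same_tree(a, b):
    """Necessary (not sufficient) check that two (samples, parents) pairs
    describe the same forest up to a renumbering of the nodes."""
    return len(a[0]) == len(b[0]) and degree_signature(*a) == degree_signature(*b)


chain = ([1, 2, 3], [-1, 1, 2])            # 1 -> 2 -> 3
relabeled_chain = ([1, 2, 3], [2, -1, 1])  # 2 -> 1 -> 3
star = ([1, 2, 3], [-1, 1, 1])             # 1 -> {2, 3}
print(may_be_same_tree(chain, relabeled_chain))  # True
print(may_be_same_tree(chain, star))             # False
```

With the full DataFrames, `a` would be `(samples, parents)` from S3 and `b` would be `(swc_samples, swc_parents)` from the .swc file. Note that removing duplicate nodes on either side would also make the signatures differ.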
@bvarjavand could you help us understand how files are pre-processed for upload to S3? My intuition is that it has to do with how branching points are represented (for example, node 4 in the S3 output has two children, 6 and 7). If the same thing happens with node -1, it will break the method we use to fit B-splines to skeletons.
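Branching points can be listed by grouping samples under their parent and keeping the parents with more than one child; the group stored under -1 then shows directly how many children node -1 has. This is a hypothetical sketch (`children_map` and `branch_points` are illustrative names, not brainlit functions), run here on only the ten S3 rows printed above, so it does not reflect the whole skeleton.

```python
from collections import defaultdict


def children_map(samples, parents):
    """Build parent -> list of children; the -1 key collects the roots."""
    children = defaultdict(list)
    for s, p in zip(samples, parents):
        children[p].append(s)
    return children


def branch_points(samples, parents):
    """Sample IDs with more than one child (the -1 marker is excluded)."""
    children = children_map(samples, parents)
    return sorted(p for p, kids in children.items() if p != -1 and len(kids) > 1)


# The ten sorted S3 edges printed above: sample i has parent s3_parents[i - 1].
s3_parents = [-1, 10, 5, 1, 7, 4, 4, 7, 2, 3]
print(branch_points(range(1, 11), s3_parents))          # [4, 7]
print(len(children_map(range(1, 11), s3_parents)[-1]))  # 1
```

Within those ten rows, node 7 (children 5 and 8) branches as well as node 4. A B-spline fitting step that expects a single child of -1 could assert `len(children_map(samples, parents)[-1]) == 1` before running.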
Also, is the vertex order in the swcs preserved in the skeleton objects?
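One way to probe the ordering question is to check whether, in row order, every parent is listed before its children, a convention many SWC tools rely on. The function below is a hypothetical sketch, not part of brainlit; on the raw (unsorted) DataFrame rows it would show whether the skeleton export keeps that property.

```python
def parents_precede_children(samples, parents):
    """True if, in row order, every non-root node's parent appears on an earlier row."""
    seen = set()
    for s, p in zip(samples, parents):
        if p != -1 and p not in seen:
            return False
        seen.add(s)
    return True


# The sorted excerpts printed above (sample IDs 1..10).
swc_parents = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
s3_parents = [-1, 10, 5, 1, 7, 4, 4, 7, 2, 3]
print(parents_precede_children(range(1, 11), swc_parents))  # True
print(parents_precede_children(range(1, 11), s3_parents))   # False (sample 2 has parent 10)
```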
I see. The code you are looking for is in cloud-volume, which is called by `utils/swc`, which in turn is called by `utils/upload`.
`parent=-1` means that a node is the root of a tree. I think a good way to diagnose what's going on is to plot the swcs in question using napari, highlighting the "root" nodes. It might be that a single swc file really contains multiple trees. You can also overlay these "root" nodes on the image chunk to see whether they all sit at a soma.
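Alongside the napari check, the file can be split into one group of nodes per root to see how large each tree is: a handful of nodes hanging off a second root looks very different from two full neurons. The helper below is a hypothetical sketch (not a brainlit function) and assumes the parent links contain no cycles; the keys of the result are the root IDs one would highlight.

```python
from collections import defaultdict


def split_into_trees(samples, parents):
    """Group sample IDs by the root (parent == -1) they descend from.

    Assumes the parent links have no cycles. A parent ID missing from
    `samples` is treated as a root of its own.
    """
    parent_of = dict(zip(samples, parents))
    root_of = {}  # memoized sample -> root, so each chain is walked once
    trees = defaultdict(list)
    for s in samples:
        path, node = [], s
        # Walk up until we hit a known node or a node whose parent is -1.
        while node not in root_of and parent_of.get(node, -1) != -1:
            path.append(node)
            node = parent_of[node]
        root = root_of.get(node, node)
        for n in path:
            root_of[n] = root
        root_of[node] = root
        trees[root].append(s)
    return dict(trees)


print(split_into_trees([1, 2, 3, 4, 5, 6], [-1, 1, 1, -1, 4, 5]))
# {1: [1, 2, 3], 4: [4, 5, 6]}
```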
@JacopoTeneggi do you think the neuroglancer skeleton and swc objects are different representations of the same thing? Or that one of those representations is broken?
I would have to check. Even with the .swc files we had to remove some duplicate nodes, so my guess is that a series of actions led to different representations, but none of them is broken.
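Removing duplicate nodes is one such action that changes the graph. As an illustration only, the sketch below assumes that a "duplicate" is a node with exactly the same (x, y, z) as an earlier row, and that parents are listed before children; the actual rule used when cleaning the .swc files is not stated here. The function name `remove_duplicate_nodes` is hypothetical. Children of a removed node are rewired to the node that was kept.

```python
def remove_duplicate_nodes(rows):
    """Drop nodes whose (x, y, z) repeats an earlier row; rewire their children.

    `rows` is a list of (sample, x, y, z, parent) tuples. Assumes parents
    appear before children, which keeps the rewired graph free of cycles.
    """
    first_at = {}  # (x, y, z) -> sample ID that is kept
    alias = {}     # removed sample ID -> kept sample ID
    for sample, x, y, z, _ in rows:
        key = (x, y, z)
        if key in first_at:
            alias[sample] = first_at[key]
        else:
            first_at[key] = sample
    cleaned = []
    for sample, x, y, z, parent in rows:
        if sample in alias:
            continue  # duplicate: its children now point at the kept node
        cleaned.append((sample, x, y, z, alias.get(parent, parent)))
    return cleaned


rows = [(1, 0, 0, 0, -1), (2, 1, 0, 0, 1), (3, 1, 0, 0, 2), (4, 2, 0, 0, 3)]
print(remove_duplicate_nodes(rows))
# [(1, 0, 0, 0, -1), (2, 1, 0, 0, 1), (4, 2, 0, 0, 2)]
```

Because the surviving node keeps its original ID, the numbering ends up with gaps; a renumbering step on top of this would be one more way for two correct files to disagree row by row.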
ok thanks, no need to check, just curious