
How can sankey_plot show more than four species?

Open xmChen090 opened this issue 1 year ago • 7 comments

Hi Alec,

How can I display more than four species using a sankey_plot?

Thank you for your help!

xmChen090 avatar Nov 04 '23 16:11 xmChen090

If you have a SAMAP object with four species, you should be able to just pass in a list of the four species IDs into the sankey function. Alternatively, you could try making a chord plot. Could you give me more context about what you're trying to do?

atarashansky avatar Nov 04 '23 17:11 atarashansky

> If you have a SAMAP object with four species, you should be able to just pass in a list of the four species IDs into the sankey function. Alternatively, you could try making a chord plot. Could you give me more context about what you're trying to do?

sankey_plot(MappingTable, align_thr=0.12, species_order = ["gas",'tro','dst',"kdc"])

I passed in a list of the four species IDs, but only "gas", "tro", and "dst" are displayed in the Sankey plot. What I actually want is a Sankey plot of seven species, but that failed at sm.run(pairwise=True), probably because the computer ran out of memory. The run with four species succeeded, but sankey_plot does not display all of them; it seems to show at most three species. Is there a parameter in sankey_plot that I can change to show more?

xmChen090 avatar Nov 04 '23 17:11 xmChen090

Can you take a screenshot of MappingTable.head() and paste it here?

atarashansky avatar Nov 05 '23 16:11 atarashansky

I've had this issue before. Here is a minimal set of mapping table scores to reproduce it (attached: mapping_scores_example.csv); hope it helps.

dnjst avatar Jan 17 '24 15:01 dnjst

I had the same issue

DiracZhu1998 avatar Jul 05 '24 16:07 DiracZhu1998

> I've had this issue before. Here is a minimal set of mapping table scores to reproduce it (attached: mapping_scores_example.csv); hope it helps.

Hi, did you solve this?

DiracZhu1998 avatar Jul 07 '24 13:07 DiracZhu1998

@dnjst @atarashansky I modified the sankey_plot function and it works, but with more than three species the columns no longer each represent a single species: some cell types from one species end up mixed into another species' column.
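One possible reason for the mixed columns, assuming the Sankey layout positions each node by its depth in the link graph and not by its species: an all-pairs filter keeps links between non-neighbouring species (first to third, for instance), so a cell type can be pulled into a column that belongs to another species. Below is a minimal sketch of a filter that keeps only links between neighbouring species in `species_order`. The helper names `species_of` and `keep_adjacent_links`, the tuple layout, and the cell-type labels are hypothetical and not part of SAMap.

```python
def species_of(node):
    """Return the species prefix of a 'species_celltype' label."""
    return node.split("_")[0]


def keep_adjacent_links(links, species_order):
    """Keep only links whose endpoints are neighbours in species_order.

    links: iterable of (source, target, value) tuples.
    Returned links are oriented left-to-right along species_order.
    """
    rank = {sp: i for i, sp in enumerate(species_order)}
    kept = []
    for source, target, value in links:
        a = rank.get(species_of(source))
        b = rank.get(species_of(target))
        if a is None or b is None:
            continue  # species was not requested
        if abs(a - b) != 1:
            continue  # same species, or columns that are not neighbours
        if a > b:
            source, target = target, source  # orient left-to-right
        kept.append((source, target, value))
    return kept


# Hypothetical cell-type labels; the species IDs are the ones used in this thread.
links = [
    ("gas_neuron", "tro_neuron", 0.8),
    ("gas_neuron", "dst_neuron", 0.6),  # skips the "tro" column, dropped
    ("dst_neuron", "tro_neuron", 0.7),  # written right-to-left, flipped
    ("dst_muscle", "kdc_muscle", 0.5),
]
adjacent = keep_adjacent_links(links, ["gas", "tro", "dst", "kdc"])
```

This removes only one source of mixing. A cell type with no link to the species on its left may still be placed in an earlier column by a depth-based layout, so the result should be checked visually.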

As for chord plots, with several species and many cell types the graph is hard to read if the nodes are grouped by species. It would be better to group them by mapping, that is, to place homologous cell types together.
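One way to read "group by mapping" is to treat cell types as nodes, keep links at or above the alignment threshold, and take the connected components as homology groups. The sketch below does that with a small union-find. The function name `homology_groups`, the tuple layout, and the cell-type labels are hypothetical; this is an illustration of the idea, not something SAMap provides.

```python
def homology_groups(links, align_thr=0.1):
    """Group cell types into connected components of the thresholded link graph.

    links: iterable of (source, target, value) tuples.
    Returns a sorted list of sorted member lists.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target, value in links:
        root_s, root_t = find(source), find(target)  # registers both nodes
        if value >= align_thr and root_s != root_t:
            parent[root_s] = root_t  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical cell-type labels; the species IDs are the ones used in this thread.
links = [
    ("gas_neuron", "tro_neuron", 0.8),
    ("tro_neuron", "dst_neuron", 0.7),
    ("gas_muscle", "dst_muscle", 0.5),
    ("gas_muscle", "tro_neuron", 0.05),  # below threshold, does not merge groups
]
groups = homology_groups(links, align_thr=0.1)
```

Ordering the nodes of a chord plot group by group, and not species by species, would then place putatively homologous cell types next to each other.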

Another way is to draw a heatmap.
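A minimal sketch of the data preparation for such a heatmap, assuming the mapping table is a square DataFrame whose row and column labels both follow the `species_celltype` pattern. The helper name `species_block` and the toy scores are hypothetical.

```python
import pandas as pd


def species_block(M, row_species, col_species, align_thr=0.1):
    """Return the rows of one species against the columns of another.

    Scores below align_thr are set to 0, as in the Sankey code.
    """
    rows = [n for n in M.index if n.split("_")[0] == row_species]
    cols = [n for n in M.columns if n.split("_")[0] == col_species]
    block = M.loc[rows, cols].copy()
    return block.where(block >= align_thr, 0.0)


# Toy table with hypothetical cell types and scores.
labels = ["gas_neuron", "gas_muscle", "tro_neuron", "tro_muscle"]
M = pd.DataFrame(
    [
        [0.0, 0.0, 0.8, 0.05],
        [0.0, 0.0, 0.02, 0.6],
        [0.8, 0.02, 0.0, 0.0],
        [0.05, 0.6, 0.0, 0.0],
    ],
    index=labels,
    columns=labels,
)
block = species_block(M, "gas", "tro")
```

The resulting block has one species on each axis and can be passed to any heatmap routine, for example matplotlib's `imshow`.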

import numpy as np
import pandas as pd
import holoviews as hv

hv.extension('bokeh', logo=False)
hv.output(size=100)

def sankey_plot2(M, species_order=None, align_thr=0.1, **params):
    """Generate a sankey plot

    Parameters
    ----------
    M: pandas.DataFrame
        Mapping table output from `get_mapping_scores` (second output).

    align_thr: float, optional, default 0.1
        The alignment score threshold below which to remove cell type mappings.

    species_order: list, optional, default None
        Specify the order of species (left-to-right) in the sankey plot.
        For example, `species_order=['hu','le','ms']`.

    Keyword arguments
    -----------------
    Keyword arguments will be passed to `sankey.opts`.
    """
    if species_order is not None:
        ids = np.array(species_order)
    else:
        ids = np.unique([x.split('_')[0] for x in M.index])

    # Zero out weak mappings and list each remaining pair of cell types once.
    d = M.values.copy()
    d[d < align_thr] = 0
    x, y = d.nonzero()
    x, y = np.unique(np.sort(np.vstack((x, y)).T, axis=1), axis=0).T
    values = d[x, y]
    nodes = M.index.to_numpy()

    # Species prefix of each endpoint.
    node_pairs = nodes[np.vstack((x, y)).T]
    sn1 = np.array([xi.split('_')[0] for xi in node_pairs[:, 0]])
    sn2 = np.array([xi.split('_')[0] for xi in node_pairs[:, 1]])

    # Keep links between every pair of distinct species in `ids`.
    filt = np.zeros_like(sn1, dtype=bool)
    for i in range(len(ids) - 1):
        for j in range(i + 1, len(ids)):
            filt = np.logical_or(filt, np.logical_or(
                np.logical_and(sn1 == ids[i], sn2 == ids[j]),
                np.logical_and(sn1 == ids[j], sn2 == ids[i])
            ))

    x, y, values = x[filt], y[filt], values[filt]

    # Orient each link so that the source species comes first in `ids`.
    d = dict(zip(ids, list(np.arange(len(ids)))))
    depth_map = dict(zip(nodes, [d[xi.split('_')[0]] for xi in nodes]))
    data = nodes[np.vstack((x, y))].T
    for i in range(data.shape[0]):
        if d[data[i, 0].split('_')[0]] > d[data[i, 1].split('_')[0]]:
            data[i, :] = data[i, ::-1]
    R = pd.DataFrame(data=data, columns=['source', 'target'])
    R['Value'] = values

    # Adjust the order of nodes to ensure that they are placed in columns
    node_sort_key = {species: i for i, species in enumerate(ids)}
    R['source_order'] = R['source'].apply(lambda x: node_sort_key[x.split('_')[0]])
    R['target_order'] = R['target'].apply(lambda x: node_sort_key[x.split('_')[0]])
    R = R.sort_values(by=['source_order', 'target_order'])

    # Bokeh hook: scale the figure to the page width and widen the x range.
    def f(plot, element):
        plot.handles['plot'].sizing_mode = 'scale_width'
        plot.handles['plot'].x_range.start = -600
        plot.handles['plot'].x_range.end = 1500

    sankey1 = hv.Sankey(R, kdims=["source", "target"], vdims="Value")

    # Plot options, each overridable through **params.
    cmap = params.get('cmap', 'Colorblind')
    label_position = params.get('label_position', 'right')
    edge_line_width = params.get('edge_line_width', 0)
    show_values = params.get('show_values', False)
    node_padding = params.get('node_padding', 4)
    node_alpha = params.get('node_alpha', 1)
    node_width = params.get('node_width', 30)
    node_sort = params.get('node_sort', True)
    frame_height = params.get('frame_height', 1000)
    frame_width = params.get('frame_width', 800)
    bgcolor = params.get('bgcolor', 'snow')
    apply_ranges = params.get('apply_ranges', True)

    sankey1.opts(cmap=cmap, label_position=label_position, edge_line_width=edge_line_width, show_values=show_values,
                 node_padding=node_padding, node_cmap=depth_map, node_alpha=node_alpha, node_width=node_width,
                 node_sort=node_sort, frame_height=frame_height, frame_width=frame_width, bgcolor=bgcolor,
                 apply_ranges=apply_ranges, hooks=[f])

    return sankey1
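To try the function without a full SAMap run, a toy mapping table can be built by hand. The sketch below assumes the table is square and symmetric with `species_celltype` labels on both axes, which is what the function's use of `M.index` for both endpoints implies. The helper name `toy_mapping_table`, the cell-type names, and the constant score are hypothetical.

```python
import pandas as pd


def toy_mapping_table(species, cell_types, score=0.5):
    """Build a symmetric table linking same-named cell types across species."""
    labels = [f"{sp}_{ct}" for sp in species for ct in cell_types]
    M = pd.DataFrame(0.0, index=labels, columns=labels)
    for ct in cell_types:
        for a in species:
            for b in species:
                if a != b:
                    M.loc[f"{a}_{ct}", f"{b}_{ct}"] = score
    return M


# Species IDs from this thread; the cell types are made up for illustration.
M = toy_mapping_table(["gas", "tro", "dst", "kdc"], ["neuron", "muscle"])
```

With holoviews and the bokeh backend installed, `sankey_plot2(M, species_order=["gas", "tro", "dst", "kdc"], align_thr=0.12)` can then be called on this table to inspect how the columns are laid out.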

DiracZhu1998 avatar Jul 07 '24 19:07 DiracZhu1998