sc.tl.dendrogram doesn't use var_names

Open Fougere87 opened this issue 4 years ago • 3 comments

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of scanpy.
  • [ ] (optional) I have confirmed this bug exists on the master branch of scanpy. Latest on pip at least scanpy-1.6.0

I'm calling sc.tl.dendrogram several times with different gene lists on my dataset (basically an increasing number of highly variable genes). The resulting dendrogram is always the same. I guess it is taking all the genes into account, because it uses something like 32 GB of RAM.

Minimal code sample (that we can copy&paste without having any data)

hvegene_sets = [sc.pp.highly_variable_genes(adata, inplace=False, subset=False, n_top_genes=nhvg)["highly_variable"] for nhvg in [500,1000,2000, 3000,4000, 5000]]

then

[sum(hvgene) for hvgene in hvegene_sets]

outputs: [499, 1000, 1999, 2999, 4000, 4999] (so I do have my different gene sets)

then

dendro1 = sc.tl.dendrogram(adata,                   
                   var_names=adata.var_names[hvegene_sets[1]].values, 
                   optimal_ordering=True,
                   cor_method="spearman", linkage_method="complete", inplace=False,
                   groupby="Annotation")
dendro2 = sc.tl.dendrogram(adata,                   
                   var_names=adata.var_names[hvegene_sets[5]].values, 
                   optimal_ordering=True,
                   cor_method="spearman", linkage_method="complete", inplace=False,
                   groupby="Annotation")
[dendro1[key] == dendro2[key] for key in dendro1.keys()]

outputs:

[array([[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]]),
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True],
        [ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True]])]
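
The element-wise output above can be condensed to one boolean per key. This is a minimal sketch, assuming the result is a dict whose values are either NumPy arrays or plain Python objects; `dendrograms_equal` is a hypothetical helper (not part of scanpy), and the toy dicts below only stand in for real `sc.tl.dendrogram` results.

```python
import numpy as np


def dendrograms_equal(d1, d2):
    """Return {key: bool} telling whether each entry of d1 matches d2.

    Hypothetical helper for illustration; not a scanpy function.
    """
    result = {}
    for key in d1:
        a, b = d1[key], d2.get(key)
        if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
            # Collapse the element-wise comparison into a single boolean
            result[key] = bool(np.array_equal(a, b))
        else:
            result[key] = bool(a == b)
    return result


# Toy stand-ins for two dendrogram result dicts (made-up values)
d1 = {"linkage": np.array([[0.0, 1.0, 0.5, 2.0]]), "cor_method": "spearman"}
d2 = {"linkage": np.array([[0.0, 1.0, 0.5, 2.0]]), "cor_method": "spearman"}
d3 = {"linkage": np.array([[0.0, 1.0, 0.9, 2.0]]), "cor_method": "spearman"}

print(dendrograms_equal(d1, d2))
print(dendrograms_equal(d1, d3))
```

If every value is `True` for two calls made with different `var_names`, the gene list had no effect on the result, which is the symptom reported here.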

At first I created all the dendrograms in a list comprehension and got the same result. I also passed a hand-written gene list directly and still obtained the same result. I guess dendrogram doesn't take the genes into account.

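When I build the dendrogram manually with functions such as

## Testing with creating the dendro manually

from scipy.cluster.hierarchy import dendrogram, linkage
# Private scanpy helper; this import path is an assumption and may change between versions
from scanpy.plotting._anndata import _prepare_dataframe

def do_corr_mat(adata, var_names, groupby, method="spearman"):
    # Mean expression per group, restricted to var_names
    categories, obs_tidy = _prepare_dataframe(adata, var_names=var_names, groupby=groupby)
    mean_df = obs_tidy.groupby(level=0).mean()

    # Group-by-group correlation matrix
    return mean_df.T.corr(method=method)

def do_dendro(corr_matrix, method="ward"):
    z_var = linkage(corr_matrix, method=method)
    return dendrogram(z_var, labels=list(corr_matrix.index))

everything works fine!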
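
As a scanpy-free illustration of why the manual approach is sensitive to the gene list, here is a minimal sketch on made-up data. The gene names, group labels, values, and the `group_corr` helper are all hypothetical; it mirrors the group-mean, Spearman correlation, and complete-linkage steps of the functions above and is not scanpy's implementation.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage

# Made-up expression table: 2 identical cells per group, 5 genes
genes = ["g1", "g2", "g3", "g4", "g5"]
profiles = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 2, 3, 5, 4],
    "C": [5, 4, 3, 2, 1],
}
rows, labels = [], []
for group, values in profiles.items():
    for _ in range(2):
        rows.append(values)
        labels.append(group)
expr = pd.DataFrame(rows, columns=genes, index=labels, dtype=float)


def group_corr(expr, var_names, method="spearman"):
    """Correlate group-mean profiles using only the selected genes."""
    mean_df = expr[var_names].groupby(level=0).mean()
    return mean_df.T.corr(method=method)


corr_small = group_corr(expr, genes[:3])  # first three genes only
corr_all = group_corr(expr, genes)        # all five genes

z_small = linkage(corr_small.to_numpy(), method="complete")
z_all = linkage(corr_all.to_numpy(), method="complete")

# Different gene lists give different correlation and linkage matrices
print(np.allclose(corr_small.to_numpy(), corr_all.to_numpy()))
print(np.allclose(z_small, z_all))
```

On this toy data the two gene lists produce different correlation and linkage matrices, which is the behaviour one would expect from `sc.tl.dendrogram` when `var_names` changes.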

Thanks in advance, C

Versions

1.6.0

Fougere87 avatar Dec 17 '20 17:12 Fougere87

I found the same problem in sc.pl.dotplot, and traced it to line 2236 of \scanpy\plotting\_anndata.py:

    if dendrogram_key not in adata.uns:
        from ..tools._dendrogram import dendrogram

        logg.warning(
            f"dendrogram data not found (using key={dendrogram_key}). "
            "Running `sc.tl.dendrogram` with default parameters. For fine "
            "tuning it is recommended to run `sc.tl.dendrogram` independently."
        )
        dendrogram(adata, groupby, key_added=dendrogram_key)

This call to dendrogram does not pass var_names; I fixed it in my local copy of the source.


anndata 0.7.8 scanpy 1.9.1

SunYong0821 avatar Nov 01 '22 10:11 SunYong0821

I found that scanpy always uses all var_names whenever the var_names parameter is not None.

image

tanliwei-coder avatar Jul 31 '23 03:07 tanliwei-coder

Any update on this? I encountered the same issue

TheBorgy avatar Sep 28 '23 16:09 TheBorgy