Intro-to-scRNAseq icon indicating copy to clipboard operation
Intro-to-scRNAseq copied to clipboard

many to many warning when assigning annotations to markers in scRNAseq

Open hwick opened this issue 2 years ago • 1 comments

In https://hbctraining.github.io/scRNA-seq_online/lessons/09_merged_SC_marker_identification.html adding annotations in the loop using the following method results in a many-to-many warning (when using methods to download annotations using annotationHub). This is because unique(annotations[, c("gene_name", "description")]) only finds unique gene_name/description combinations, but there are often gene repeats where genes with the same gene_name have slightly different descriptions due to being from different seq_names (such as being from a patch). For me, the many-to-many warning does not pop up when in the loop below but pops up if assigning to a single object, like the results of running FindAllMarkers. Despite the warning not popping up in the loop, I found instances of repeat gene names in my FindConservedMarkers output, with the same p values, fold changes, etc (this is what tipped me off to investigate this problem).

I downloaded annotations from annotationHub using methods outlined in another lesson section, rather than the annotations.csv example file. I'm not sure if this issue exists in the annotations.csv file, but regardless it creates some issue applying the lesson to real world analysis.

get_conserved <- function(cluster){
  FindConservedMarkers(seurat_integrated,
                       ident.1 = cluster,
                       grouping.var = "sample",
                       only.pos = TRUE) %>%
    rownames_to_column(var = "gene") %>%
    left_join(y = unique(annotations[, c("gene_name", "description")]),
               by = c("gene" = "gene_name")) %>%
    cbind(cluster_id = cluster, .)
  }

> dim(annotations) #full size
[1] 71440     5
> dim( unique(annotations[, c("gene_name", "description")])) #just unique gene_name/description combos, as used in the above loops
[1] 47618     2
> length(unique(annotations$gene_name)) #just unique gene_names
[1] 40902

hwick avatar Dec 15 '23 20:12 hwick

assigning @hwick since we might need to discuss what the best alternative is

mistrm82 avatar Feb 04 '25 03:02 mistrm82