
sharkfin plot too many data points and difficult to visualize

Open Rohit-Satyam opened this issue 1 year ago • 1 comments

Hi, I was trying to make the sharkfin plot as you discussed in another issue. However, the shape of my plot doesn't match the one shown in the paper; it looks like the plot below. There are about 35K data points. Would you recommend splitting the result data frame by transcript ID (i.e. the ref_id column)? This is SARS-CoV-2 data.

library(ggplot2)
library(magrittr)  # provides the %>% pipe

## Because ggplot doesn't like NAs
df <- file[, c("ref_id", "ref_kmer", "GMM_logit_pvalue_context_2", "Logit_LOR")] %>% tidyr::drop_na()

## Use the absolute log odds ratio on the x-axis
df$Logit_LOR <- abs(df$Logit_LOR)

df <- df[order(df$GMM_logit_pvalue_context_2, df$Logit_LOR), ]

## Flag sites with p < 0.05 and |LOR| > 0.5
df$color <- ifelse(df$GMM_logit_pvalue_context_2 < 0.05 & df$Logit_LOR > 0.5, "Significant", "Not-significant")

df$GMM_logit_pvalue_context_2 <- -log10(df$GMM_logit_pvalue_context_2)
ggplot(df, aes(x = Logit_LOR, y = GMM_logit_pvalue_context_2, color = color)) + geom_point() + theme_minimal() + xlab("Logistic regression odds ratio") + ylab("Nanocompore p-value (-log10)")

[Image: the resulting sharkfin plot]

Rohit-Satyam, Jan 25 '24 07:01

It does seem strange that you have a lot of sites with significant p-values but low absolute log odds ratios. Without knowing anything about your experimental design, it is hard to give good advice on what this might mean. Maybe start by looking through the methods and supplementary information in this paper, where we used Nanocompore on SARS-CoV-2 RNA?

https://doi.org/10.1016/j.omtn.2023.102052
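
One way to see where the significant-but-low-LOR sites come from is to count them per transcript before plotting. Below is a minimal base-R sketch, assuming a data frame with the same columns as in the question (`ref_id`, `GMM_logit_pvalue_context_2`, `Logit_LOR`). The `summarise_sites` helper and the toy values are made up for illustration, and the 0.05 and 0.5 cut-offs are simply the ones used in the question's code, not recommended thresholds.

```r
# Hypothetical helper (not part of Nanocompore): per ref_id, count sites that are
# significant with high |LOR|, significant with low |LOR|, or not significant.
summarise_sites <- function(df, p_cut = 0.05, lor_cut = 0.5) {
  # Drop rows with a missing p-value or log odds ratio
  keep <- stats::complete.cases(df[, c("GMM_logit_pvalue_context_2", "Logit_LOR")])
  df <- df[keep, ]

  sig  <- df$GMM_logit_pvalue_context_2 < p_cut
  high <- abs(df$Logit_LOR) > lor_cut

  cls <- ifelse(sig & high, "sig_high_LOR",
         ifelse(sig, "sig_low_LOR", "not_sig"))
  cls <- factor(cls, levels = c("sig_high_LOR", "sig_low_LOR", "not_sig"))

  # Rows: transcripts; columns: site classes
  table(df$ref_id, cls)
}

# Toy data, invented purely to exercise the helper
toy <- data.frame(
  ref_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  GMM_logit_pvalue_context_2 = c(0.001, 0.01, 0.5, 0.02, NA),
  Logit_LOR = c(1.2, -0.1, 0.3, -0.8, 0.4)
)

counts <- summarise_sites(toy)
print(counts)
```

If one transcript accounts for most of the `sig_low_LOR` sites, it may be worth plotting it on its own, for example by adding `facet_wrap(~ ref_id)` to the ggplot call from the question.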

lmulroney, Jan 25 '24 18:01