
How to consider a site as modified?

Open rania-o opened this issue 1 year ago • 8 comments

Hello,

I have some questions about the outputs of EpiNano. I used the sum-error parameter, and I would like to know whether, to consider a site "modified", it has to show up in the results of both the delta-sum-error method and the linear model, or whether being called modified by one of the two is enough. Also, I have a question about the results of the linear model: I don't understand what the column "lm_Bonferroni_outlier_test" corresponds to, or on what basis it labels a site as modified or not. Still regarding the linear model results: to consider a position "modified", do both columns "lm_Bonferroni_outlier_test" and "lm_residuals_z_scores_prediction" have to give "mod", or is one of them enough?

Thanks for your help. Rania

rania-o avatar Jul 21 '22 11:07 rania-o

Dear @rania-o, sorry for the slow reply. EpiNano offers different criteria for considering a site as modified: one option is the delta-sum-error and another is the linear model. The outputs will not be identical (but should share some sites), as they rely on different assumptions. Hope this clarifies your doubts!
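To see how much the two outputs actually agree, one can compare the sets of sites each method calls as modified. The following is a minimal sketch, assuming the calls from each method have already been reduced to (reference, position) pairs. The function name `compare_calls` and the example values are hypothetical and are not part of EpiNano.

```python
def compare_calls(delta_sites, lm_sites):
    """Summarise agreement between two collections of (reference, position) calls."""
    delta_sites, lm_sites = set(delta_sites), set(lm_sites)
    shared = delta_sites & lm_sites
    union = delta_sites | lm_sites
    return {
        "shared": sorted(shared),
        "delta_only": sorted(delta_sites - lm_sites),
        "lm_only": sorted(lm_sites - delta_sites),
        # Jaccard index: shared calls divided by all calls from either method.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up calls, for illustration only.
delta_calls = [("tx1", 105), ("tx1", 230), ("tx2", 17)]
lm_calls = [("tx1", 105), ("tx2", 17), ("tx2", 88), ("tx3", 5)]

summary = compare_calls(delta_calls, lm_calls)
print(summary)
```

A low Jaccard index does not by itself say which method is wrong; it only quantifies how far the two sets of calls diverge.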

enovoa avatar Sep 16 '22 23:09 enovoa

Hi @enovoa,

Thank you for your reply. I still don't understand which method I should rely on, as I get only a few positions in common between the two methods.

Also, I still have the same question about the results of the linear model: to consider a position "modified", do both columns "lm_Bonferroni_outlier_test" and "lm_residuals_z_scores_prediction" have to give "mod", or is one of them enough?

Thank you, Rania

rania-o avatar Sep 19 '22 13:09 rania-o

Hi @rania-o, I cannot say which of them you should use, as it depends on the specific data type, modification, etc. that you are working with. Detection of RNA modifications varies depending on the stoichiometry of the sites, the sequence context, the coverage, and the modification type, to name a few variables. Ideally, you should have sequenced some control (e.g., an RNA with and without a given RNA modification) in your own dataset to be able to judge the performance of each method on your own data. If you haven't done so, you may wish to try out the demo data to see how each method performs on data for which the RNA modification is known. You can also download public data for which you know the ground truth (e.g., you can use total RNA sequencing in WT and snoRNA KOs from Begik, Lucas et al. Nat Biotech 2021, https://www.ebi.ac.uk/ena/browser/view/PRJEB37798?show=reads). Hope that helped! For the second question, @Huanle might be able to clarify better than me. Thanks, Eva

enovoa avatar Sep 19 '22 13:09 enovoa

Hi @enovoa,

Thanks a lot for these clarifications. I already used the demo data and other public data to test tools, and yes, we do have a control sample (IVT), but it's always hard to choose a method for de novo detection. For the other question, I'll wait for @Huanle's answer.

Thanks again, Rania

rania-o avatar Sep 19 '22 14:09 rania-o

Hi @rania-o, the control sample that you mention above lets you run your samples in a pairwise manner; that is not what I am referring to. I mean an internal control, e.g. a modified and unmodified oligo for which you know the ground truth. If you don't have an INTERNAL control inside your own run, you may wish to test the demo data and/or publicly available datasets for which the ground truth is known or orthogonal data are available.

enovoa avatar Sep 19 '22 14:09 enovoa

Aaah, yes, I see what you mean. Indeed, we already have this type of control (an oligo with two known modified positions), but unfortunately, among all the tools I have tested, none gives me only the two positions. There are always false positives, and sometimes I don't even find my two known positions. That's why, for de novo detection, I was wondering whether in your data you noticed that one method is more reliable than the other for viral transcripts.

rania-o avatar Sep 19 '22 15:09 rania-o

Sorry, I cannot provide advice on which option(s) are best to analyze your own data. As I said, the performance varies depending on sequence context, modification type, stoichiometry, etc. I would recommend testing how each method/algorithm performs on your internal control (which you seem to have) as well as on public data that is similar to your current data, and basing your decisions on those results.
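One way to run this kind of test on an internal control is to score each calling rule against the positions known to be modified. The sketch below is only an illustration under stated assumptions: the two column names are the ones quoted earlier in this thread, while the `position` key, the `"unm"` label (a stand-in for any value other than `"mod"`), the helper names, and all values are hypothetical. It compares requiring "mod" in both linear-model columns with accepting "mod" in either one.

```python
LM_COLUMNS = ("lm_Bonferroni_outlier_test", "lm_residuals_z_scores_prediction")


def called_positions(rows, require_both):
    """Positions labelled "mod" in both columns (strict) or in at least one (lenient)."""
    combine = all if require_both else any
    return {
        row["position"]
        for row in rows
        if combine(row[col] == "mod" for col in LM_COLUMNS)
    }


def score(called, known):
    """Compare called positions with the known modified positions of a control."""
    called, known = set(called), set(known)
    true_pos = len(called & known)
    return {
        "true_pos": true_pos,
        "false_pos": len(called - known),
        "false_neg": len(known - called),
        "precision": true_pos / len(called) if called else 0.0,
        "recall": true_pos / len(known) if known else 0.0,
    }


# Made-up rows for an oligo with two known modified positions (12 and 30).
rows = [
    {"position": 12, LM_COLUMNS[0]: "mod", LM_COLUMNS[1]: "mod"},
    {"position": 30, LM_COLUMNS[0]: "unm", LM_COLUMNS[1]: "mod"},
    {"position": 47, LM_COLUMNS[0]: "unm", LM_COLUMNS[1]: "mod"},
    {"position": 61, LM_COLUMNS[0]: "unm", LM_COLUMNS[1]: "unm"},
]
known = {12, 30}

strict = score(called_positions(rows, require_both=True), known)
lenient = score(called_positions(rows, require_both=False), known)
print("both columns:", strict)
print("either column:", lenient)
```

In this made-up case, the strict rule misses one known site but calls no false positives, while the lenient rule recovers both known sites at the cost of one false positive. Which trade-off is acceptable depends on the data, so the same scoring can be repeated for each method and each public dataset with known ground truth.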

enovoa avatar Sep 19 '22 15:09 enovoa

Yes, I understand your position: it is difficult to have constant parameters or methods for all analyses, since these depend on the type of data. Thank you for your time and clarification. Rania

rania-o avatar Sep 19 '22 15:09 rania-o