
Report Metrics

Open · MattGoulty opened this issue 2 years ago · 1 comment

Hello,

Your software and tutorial are very helpful, and I have been able to use them easily. Thank you.

However, I am new to motif analysis and have been struggling to interpret the output metrics/significance scores.

I am trying to identify enriched motifs upstream of the start codon for a set of genes, but my input set contains only about 100 genes.

Using gimme motifs, which statistic in the report is the most reliable? Is there any guidance on reasonable significance thresholds? How should I adjust my interpretation given the small sample size?

Any help is greatly appreciated. Thank you.

MattGoulty, Apr 04 '22 09:04

Hi @MattGoulty. This is a difficult question, but I'll try to give a few hints and guidelines. The gimme motifs command with the default settings is better suited for analyzing larger input data (ChIP-seq peaks, for example). By default, it uses only 20% of the input sequences to predict motifs and the other 80% for validation. With a small input, this is not really optimal. Here, you're better off increasing the fraction used for motif prediction to at least 50%, maybe higher. You can do that with the -f parameter; for instance, -f 0.5 uses 50% of the sequences for prediction.
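To make the effect of the prediction fraction concrete for an input of about 100 genes, here is a minimal Python sketch of a random prediction/validation split. The helper `split_sequences` and the gene IDs are hypothetical, and GimmeMotifs may select and round its split differently; only the arithmetic is the point.

```python
import random


def split_sequences(seq_ids, fraction, seed=0):
    """Hypothetical helper: shuffle the IDs and use `fraction` of them for
    motif prediction and the rest for validation (illustration only,
    not GimmeMotifs code)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    ids = list(seq_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_predict = round(len(ids) * fraction)
    return ids[:n_predict], ids[n_predict:]


genes = [f"gene{i}" for i in range(100)]  # ~100 input genes, as in the question
for fraction in (0.2, 0.5):
    predict, validate = split_sequences(genes, fraction)
    print(f"fraction {fraction}: {len(predict)} for prediction, {len(validate)} for validation")
```

With 100 genes, the default 20% leaves only about 20 sequences for motif prediction, whereas -f 0.5 gives about 50 for prediction and 50 for validation.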

Next are the statistics. We report several because there is no single, straightforward best statistic. For a nice explanation of ROC AUC and PR AUC, you can have a look here. For ChIP-seq, the ROC AUC of a good motif can go up to 0.95. However, for motifs enriched upstream of genes, you will not reach this. Anything higher than 0.5 theoretically performs better than random. The other thing I would look at is the enrichment: I think a motif should be at least 2-fold enriched before I would even look into it further. The gimme motifs report only includes motifs that pass some thresholds, so you can assume the reported motifs are at least somewhat enriched. Other than that, it's all open to interpretation!
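The statistics mentioned above can be illustrated with a small Python sketch. It assumes each sequence has a single best motif score (the values below are made up) and computes ROC AUC as the probability that a target sequence outscores a background sequence, PR AUC as average precision, and fold enrichment as the fraction of target sequences above a hypothetical score cutoff divided by the same fraction in the background. This is not how GimmeMotifs computes these numbers internally; it only shows what they measure.

```python
# Illustrative sketch only: not the GimmeMotifs implementation.
# Assumes one (hypothetical) best motif score per sequence.


def roc_auc(target_scores, background_scores):
    """Probability that a random target sequence scores higher than a random
    background sequence (ties count as half). This equals the ROC AUC."""
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


def average_precision(target_scores, background_scores):
    """PR AUC estimated as average precision: the mean of the precision
    values at the rank of each target sequence (highest score first).
    Tied scores keep their input order (targets first)."""
    labeled = [(s, True) for s in target_scores] + [(s, False) for s in background_scores]
    labeled.sort(key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_target) in enumerate(labeled, start=1):
        if is_target:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(target_scores)


def fold_enrichment(target_scores, background_scores, cutoff):
    """Fraction of target sequences with a motif hit (score >= cutoff)
    divided by the same fraction in the background."""
    frac_target = sum(s >= cutoff for s in target_scores) / len(target_scores)
    frac_background = sum(s >= cutoff for s in background_scores) / len(background_scores)
    return frac_target / frac_background if frac_background > 0 else float("inf")


# Made-up scores for illustration.
target = [0.9, 0.8, 0.7, 0.4]
background = [0.6, 0.3, 0.2, 0.1]

print(f"ROC AUC: {roc_auc(target, background):.4f}")  # 0.9375
print(f"PR AUC (average precision): {average_precision(target, background):.2f}")  # 0.95
print(f"Fold enrichment: {fold_enrichment(target, background, cutoff=0.5):.1f}")  # 3.0
```

Identical score distributions give a ROC AUC of 0.5, the random baseline mentioned above. For PR AUC, the random baseline is roughly the fraction of target sequences in the combined set, so it depends on the ratio of target to background sequences.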

simonvh, May 19 '22 09:05