David Kelley
Yes, both the Enformer and Basenji were trained on sequences and functional annotations from hg38, but used to score variants from hg19. Once the model is trained, you can predict...
This changed a bit between the tf1 and tf2 versions, and I know from the other thread that you're moving to tf2, so I'll hold off on giving specific advice. Generally,...
Hi Goutham, peak prediction is tough due to imbalance, and AUPRC will reflect that. Are those the only three datasets that you're training on? In that case, the negatives for...
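The reason AUPRC reflects the imbalance can be seen on toy data. Below is a minimal pure-Python sketch of average precision, a step-wise estimate of the area under the precision-recall curve. A scorer that ranks at random lands near the positive fraction, not near 0.5 as it would for AUROC. The 2% positive rate and the `average_precision` helper are illustrative assumptions, not taken from any Basenji code or dataset.

```python
import random


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Ranks examples by descending score and averages the precision
    observed at each positive. Tied scores are not specially handled.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    tp = 0
    ap = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            ap += tp / k  # precision at this recall step
    return ap / n_pos


# Hypothetical imbalanced task: roughly 2% of bins are peaks.
random.seed(0)
n = 10000
pos_rate = 0.02
labels = [1 if random.random() < pos_rate else 0 for _ in range(n)]
random_scores = [random.random() for _ in range(n)]
random_ap = average_precision(labels, random_scores)
print("positive fraction:", sum(labels) / n, "random-scorer AP:", random_ap)
```

Because the chance level for AUPRC is the positive fraction, a value that looks low in absolute terms can still be far above baseline on a sparse peak set.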
Not easily. I just haven't been working in that sort of setup for a while. If you add a bunch of ENCODE BEDs, then the negatives from your other targets will...
Hi, I typically train multi-task. You can use basenji_data.py to create tfrecords for your data from bigwig files.
Yup, that’s a good strategy
I'm not sure I understand your question. Are you asking where we obtain SNPs of interest? Typically, they are derived from a genome-wide association study.
The SNPs that overlap open chromatin regions are far more likely to be influential, so that’s a reasonable filter. However, sometimes variants outside of the peaks can create a new...
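The peak-overlap filter described above can be sketched in a few lines of pure Python. This assumes peaks are BED-style `(chrom, start, end)` tuples (0-based, half-open) that have already been merged so they do not overlap, and SNPs are `(chrom, pos, rsid)` tuples with VCF-style 1-based positions. The helper names are hypothetical. As noted above, such a filter would discard variants outside peaks that might create new ones.

```python
import bisect


def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start.

    Assumes peaks on the same chromosome do not overlap (e.g. merged).
    """
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for chrom in index:
        index[chrom].sort()
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in index.items()}
    return index, starts


def in_peak(index, starts, chrom, pos0):
    """True if 0-based position pos0 lies in a half-open [start, end) peak."""
    if chrom not in index:
        return False
    # With non-overlapping peaks, only the last peak starting at or before
    # pos0 can contain it.
    i = bisect.bisect_right(starts[chrom], pos0) - 1
    return i >= 0 and pos0 < index[chrom][i][1]


def filter_snps(snps, peaks):
    """Keep SNPs whose 1-based position falls inside any peak."""
    index, starts = build_peak_index(peaks)
    return [s for s in snps if in_peak(index, starts, s[0], s[1] - 1)]


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
snps = [
    ("chr1", 150, "rsA"),
    ("chr1", 201, "rsB"),
    ("chr1", 100, "rsC"),
    ("chr2", 50, "rsD"),
    ("chr3", 10, "rsE"),
]
kept = filter_snps(snps, peaks)
print([rsid for _, _, rsid in kept])
```

The only subtle point is the coordinate convention: a BED start of 100 means the first covered base is 1-based position 101, which is why `rsC` at 1-based position 100 is excluded.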
Yes, it's reasonable
To your first question, 'mean' and 'sum' refer to whether to take the mean or the sum of the nucleotide-annotated coverage in your BigWig files. If the values represent counts,...
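To make the 'mean' versus 'sum' distinction concrete, here is a minimal pure-Python sketch that pools per-nucleotide coverage (as it might be read from a BigWig) into fixed-width bins. The `bin_coverage` function and its `how` parameter are hypothetical illustrations, not Basenji's implementation. For full-width bins the two options differ only by a factor of the bin width.

```python
def bin_coverage(values, bin_size, how="sum"):
    """Pool per-nucleotide coverage into fixed-width bins.

    how="sum" adds the values in each bin; how="mean" divides that total
    by bin_size. A trailing partial bin is dropped.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    n_bins = len(values) // bin_size
    out = []
    for b in range(n_bins):
        total = sum(values[b * bin_size:(b + 1) * bin_size])
        out.append(total if how == "sum" else total / bin_size)
    return out


coverage = [0, 1, 1, 2, 4, 4, 0, 0]  # toy per-nucleotide values
print(bin_coverage(coverage, 4, "sum"), bin_coverage(coverage, 4, "mean"))
```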