David Kelley comments

Results: 161 comments by David Kelley

which dbSNP build?

Yes, both Enformer and Basenji were trained on sequences and functional annotations from hg38 but were used to score variants from hg19. Once the model is trained, you can predict...

num_targets in params file vs --ti option in basenji_test.py

This changed a bit between the tf1 and tf2 versions, and I know from the other thread that you're moving to tf2, so I'll hold off on giving specific advice. Generally,...

Basset style predictions - parameter tuning.

Hi Goutham, peak prediction is tough due to imbalance, and AUPRC will reflect that. Are those the only three datasets that you're training on? In that case, the negatives for...
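To illustrate why AUPRC is sensitive to class imbalance, here is a minimal sketch in plain Python. It computes average precision (a common AUPRC estimate) for a toy set with 2 positives out of 10 examples. The function name, labels, and scores are all made up for illustration, and ties between scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at each positive's rank.

    Illustrative helper, not part of Basenji. Assumes binary labels (0/1)
    and breaks score ties by input order.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this positive
    return total / num_pos

# Toy data: 2 peaks (positives) among 10 regions.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
good_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
bad_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

ap_good = average_precision(labels, good_scores)  # positives ranked first
ap_bad = average_precision(labels, bad_scores)    # positives ranked last
positive_fraction = sum(labels) / len(labels)
```

With few positives, every negative ranked above a positive pulls precision down sharply, so AUPRC is best read against the positive fraction of the dataset rather than against 0.5.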

Basset style predictions - parameter tuning.

Not easily. I just haven't been working in that sort of setup for a while. If you add a bunch of ENCODE BEDs, then the negatives from your other targets will...

about the data used for training

Hi, I typically train multi-task. You can use basenji_data.py to create tfrecords for your data from bigwig files.

about the data used for training

Yup, that’s a good strategy.

about the data used for training

I'm not sure I understand your question. Are you asking where we obtain SNPs of interest? Typically, they are derived from a genome-wide association study.

about the data used for training

The SNPs that overlap open chromatin regions are far more likely to be influential, so that’s a reasonable filter. However, sometimes variants outside of the peaks can create a new...
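The peak-overlap filter mentioned here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: peaks are BED-style `(chrom, start, end)` tuples with 0-based, half-open coordinates, SNP positions are 0-based, and the function names and SNP IDs are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def merge_peaks(peaks):
    """Merge overlapping (chrom, start, end) peaks per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged

def snps_in_peaks(snps, peaks):
    """Return IDs of SNPs whose position falls inside any peak."""
    merged = merge_peaks(peaks)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    kept = []
    for snp_id, chrom, pos in snps:
        if chrom not in merged:
            continue
        # Find the last merged peak that starts at or before the SNP.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < merged[chrom][idx][1]:
            kept.append(snp_id)
    return kept

# Hypothetical toy inputs.
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
snps = [("snp_a", "chr1", 120), ("snp_b", "chr1", 300),
        ("snp_c", "chr2", 60), ("snp_d", "chr3", 10)]
kept = snps_in_peaks(snps, peaks)
```

Because variants outside peaks can still matter, a filter like this is better treated as a way to prioritize SNPs than as a hard exclusion.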

about the data used for training

Yes, it's reasonable.

about the data used for training

To your first question, 'mean' and 'sum' refer to whether to take the mean or the sum of the nucleotide-annotated coverage in your BigWig files. If the values represent counts,...
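As a rough illustration of the difference between the two options, the sketch below pools a per-nucleotide coverage track into fixed-width bins by either summing or averaging. The function name, `bin_size` parameter, and coverage values are assumptions made for this example; they are not Basenji's own API.

```python
def pool_coverage(coverage, bin_size, mode="sum"):
    """Pool per-nucleotide coverage into bins of width `bin_size`.

    `coverage` is a list of per-base values, as might be read from a BigWig.
    Illustrative helper only; a final partial bin is pooled over its own length.
    """
    if mode not in ("sum", "mean"):
        raise ValueError("mode must be 'sum' or 'mean'")
    pooled = []
    for start in range(0, len(coverage), bin_size):
        window = coverage[start:start + bin_size]
        total = sum(window)
        pooled.append(total if mode == "sum" else total / len(window))
    return pooled

# Hypothetical 8 bp coverage track, pooled into two 4 bp bins.
coverage = [0, 2, 4, 2, 1, 1, 0, 0]
sum_bins = pool_coverage(coverage, bin_size=4, mode="sum")
mean_bins = pool_coverage(coverage, bin_size=4, mode="mean")
```

The two modes differ only by a factor of the bin width, but that factor determines the scale of the training targets, so the choice should match what the BigWig values represent.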