Calling hits with FiNeMo / marginalize?

Open gregorydonahue opened this issue 7 months ago • 2 comments

Hi Jacob,

Thanks for all your help so far - I was able to generate negatives for my ATAC-seq data using bpnet, and then ran the following successfully:

chrombpnet fit -p JSON
chrombpnet predict -p JSON
chrombpnet attribute -p JSON

This creates the expected SHAP scores and sequence encodings in attr.npz and ohe.npz, respectively. At this point, I ran TF-MoDISco on those files, successfully generating enriched seqlets / motifs and a report:

modisco motifs -s ohe.npz -a attr.npz -n 2000 -o modisco.results.h5
modisco report -i modisco.results.h5 -o report -s report

So far, so good. All that remains is to take the five enriched motifs that TF-MoDISco discovered and call hits in the original peaks. Previously, when working with ChIP-seq data, I did this with FiNeMo. In that case, however, I passed FiNeMo a set of "valid peaks" (non-negatives) that were output by bpnet-shap (the file is called "peaks_valid_scores.bed"). FiNeMo needs this file to report actual genome coordinates for the motif hits; you can't give it the original peak set, because it expects the number of peaks to equal the number of entries in the *.npz files. Additionally, bpnet-shap gives another super-useful output: a bigWig describing the enrichment for each motif, for visualization in a genome browser.

Is there any way to get these two outputs (the valid-peaks BED file and the bigWig) out of the chrombpnet pipeline, presumably from the attribute step? I had thought the answer might be to run chrombpnet marginalize, but I'm not really sure what that does. It also requires one or more motifs in MEME format, and I'm not sure how to get those for the motifs that TF-MoDISco discovered. Can you shed some light on this?
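
To make the MEME part of the question concrete, here is a minimal sketch of what I imagine the conversion would look like, assuming each discovered motif is already available as a matrix of per-position A/C/G/T values. The function name `pfms_to_meme` and the motif name `pattern_0` are hypothetical, and reading the matrices out of modisco.results.h5 is deliberately left out because I don't want to guess at that file's layout.

```python
def pfms_to_meme(motifs, background=(0.25, 0.25, 0.25, 0.25)):
    """Render motifs as minimal MEME-format text.

    motifs: dict mapping a motif name to a list of rows, one row per
    position, each row holding non-negative values in A, C, G, T order.
    (Hypothetical helper, not part of chrombpnet, modisco or FiNeMo.)
    """
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strand: + -",
        "",
        "Background letter frequencies",
        "A %.3f C %.3f G %.3f T %.3f" % tuple(background),
        "",
    ]
    for name, rows in motifs.items():
        lines.append(f"MOTIF {name}")
        lines.append(f"letter-probability matrix: alength= 4 w= {len(rows)}")
        for row in rows:
            if len(row) != 4:
                raise ValueError(f"{name}: each row needs 4 values (A, C, G, T)")
            total = sum(row)
            if total <= 0:
                raise ValueError(f"{name}: row sums to zero, cannot normalize")
            # Normalize so each position is a probability distribution.
            lines.append(" ".join(f"{value / total:.6f}" for value in row))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy two-position motif: a fixed A followed by an uninformative base.
    print(pfms_to_meme({"pattern_0": [[2, 0, 0, 0], [1, 1, 1, 1]]}))
```

If something like this is the intended route, the resulting text could be written to a file and passed wherever a MEME file is expected.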

Thanks, Greg

gregorydonahue, Jul 04 '24 22:07