Compatibility with bpnet-refactor
Hello, I recently ran the bpnet-refactor workflow and obtained SHAP attribution scores, which I would now like to load into TF-MoDISco. I'm having some trouble seeing how to do that: the bpnet-refactor documentation says you can do it with tfmodisco-lite but is otherwise a bit sparse on the subject. The final bpnet command was:
$ bpnet-shap \
--reference-genome $REFERENCE_GENOME \
--model $MODEL_DIR/model_split000 \
--bed-file $DATA_DIR/peaks_inliers.bed \
--chroms chr1 \
--output-dir $SHAP_DIR \
--input-seq-len 2114 \
--control-len 1000 \
--task-id 0 \
--input-data $INPUT_DATA \
--generate-shap-bigWigs \
--chrom-sizes "$CHROM_SIZES"
The output of that was:
$ ls $SHAP_DIR
config.json counts_scores.stats.txt profile_scores.h5 peaks_valid_scores.bed profile_scores.stats.txt
counts_scores.h5 shap_scores.log
Now, my question is: how is one meant to convert the *.h5 files with the scores into the *.npz files expected by TF-MoDISco? I see from the Jupyter notebook that you use bpnet-lite, which has an 'interpret' function to produce the one-hot sequence encoding and attribution scores, but bpnet-refactor does not seem to have the same program, or anything similarly named, in its /bin. Is there some way to take what I already have and make it usable for TF-MoDISco?
Thanks, Greg
Edit: I realized that bpnet-lite is indeed on GitHub, just not among the Kundaje lab's repositories.
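To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind, in Python with h5py and NumPy. It rests on assumptions I have not verified: that the *.h5 files hold one dataset with the one-hot sequences and one with the attribution scores, and that TF-MoDISco wants each as an array of shape (examples, 4, length) stored as the first unnamed array in an *.npz file. The helper names and the dataset paths in the usage comment are hypothetical, so the file should be inspected with `list_datasets` first.

```python
import numpy as np


def list_datasets(h5_path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file."""
    import h5py  # imported lazily so the NumPy-only helpers work without it

    found = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, obj.dtype))

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return found


def load_h5_arrays(h5_path, seq_key, attr_key):
    """Read the sequence and attribution datasets into NumPy arrays.

    seq_key and attr_key are dataset paths inside the file. They are NOT
    known here; take them from the output of list_datasets().
    """
    import h5py

    with h5py.File(h5_path, "r") as f:
        return f[seq_key][:], f[attr_key][:]


def to_channels_first(arr):
    """Return arr as (N, 4, L), transposing if it is stored as (N, L, 4)."""
    arr = np.asarray(arr)
    if arr.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {arr.shape}")
    if arr.shape[1] == 4:
        return arr
    if arr.shape[2] == 4:
        return arr.transpose(0, 2, 1)
    raise ValueError(f"no axis of size 4 in shape {arr.shape}")


def write_modisco_inputs(seqs, attrs, seq_out, attr_out):
    """Save sequences and attributions as two .npz files (key 'arr_0')."""
    seqs = to_channels_first(seqs).astype(np.float32)
    attrs = to_channels_first(attrs).astype(np.float32)
    if seqs.shape != attrs.shape:
        raise ValueError(f"shape mismatch: {seqs.shape} vs {attrs.shape}")
    # A positional array passed to savez_compressed is stored as 'arr_0'.
    np.savez_compressed(seq_out, seqs)
    np.savez_compressed(attr_out, attrs)


# Hypothetical usage; "raw/seq" and "shap/seq" are guesses, not confirmed names:
#   print(list_datasets("profile_scores.h5"))
#   seqs, attrs = load_h5_arrays("profile_scores.h5", "raw/seq", "shap/seq")
#   write_modisco_inputs(seqs, attrs, "ohe.npz", "shap.npz")
```

If that layout is right, I would expect the two files to be usable as the sequence and attribution inputs to tfmodisco-lite. What I can't tell is which dataset in the *.h5 file (if there is more than one set of scores) is the one TF-MoDISco should be given, or whether h5py alone can read these files.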