tfmodisco-lite

Compatibility with bpnet-refactor

Open gregorydonahue opened this issue 9 months ago • 1 comments

Hello, I have recently run the bpnet-refactor workflow and obtained SHAP attribution scores, which I would now like to load into TF-MoDISco. I'm having some trouble seeing how to do that - the bpnet-refactor documentation says you can do it with tfmodisco-lite but otherwise is a bit spare on the subject. The final bpnet command was:

$ bpnet-shap \
    --reference-genome $REFERENCE_GENOME \
    --model $MODEL_DIR/model_split000 \
    --bed-file $DATA_DIR/peaks_inliers.bed \
    --chroms chr1 \
    --output-dir $SHAP_DIR \
    --input-seq-len 2114 \
    --control-len 1000 \
    --task-id 0 \
    --input-data $INPUT_DATA \
    --generate-shap-bigWigs \
    --chrom-sizes "$CHROM_SIZES"

The output of that was:

$ ls $SHAP_DIR
config.json       counts_scores.stats.txt  profile_scores.h5
counts_scores.bw  peaks_valid_scores.bed   profile_scores.stats.txt
counts_scores.h5  profile_scores.bw        shap_scores.log

Now, my question is, how is one meant to convert the *.h5 files with the scores into the *.npz files expected by TF-MoDISco? I see from the Jupyter notebook that you use bpnet-lite, which has an 'interpret' function to make the one-hot sequence encoding and attribution scores, but bpnet-refactor seems not to have the same program, or anything named similarly, in its /bin. Is there some way to take what I've already got and render it usable for TF-MoDISco?
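To make the question concrete, here is a rough sketch of the conversion I have in mind, not a confirmed recipe. It assumes (unverified) that each *.h5 file holds the one-hot sequences and the hypothetical scores as datasets named `input_seqs` and `hyp_scores` with shape (N, L, 4), and that TF-MoDISco wants two *.npz files with arrays of shape (N, 4, L). The dataset names, the function names and the output file names are all placeholders of mine; the script prints the keys actually present so they can be checked first.

```python
import numpy as np


def to_modisco_layout(arr):
    """Return the array as (N, 4, L), accepting (N, L, 4) or (N, 4, L) input."""
    arr = np.asarray(arr)
    if arr.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {arr.shape}")
    if arr.shape[1] == 4:
        # Already channels-first; nothing to do.
        return arr
    if arr.shape[2] == 4:
        # Channels-last: move the nucleotide axis in front of the position axis.
        return np.transpose(arr, (0, 2, 1))
    raise ValueError(f"no axis of length 4 in shape {arr.shape}")


def convert_h5_to_npz(h5_path, out_prefix,
                      seq_key="input_seqs", score_key="hyp_scores"):
    """Write <out_prefix>.ohe.npz and <out_prefix>.shap.npz from one HDF5 file.

    seq_key and score_key are ASSUMED dataset names; check the printed keys
    and adjust them if the file is laid out differently.
    """
    import h5py  # third-party; imported here so the helper above needs only numpy

    with h5py.File(h5_path, "r") as f:
        print("top-level keys:", list(f.keys()))
        seqs = to_modisco_layout(f[seq_key][:])
        scores = to_modisco_layout(f[score_key][:])

    if seqs.shape != scores.shape:
        raise ValueError(f"shape mismatch: {seqs.shape} vs {scores.shape}")

    # A single positional array is stored under the default key "arr_0".
    np.savez_compressed(f"{out_prefix}.ohe.npz", seqs)
    np.savez_compressed(f"{out_prefix}.shap.npz", scores)
```

If those assumptions hold, calling `convert_h5_to_npz("profile_scores.h5", "profile")` would produce `profile.ohe.npz` and `profile.shap.npz`, which I would then expect to pass to `modisco motifs` via its `-s` and `-a` options.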

Thanks, Greg

Edited: realized that bpnet-lite is indeed on GitHub, just not in Kundaje lab's repositories.

gregorydonahue, May 20 '24 01:05