
Use basenji for motif discovery / motif scanning

Open moxgreen opened this issue 2 years ago • 2 comments

Dear basenji developers, I would like to use basenji to model ChIP-seq data. Specifically, I would like to train a model on the ChIP-seq data for a particular transcription factor (TF). The data can be in BED format (peaks) or, if that is more suitable for basenji, signal in BigWig format. I do not need an interpretable model such as a PWM; I only want a model that can predict whether the TF of interest binds a given sequence.

Once the model is trained, I would like to apply it to other sequences (e.g. regions from an ATAC-seq experiment, or the entire genome) and predict whether they are expected to be bound by the TF.
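To make the scoring step concrete: a sequence model of this kind expects fixed-length inputs, so each region (for example an ATAC-seq peak) is usually resized to a common length and one-hot encoded before prediction. The following is a minimal sketch of that preprocessing only. It is not basenji or basset code; the function names `center_window` and `one_hot` and the example coordinates are hypothetical.

```python
def center_window(start, end, seq_length):
    """Return a (start, end) interval of exactly seq_length bases,
    centered on the midpoint of the original region.
    Note: no clamping to chromosome bounds is done in this sketch."""
    mid = (start + end) // 2
    win_start = mid - seq_length // 2
    return win_start, win_start + seq_length


def one_hot(seq):
    """One-hot encode a DNA string in A, C, G, T channel order.
    Any other character (e.g. N) becomes an all-zero row."""
    table = {
        "A": (1, 0, 0, 0),
        "C": (0, 1, 0, 0),
        "G": (0, 0, 1, 0),
        "T": (0, 0, 0, 1),
    }
    return [table.get(base, (0, 0, 0, 0)) for base in seq.upper()]


# Hypothetical peak at positions 100-200, resized to a 50 bp window.
window = center_window(100, 200, 50)
encoded = one_hot("AcgN")
```

The window length used here would have to match whatever sequence length the trained model was built for.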

Is basenji suitable for this purpose? Should I use basset instead?

I was able to run basenji_train.py and basenji_test.py on the test data you provided. One of the difficult steps for me is designing a model.json suited to my needs. I see that some parameters in the provided models (e.g. https://github.com/calico/basenji/blob/master/testdata/params.small.hd5.txt) clearly depend on the input (e.g. seq_length). I am not a CNN expert and would like to use a "standard" architecture, but I still need to set every input-dependent parameter carefully, and it is not clear to me which parameters those are.

Thanks for any advice.

moxgreen, Jun 08 '22 13:06

Hi, I believe the basset path is more straightforward for your application. I would add your dataset to the DNase compendium, so you have a bunch of tough negative examples, too. The procedure is described here: https://github.com/calico/basenji/blob/master/manuscripts/basset/make_dataset.sh

Then you can use the parameters described here: https://github.com/calico/basenji/blob/master/manuscripts/basset/params_basset.json
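One quick sanity check before training is to confirm that the sequence length in the parameters file matches the length of the sequences in the dataset. The sketch below searches a parsed JSON document for a `seq_length` key wherever it is nested, so it does not assume a particular file layout. The helper name `find_key` and the inline example values are hypothetical and are not copied from params_basset.json.

```python
import json


def find_key(obj, key):
    """Collect every value stored under `key` anywhere in a nested
    structure of dicts and lists (as produced by json.load)."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_key(item, key))
    return found


# Illustrative stand-in for a params file; the values are made up.
example_params = json.loads(
    '{"train": {"batch_size": 64}, "model": {"seq_length": 1000}}'
)

dataset_seq_length = 1000  # hypothetical length of the prepared sequences
lengths = find_key(example_params, "seq_length")
matches = all(length == dataset_seq_length for length in lengths)
```

For a real file, replace the inline string with `json.load(open(path))` and compare against the length used when the dataset was built.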

Let me know if you encounter any issues!

davek44, Jun 10 '22 22:06

Many thanks, I will try basset.

moxgreen, Jun 13 '22 12:06