
Question: does BPNet code support any seq length?

vitkl opened this issue Aug 08 '23 · 22 comments

Hi @jmschrei

does BPNet code support arbitrary sequence length? https://github.com/jmschrei/bpnet-lite/blob/master/bpnetlite/bpnet.py

vitkl avatar Aug 08 '23 00:08 vitkl

The BPNet model itself should work on any length. Pay attention to the trimming parameter: it's the number of positions trimmed off either side, i.e. half the difference between the input length and the output length. However, I may have gotten lazy in other parts of the code-base and assumed 2114 input or 1000 output. Let me know if that's the case anywhere and I'll fix it.

jmschrei avatar Aug 08 '23 00:08 jmschrei

Thanks for clarifying!

What are the requirements for the input sequence length? Can I change trimming to use 1000 input and 1000 output?

vitkl avatar Aug 08 '23 00:08 vitkl

If you set trimming to 0 you should be able to use the same input and output length. However, you'll likely get worse predictions on the flanks because they can't see their full contexts.
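As a minimal sketch of how the lengths relate (the constructor arguments here are assumptions based on this thread, so double-check them against bpnet.py):

```python
import torch
from bpnetlite.bpnet import BPNet

# Input and output lengths are linked by: in_length = out_length + 2 * trimming.
in_length, out_length = 2114, 1000
trimming = (in_length - out_length) // 2      # 557 bp off each side; 0 means equal lengths

# Argument names are assumptions -- check the BPNet class in bpnet.py for the real signature.
model = BPNet(n_filters=64, n_layers=8, n_control_tracks=0, trimming=trimming)

X = torch.randn(1, 4, in_length)              # stand-in for one-hot encoded DNA
y_profile, y_counts = model(X)                # the profile should span out_length positions
```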

jmschrei avatar Aug 08 '23 01:08 jmschrei

I would like to use this as a bias model - how much context does the bias model need to see?

vitkl avatar Aug 08 '23 05:08 vitkl

Usually, the bias model is given far less context so that it does not inadvertently learn complex rules. We usually use a model with four layers, so the residual layers aggregate 2**4 nucleotides after the first layer.
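As a back-of-the-envelope sketch of how that context grows (assuming a wide first convolution followed by kernel-3 convolutions with dilation 2**i, the usual BPNet-style stack; the exact widths are assumptions, not read from the code):

```python
# Rough receptive field of a BPNet-style dilated stack: a first convolution of
# width `first_kernel`, then n_layers kernel-3 convolutions with dilation 2**i.
def receptive_field(n_layers, first_kernel=21, kernel=3):
    rf = first_kernel
    for i in range(1, n_layers + 1):
        rf += (kernel - 1) * 2 ** i      # each dilated conv widens the field by 2 * dilation
    return rf

print(receptive_field(4))    # 81   -> a 4-layer bias model sees only tens of bp of context
print(receptive_field(8))    # 1041 -> an 8-layer model sees roughly a kilobase
```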

jmschrei avatar Aug 08 '23 05:08 jmschrei

Thanks for clarifying! Trimming 16/2 nucleotides probably doesn't matter too much.

What are the bias model's first-layer filter size and fully connected layer dimensionality?

vitkl avatar Aug 08 '23 06:08 vitkl

Look through this example specification: https://github.com/jmschrei/bpnet-lite/blob/master/example_jsons/chrombpnet_pipeline_example.json

jmschrei avatar Aug 08 '23 16:08 jmschrei

I am not sure where I should look for a full list of bias model parameters. I see this https://github.com/jmschrei/bpnet-lite/blob/master/example_jsons/chrombpnet_pipeline_example.json#L34-L41 but it only mentions the number of layers.

vitkl avatar Aug 09 '23 16:08 vitkl

Also interesting to see that BPNet doesn't use any normalisation layers (e.g. LayerNorm/BatchNorm). I wonder if not using those normalisation layers is a prerequisite for learning TF concentration-dependent effects.

vitkl avatar Aug 09 '23 17:08 vitkl

The bias model is just a BPNet model so you can use any of the parameters in https://github.com/jmschrei/bpnet-lite/blob/master/example_jsons/bpnet_fit_example.json

I'm not sure how normalization would relate to learning TF concentration-dependent effects. Presumably, that's static in each experiment.

jmschrei avatar Aug 09 '23 22:08 jmschrei

Presumably, that's static in each experiment.

This makes sense. How do you motivate not using normalisation?

vitkl avatar Aug 09 '23 22:08 vitkl

Presumably, the motivation is that adding it in didn't help, empirically, and keeping the model conceptually simpler can help with interpretation.

jmschrei avatar Aug 09 '23 22:08 jmschrei

Makes sense!

Looking at the file with options:

  1. max_jitter - which values do you recommend using? I see a pretty large value of 128 compared to the Basset default of 3.
  2. reverse_complement - do you have strong arguments for randomly RCing the input rather than summing the results of scanning both the forward and reverse-complement strands?

vitkl avatar Aug 09 '23 22:08 vitkl

  1. I'd recommend having it be as large as possible while still capturing the peak (see the small jitter sketch after this list). Basset would have benefitted from a larger jitter because it had position dependence in the model through the use of dense layers. BPNet is much more resistant to this because it only uses convolution layers.

  2. https://www.biorxiv.org/content/10.1101/103663v1 suggests the best approach is to train the model and then, at inference time, scan both. I don't think it matters too much though.
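On the jitter point, here is a minimal sketch of the kind of augmentation max_jitter controls (the function and variable names are illustrative, not bpnet-lite's internals):

```python
import numpy as np

def jittered_window(onehot, summit, window, max_jitter, rng=np.random):
    """Crop a training window whose centre is shifted by up to +/- max_jitter bp.

    onehot: one-hot encoded chromosome of shape (4, L); summit: peak summit position;
    window: model input length. Larger max_jitter decorrelates the peak position from
    the window centre, as long as the peak still fits inside the window.
    """
    shift = rng.randint(-max_jitter, max_jitter + 1)
    start = summit + shift - window // 2
    return onehot[:, start:start + window]
```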

jmschrei avatar Aug 09 '23 22:08 jmschrei

  1. Do you mean that BPNet should in principle work as well with a smaller shift?

  2. I saw the paper - however, I have to use the summation strategy in the rest of the model - so the question is whether the bias model should be similarly trained.

vitkl avatar Aug 09 '23 22:08 vitkl

  1. Many of the choices for BPNet were made based on a small number of ChIP-nexus data sets. I don't think we've rigorously tested each decision by, for example, looking at performance with different jitters. I think that jittering helps BPNet but don't have more than just that intuition.

  2. The bias model and subsequently the full ChromBPNet model should be trained by randomly RCing input. If you want to force outputs to be identical across strands then, at inference time, you run a sequence and its RC through the entire model (bias and accessibility) and aggregate.

jmschrei avatar Aug 09 '23 22:08 jmschrei

  1. I see. I observed that 3bp jittering helps cell2state compared to no jittering. I will test 100bp jittering.

  2. I see what you mean. I would like to use BPNet as a bias model, but the biological model is very different. I am using a parameter-sharing architecture in the cell2state CNN because the reference-based TF motifs will only be recognised in one orientation (FW or RC), so the model has to look at both directions and aggregate. As far as I understand, random RC forces the model to learn both the FW and RC forms of the TF motifs (or, in this case, bias motifs) - does this match your intuition? The main question is whether it's fine to randomly RC the input for the bias model but use both FW/RC for the biological model.

vitkl avatar Aug 10 '23 15:08 vitkl

Randomly RCing encourages the model to learn motifs in both directions but there's no guarantee that it learns the same effect in both directions, even though biologically that's plausible. There have been RARE cases where it learns a motif in one direction and not in the other, but this is usually only when there are not a lot of binding sites.

I'm not sure I understand how you're training your biological model. Are you training it jointly with a frozen pre-trained bias model? I guess my feeling is that both should be trained the same way. If you end up doing parameter sharing with your model, you should train the bias model with parameter sharing. Otherwise, it might learn some weird effects. Unfortunately, parameter sharing is not implemented in bpnet-lite.

jmschrei avatar Aug 10 '23 17:08 jmschrei

I see. The model failing to learn a motif in both directions makes sense.

Also makes sense to train both models with parameter sharing. The bpnet-lite code helps a lot with understanding the architecture. I will try implementing parameter sharing but I am not sure I fully understand how to do that for layer 2+. My biological architecture uses just one CNN layer: it scans the forward DNA with the FW filter, then with the RC filter (simply swap complementary nucleotides and flip the width axis), applies an Exp activation to each, and then adds the two results. Does the same operation happen in all subsequent layers?
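For concreteness, a sketch of that single shared layer as I've described it (PyTorch; the class and argument names are illustrative, and it assumes one-hot input with channel order A, C, G, T):

```python
import torch
import torch.nn.functional as F

class SharedFwRcConv(torch.nn.Module):
    """Scan DNA with a filter and its reverse complement, Exp each, then sum.

    Assumes input of shape (batch, 4, length) with channel order A, C, G, T, so
    reverse-complementing a filter = flip the channel axis (A<->T, C<->G) and flip
    the width axis.
    """
    def __init__(self, n_filters, kernel_size):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(n_filters, 4, kernel_size) * 0.01)

    def forward(self, X):
        pad = self.weight.shape[-1] // 2
        fwd = F.conv1d(X, self.weight, padding=pad)
        rc_weight = torch.flip(self.weight, dims=[1, 2])   # complement channels, reverse width
        rev = F.conv1d(X, rc_weight, padding=pad)
        return torch.exp(fwd) + torch.exp(rev)
```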

vitkl avatar Aug 10 '23 23:08 vitkl

If you want to do parameter sharing I'd recommend having a wrapper that takes in a normal model, runs your sequence through it, then runs the RC'd sequence through it, and then flips the output from the RC'd sequence and averages it. No need to modify the underlying model.
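Something along these lines (a minimal sketch, assuming the wrapped model returns a (profile, counts) pair with the profile shaped (batch, strands, length); the axis handling is illustrative and needs to match your model's actual outputs):

```python
import torch

class RCAveragingWrapper(torch.nn.Module):
    """Average a model's predictions over a sequence and its reverse complement.

    Assumes one-hot input of shape (batch, 4, length) with channel order A, C, G, T
    and a wrapped model returning (profile, counts), profile shaped (batch, strands, length).
    """
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, X):
        X_rc = torch.flip(X, dims=[1, 2])                 # complement bases, reverse positions
        profile_fw, counts_fw = self.model(X)
        profile_rc, counts_rc = self.model(X_rc)
        profile_rc = torch.flip(profile_rc, dims=[1, 2])  # map RC output back: swap strands, reverse positions
        return (profile_fw + profile_rc) / 2, (counts_fw + counts_rc) / 2
```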

jmschrei avatar Jun 26 '24 15:06 jmschrei

Thanks for this suggestion @jmschrei! This certainly simplifies the implementation.

Didn't Anshul's group previously show that this kind of conjoining during training is worse than both conjoining during evaluation and a model that correctly accounts for the symmetries in every layer?

vitkl avatar Jul 06 '24 00:07 vitkl

Actually, this won't work: for the forward and RC sequences to give the same results, the model needs to represent both the forward and RC versions of each TF motif: -A> <A-

That's fine if you train a CNN de novo, but not if you use TF motifs as first-layer CNN weight priors.

vitkl avatar Jul 06 '24 00:07 vitkl

I'm closing this for now as part of a cleaning sweep, but please re-open if any issues related to the original question arise again. You can email me if you have any follow-up questions about BPNet conceptually, or open another issue if you run into a problem with the bpnet-lite package.

jmschrei avatar Feb 10 '25 09:02 jmschrei