deepmind-research

Enformer protocol

Open • BenxiaHu opened this issue 2 years ago • 1 comment

Hello, Enformer looks great for predicting gene expression. Is there a pipeline to run Enformer?

I have another question. I have many DNA sequences of different lengths, and I would like to know whether Enformer can predict the target genes for these sequences.

Best,

BenxiaHu, Oct 04 '21 20:10

Hi,

Thanks for reaching out. Currently, there is no pipeline or script to run Enformer. I suggest implementing the pipeline yourself by extracting the relevant code from the enformer-usage colab.
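As a starting point, the model-loading and prediction step might look roughly like the following. This is a minimal sketch, assuming the TF-hub handle `https://tfhub.dev/deepmind/enformer/1` and the `hub.load(...).model.predict_on_batch(...)` pattern from the enformer-usage colab, and that `tensorflow_hub` is installed. The helper names `load_enformer` and `predict_tracks` are hypothetical, not part of any released API.

```python
import numpy as np

SEQUENCE_LENGTH = 393_216  # input length (bp) expected by the TF-hub model


def load_enformer(tfhub_url="https://tfhub.dev/deepmind/enformer/1"):
    """Load the TF-hub Enformer model (assumed handle, as in the usage colab)."""
    import tensorflow_hub as hub  # imported lazily so the helpers below work without TF
    return hub.load(tfhub_url).model


def predict_tracks(model, one_hot_batch):
    """Run a batch of one-hot sequences through `model.predict_on_batch`.

    one_hot_batch: array of shape (batch, 393216, 4); N positions are all zeros.
    Returns a dict mapping each output head name to a NumPy array.
    """
    batch = np.asarray(one_hot_batch, dtype=np.float32)
    if batch.ndim != 3 or batch.shape[1:] != (SEQUENCE_LENGTH, 4):
        raise ValueError(
            f"expected shape (batch, {SEQUENCE_LENGTH}, 4), got {batch.shape}")
    predictions = model.predict_on_batch(batch)
    # Convert each head's output (e.g. TF eager tensors) to NumPy arrays.
    return {name: np.asarray(values) for name, values in predictions.items()}
```

Checking the shape up front gives a clear error for sequences that were not padded or trimmed to the full input length, instead of an opaque failure inside the model.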

The TF-hub version expects an input of 393,216 base pairs and can also handle unknown nucleotides (N's, represented as [0, 0, 0, 0]). To deal with variable-length sequences, I would recommend placing the main TSS of the gene in the middle of the sequence and then padding the rest of the sequence on each side with N's to reach 393,216 base pairs (or trimming it if it is too long).

Note that Enformer makes predictions for the central 114,688 bp at 128 bp resolution, so the result will contain 896 spatial values (114,688 / 128 = 896). You can obtain transcript expression values by extracting the values at the spatial bin (out of 896) that overlaps the TSS of the transcript of interest.
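The centering, padding, and bin lookup described above can be sketched in Python. This is a minimal sketch, assuming sequences are ASCII strings over `ACGT`/`N`, the TSS is given as a 0-based index into the sequence, the one-hot channel order is A, C, G, T, and the 896 output bins tile the central 114,688 bp symmetrically, so that bin 0 starts at input position (393,216 - 114,688) / 2 = 139,264. The function names `center_on_tss` and `bin_for_position` are hypothetical.

```python
import numpy as np

SEQUENCE_LENGTH = 393_216                              # input length (bp)
TARGET_LENGTH = 114_688                                # central predicted region (bp)
BIN_SIZE = 128                                         # output resolution (bp)
NUM_BINS = TARGET_LENGTH // BIN_SIZE                   # 896 output bins
TARGET_START = (SEQUENCE_LENGTH - TARGET_LENGTH) // 2  # 139,264: first predicted bp

# Byte -> channel lookup (assumed order A, C, G, T); anything else (e.g. N) maps to -1.
_LOOKUP = np.full(256, -1, dtype=np.int64)
for _k, _base in enumerate("ACGT"):
    _LOOKUP[ord(_base)] = _k
    _LOOKUP[ord(_base.lower())] = _k


def center_on_tss(seq, tss):
    """One-hot encode `seq` so that seq[tss] sits at the center of the input.

    Positions not covered by `seq` stay N ([0, 0, 0, 0]); any part of `seq`
    that falls outside the 393,216 bp window is trimmed.
    """
    one_hot = np.zeros((SEQUENCE_LENGTH, 4), dtype=np.float32)
    offset = SEQUENCE_LENGTH // 2 - tss               # input position of seq[0]
    start = max(0, -offset)                           # drop left overhang
    stop = min(len(seq), SEQUENCE_LENGTH - offset)    # drop right overhang
    if start >= stop:
        return one_hot
    raw = np.frombuffer(seq[start:stop].encode("ascii"), dtype=np.uint8)
    codes = _LOOKUP[raw]
    positions = np.arange(start, stop) + offset
    keep = codes >= 0                                 # N and other symbols stay all-zero
    one_hot[positions[keep], codes[keep]] = 1.0
    return one_hot


def bin_for_position(position):
    """Output bin (0..895) covering input `position`, or None if it is not predicted."""
    rel = position - TARGET_START
    if not 0 <= rel < TARGET_LENGTH:
        return None
    return rel // BIN_SIZE
```

Because the TSS is placed at input position 196,608, `bin_for_position(196_608)` returns 448, so the TSS signal would be read from bin 448 of each output track (e.g. `predictions["human"][0, 448]`, assuming the output layout used in the colab).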

Best,
Ziga

Avsecz, Oct 19 '21 21:10