
Pre-trained model for genomic sequences

Open ptynecki opened this issue 3 years ago • 9 comments

Good morning,

Thank you for sharing the paper, code and pre-trained model for NLP text data. Your research results are impressive. Because I am developing embedding solutions for genes and proteins, the application to genomic sequences interests me the most.

Is there any chance to try a nucleotide-based pre-trained BigBird model for research purposes? I would like to include it in my benchmark and compare it with existing non-contextual embeddings (Word2Vec, FastText and GloVe).

Regards, Piotr

ptynecki avatar Dec 14 '20 06:12 ptynecki

Hi Piotr,

Thanks for your interest in our work. We are working on releasing the model pretrained on DNA fragments.

Thanks!

manzilz avatar Dec 15 '20 06:12 manzilz

Might we get the code for genome pretraining, as well as the pretrained network weights themselves, please?

project-delphi avatar Apr 12 '21 05:04 project-delphi

Hi, any update on this?

jonas27 avatar Jun 01 '21 07:06 jonas27

Greetings, manzilz

I also work on nucleotide-based language models and would appreciate it if you could release a pretrained model for me to use as a benchmark.

Thanks a lot!

imanmal1k avatar Jan 14 '22 10:01 imanmal1k

Hi,

Any update about the release?

FAhtisham avatar Jul 19 '22 23:07 FAhtisham

@manzilz any updates on the release? 😃

ItamarChinn avatar Jan 24 '23 23:01 ItamarChinn

Any update on this? This would be very useful for embedding DNA/RNA sequences.

cbirchsy avatar Feb 26 '23 21:02 cbirchsy

I'm absolutely sure that they DON'T HAVE ANY PLANS about releasing DNA models.

bbpxq avatar Mar 07 '23 11:03 bbpxq

We have replicated BigBird pre-training on the more recent T2T human genome assembly. The model is available via HuggingFace: https://huggingface.co/AIRI-Institute/gena-lm-bigbird-base-t2t. Any kind of feedback is welcome!
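For anyone landing here, a minimal sketch of how the checkpoint linked above might be used for DNA embeddings. This is not from the GENA-LM authors: it assumes `transformers` is installed, that the checkpoint loads via `AutoModel` with `trust_remote_code=True`, and the `chunk_sequence`/`embed_chunks` helper names are mine.

```python
def chunk_sequence(seq: str, window: int = 512, overlap: int = 64) -> list:
    """Split a long DNA sequence into overlapping windows so each chunk
    fits comfortably within the model's input length."""
    step = window - overlap
    return [seq[i:i + window] for i in range(0, max(len(seq) - overlap, 1), step)]


def embed_chunks(chunks):
    """Return contextual token embeddings for a list of DNA chunks.
    Requires network access to download the checkpoint on first use."""
    from transformers import AutoTokenizer, AutoModel  # pip install transformers

    name = "AIRI-Institute/gena-lm-bigbird-base-t2t"
    tokenizer = AutoTokenizer.from_pretrained(name)
    # trust_remote_code is an assumption about the custom model class
    model = AutoModel.from_pretrained(name, trust_remote_code=True)

    inputs = tokenizer(chunks, return_tensors="pt", padding=True)
    # shape: (num_chunks, seq_len, hidden_size)
    return model(**inputs).last_hidden_state
```

One could then call `embed_chunks(chunk_sequence(genome_fragment))` and mean-pool the hidden states to get one vector per chunk for comparison against Word2Vec/FastText baselines.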

yurakuratov avatar Apr 05 '23 11:04 yurakuratov