bigbird
Pre-trained model for genomic sequences
Good morning,
Thank you for sharing the paper, code, and pre-trained model for NLP text data. Your research results are impressive. Because I am developing embedding solutions for genes and proteins, the application to genomic sequences interests me the most.
Is there any chance to try the BigBird nucleotide-based pre-trained model for research purposes? I would like to include it in my benchmark and compare it with existing non-contextual embeddings (Word2Vec, FastText, and GloVe).
Regards, Piotr
Hi Piotr,
Thanks for your interest in our work. We are working on releasing the model pretrained on DNA fragments.
Thanks!
Might we get the code for genome pretraining, as well as the pretrained network weights themselves, please?
Hi, any update on this?
Greetings, manzilz
I also work on nucleotide-based language models and would appreciate it if you could release a pretrained model for me to use as a benchmark.
Thanks a lot!
Hi,
Any update on the release?
@manzilz any updates on the release? 😃
Any update on this? This would be very useful for embedding DNA/RNA sequences.
I'm absolutely sure that they DON'T HAVE ANY PLANS to release DNA models.
We have replicated BigBird pre-training on the more recent T2T human genome assembly. The model is available via HuggingFace: https://huggingface.co/AIRI-Institute/gena-lm-bigbird-base-t2t. Any kind of feedback is welcome!
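For anyone who wants to plug the checkpoint into an embedding benchmark, here is a minimal sketch of loading it with the Hugging Face transformers Auto classes and mean-pooling the last hidden state into a fixed-size vector for a DNA fragment. The use of `AutoTokenizer`/`AutoModel` with `trust_remote_code=True`, and mean pooling as the sequence embedding, are my assumptions; please check the model card for the recommended usage.

```python
# Minimal sketch (assumptions noted above): load the GENA-LM BigBird checkpoint
# and compute a mean-pooled embedding for one DNA sequence.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "AIRI-Institute/gena-lm-bigbird-base-t2t"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

# Example DNA fragment (hypothetical input, not from the original thread)
sequence = "ATGGCGTACGATCGATCGATCGATCGATCGTAGCTAGCTAGC"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token representations into a single (1, hidden_size) vector
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)
```

A vector like this can then be compared directly against non-contextual baselines such as Word2Vec, FastText, or GloVe k-mer embeddings in the benchmarks mentioned above.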