GENA_LM
GENA-LM is a transformer masked language model trained on human DNA sequence.
Hi! Thanks so much for sharing this repo. I have been following the GENA_LM project with interest and appreciate your provision of the code for training the Tokenizer. I am...
Hello, I have been trying to train a tokenizer following the code you provided, but I am encountering an out-of-memory issue. I'm working with a multi-species dataset that's several tens...
Hi again, I'd love to try out the recurrent memory transformer you use in your paper - do you have any plans to make the code public or upload it to huggingface?...
Hi, thank you for this excellent repository! I'm trying to set up `gena-lm-bigbird-base-sparse-t2t`, but I can't install the requisite libraries in the way suggested on huggingface: ``` pip install triton==1.0.0 DS_BUILD_SPARSE_ATTN=1...
Hi! Thanks for your great work! I want to ask whether you have compared the model's performance on the DNABERT2 benchmark: https://github.com/Zhihan1996/DNABERT_2 Some tasks might be like, but I am...
Hi Yuri and Veniamin. Thanks for sharing your work, I very much enjoyed reading about how you were able to extend context length much further than what can be typically...
Hi! Thanks so much for sharing this repo, it's great and DNA sequence beginner friendly! I was wondering if you're able to share the training and tokenization code for pretraining?...
I am interested in the downstream analysis of APARENT. The fine-tuning command run_aparent_finetuning.sh requires other datasets. Could you provide access to these datasets? ! CUDA_VISIBLE_DEVICES=0 horovodrun...
Hi, thanks for your great work. I am particularly interested in your species classification performance. According to Figure 4a, it seems that different species are well-separated. However, based on...