GENA_LM
GENA-LM is a transformer masked language model trained on human DNA sequence.
Hi! Thanks so much for sharing this repo. I have been following the GENA_LM project with interest and appreciate your provision of the code for training the Tokenizer. I am...
Hello, I have been trying to train a tokenizer following the code you provided, but I am encountering an out-of-memory issue. I'm working with a multi-species dataset that's several tens...
Hi again, I'd love to try out the recurrent memory transformer you use in your paper - do you have any plans to make the code public or upload it to huggingface?...
Hi, thank you for this excellent repository! I'm trying to set up `gena-lm-bigbird-base-sparse-t2t`, but I can't install the requisite libraries in the way suggested on huggingface: ``` pip install triton==1.0.0 DS_BUILD_SPARSE_ATTN=1...
Hi! Thanks for your great work! I want to ask whether you have compared the model's performance on the DNABERT2 benchmark: https://github.com/Zhihan1996/DNABERT_2 Some tasks might be like, but I am...
Hi Yuri and Veniamin. Thanks for sharing your work, I very much enjoyed reading about how you were able to extend context length much further than what can be typically...
Hi! Thanks so much for sharing this repo, it's great and DNA sequence beginner friendly! I was wondering if you're able to share the training and tokenization code for pretraining?...
I am interested in the downstream analysis of APARENT. The fine-tuning command run_aparent_finetuning.sh requires other datasets. Could you provide access to these datasets? ! CUDA_VISIBLE_DEVICES=0 horovodrun...
Hi, thanks for your great work. I am particularly interested in your species classification performance. According to Figure 4a, it seems that different species are well-separated. However, based on...