
Results 14 comments of yurakuratov

Yes, we could cast report outputs to default Python types in `nn_trainer`. @yoptar
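
For illustration, a minimal sketch of what such casting could look like (the `to_builtin` helper and the report structure are hypothetical, not actual `nn_trainer` code):

```python
import json
import numpy as np

def to_builtin(obj):
    # Hypothetical helper: recursively cast numpy scalars/arrays in a metrics
    # report to built-in Python types so the report can be JSON-serialized.
    if isinstance(obj, dict):
        return {k: to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(v) for v in obj]
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj

report = {"accuracy": np.float64(0.913), "n_examples": np.int64(1200)}
print(json.dumps(to_builtin(report)))  # serializes cleanly after casting
```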

Yes, just set `verbosity=0` in the `amp.initialize` call.
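
A minimal sketch, assuming NVIDIA Apex mixed-precision training on a GPU (the model and optimizer here are placeholders):

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# verbosity=0 silences Apex's logging during initialization
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", verbosity=0)
```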

We have replicated BigBird pre-training on the more recent T2T human genome assembly. The model is available via HuggingFace: https://huggingface.co/AIRI-Institute/gena-lm-bigbird-base-t2t. Any kind of feedback is welcome!
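
A minimal loading sketch with the `transformers` library (whether `trust_remote_code=True` is actually required depends on how the checkpoint is packaged, so treat that flag and the example input as assumptions):

```python
from transformers import AutoTokenizer, AutoModel

name = "AIRI-Institute/gena-lm-bigbird-base-t2t"
tokenizer = AutoTokenizer.from_pretrained(name)
# trust_remote_code may not be needed for this checkpoint (assumption)
model = AutoModel.from_pretrained(name, trust_remote_code=True)

inputs = tokenizer("ATCGGCTAACGTA", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```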

Hi! We also encountered an OOM issue while training the tokenizer. To overcome this problem, we sampled 10 x 10^6 random subsequences from the whole dataset and trained the tokenizer on them.
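
A rough sketch of this idea, assuming a plain-text file with one long sequence per line and the HuggingFace `tokenizers` library (the file name, subsequence length, and vocabulary size below are illustrative, not our exact settings):

```python
import random
from tokenizers import Tokenizer, models, trainers

N_SAMPLES = 10_000_000   # 10 x 10^6 random subsequences
SUBSEQ_LEN = 10_000      # illustrative window length

def sample_subsequences(path, n_samples, subseq_len):
    # Load sequences once, then yield random fixed-length windows from them,
    # so the trainer never sees the whole dataset at once.
    with open(path) as f:
        seqs = [line.strip() for line in f if line.strip()]
    for _ in range(n_samples):
        seq = random.choice(seqs)
        start = random.randrange(max(1, len(seq) - subseq_len))
        yield seq[start:start + subseq_len]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
trainer = trainers.BpeTrainer(
    vocab_size=32_000,
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train_from_iterator(
    sample_subsequences("genome_sequences.txt", N_SAMPLES, SUBSEQ_LEN),
    trainer=trainer,
)
tokenizer.save("tokenizer.json")
```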

Hi! It seems that triton 1.0.0 requires Python 3.9 or lower. I am successfully running our models with Python 3.8 and triton 1.0.0. Please check your Python version. Yes,...

You can also try using triton 1.1.1 as mentioned here: https://github.com/yurakuratov/t5-experiments#triron-111, but you will need to install the DeepSpeed fork from that instruction.

Could you try installing transformers==4.17.0 with `!pip install transformers==4.17.0`?

Hi, @aaronmaiww! I have just updated the README section on requirements for sparse models: https://github.com/AIRI-Institute/GENA_LM#deepspeed-for-sparse-ops. Hope you find it useful.

Hi! Great question! Theoretically, the number of operations for full attention will always be higher than for sparse attention, because sparse attention removes entire blocks of the attention matrix from...
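
As a back-of-the-envelope illustration (the numbers are purely for example and not GENA-LM's exact configuration), here is a count of attention-score entries for full vs. block-sparse attention:

```python
# Rough count of attention-score entries per head for a single sequence.
# Illustrative numbers only: a 4096-token sequence split into 64-token blocks.
seq_len = 4096
block_size = 64
n_blocks = seq_len // block_size

# Full attention touches every query-key pair.
full_ops = seq_len * seq_len

# Block-sparse (BigBird-style) attention keeps only a few key blocks per
# query block: e.g. 3 sliding-window blocks, 2 global blocks, 3 random blocks.
kept_blocks_per_row = 3 + 2 + 3
sparse_ops = n_blocks * kept_blocks_per_row * block_size * block_size

print(full_ops, sparse_ops, full_ops / sparse_ops)  # ~8x fewer entries here
```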

Hi! We have not done a comparison with DNABERT-2. Could you share more details on how you run GENA-LM on their benchmarks? This will help us identify the issue.