DNABERT_2
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Hi, when running DNABERT-2, I'm trying to output not only the [CLS] embedding and hidden states but also the attentions. I tried: `db2_model = AutoModel.from_pretrained(dirname, output_attentions=True, trust_remote_code=True); output = db2_model(input_ids)`...
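I can't confirm that DNABERT-2's custom remote code honors `output_attentions=True`. Assuming you do obtain `output.attentions` as a tuple of per-layer arrays shaped `[batch, num_heads, seq_len, seq_len]` (the standard Hugging Face convention), a minimal sketch of collapsing them into one per-layer attention map by averaging over heads:

```python
import numpy as np

def average_attention(attentions):
    """Average per-head attention maps into one [layers, seq, seq] array.

    `attentions` is assumed to be a tuple of per-layer arrays shaped
    [batch, num_heads, seq_len, seq_len], as returned by Hugging Face
    models called with output_attentions=True (batch size 1 here).
    """
    stacked = np.stack([layer[0] for layer in attentions])  # [layers, heads, seq, seq]
    return stacked.mean(axis=1)                             # [layers, seq, seq]

# Dummy data standing in for model output: 2 layers, 4 heads, seq_len 5.
dummy = tuple(np.random.rand(1, 4, 5, 5) for _ in range(2))
avg = average_attention(dummy)
print(avg.shape)  # (2, 5, 5)
```

If the custom implementation ignores the flag entirely, the attentions simply won't appear in the output tuple; the averaging step itself is independent of how they were obtained.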
Hello, I am currently working with the zhihan1996/DNABERT-2-117M model to extract features from DNA sequences and am encountering an issue with accessing the hidden states of the model. **Issue Description**...
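For turning token-level hidden states into a single sequence embedding, a common approach is masked mean pooling. A minimal sketch, assuming `hidden_states` comes from something like `model(inputs)[0][0]` (shape `[seq_len, hidden_dim]`, hidden_dim 768 for DNABERT-2) and that padding positions should be excluded; the dummy arrays below are stand-ins, not model output:

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Mean-pool token embeddings into one sequence embedding.

    hidden_states: [seq_len, hidden_dim]; attention_mask: [seq_len],
    with 0 marking padding tokens so they don't dilute the average.
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=0)
    counts = mask.sum()
    return summed / np.maximum(counts, 1e-9)

# Dummy stand-in for model output: 6 tokens (last 2 are padding), dim 8.
hs = np.random.rand(6, 8)
mask = np.array([1, 1, 1, 1, 0, 0])
emb = mean_pool(hs, mask)
print(emb.shape)  # (8,)
```

The alternative is taking the [CLS] position (`hidden_states[0]`) directly; mean pooling tends to be the more robust default for feature extraction.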
I do not see that you have made the GUE+ datasets available at the link for the GUE.zip download. Are they available in another location?
When calling `hidden_states = model(inputs)[0]  # [1, sequence_length, 768]`, we receive a traceback originating in `/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501` in `_call_impl`...
Hi, I tried to run the command in `./DNABERT_2/finetune/scripts/run_dnabert2.sh` and got the following results; however, they do not match the ones reported in the DNABERT-2 paper. ![Screen Shot 2024-05-07 at 10 44...
After running into several compatibility issues while setting up the virtual environment for the model, I pinned the specific package versions and added them to requirements.txt. Now inference and finetuning...
Hi, do I need a specific pre-trained model for splice site prediction, or will the standard DNABERT-2 suffice? Regards.
Please, please release the pretraining code; I'm dying to use it.
By default, the tokenizer adds special tokens to the "input_ids", specifically [CLS] at the beginning and [SEP] at the end of each token array. Was DNABERT-2 trained with these tokens...
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.44 GiB. GPU 0 has a total capacity of 10.58 GiB of which 4.44 GiB is free. Including non-PyTorch memory, this process...
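A common mitigation on a ~10 GB GPU is shrinking the per-device batch size (and/or the maximum sequence length, or enabling fp16) and recovering the original effective batch size with gradient accumulation. A sketch of the arithmetic; the batch sizes below are illustrative, not values from the paper or scripts:

```python
def accumulation_steps(target_batch, per_device_batch, num_devices=1):
    """Gradient-accumulation steps needed so that
    per_device_batch * num_devices * steps >= target_batch."""
    denom = per_device_batch * num_devices
    return -(-target_batch // denom)  # ceiling division

# e.g. recover an effective batch of 32 with per-device batches of 4 on one GPU
print(accumulation_steps(32, 4))  # 8
```

With the Hugging Face `Trainer`, this maps to lowering `per_device_train_batch_size` and raising `gradient_accumulation_steps` by the same factor, which keeps the optimizer's effective batch size unchanged at the cost of more forward/backward passes per update.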