ccdv-ai
Hi @monk1337 The loaded model has a maximum sequence length of 512 tokens. If you use: `model = BertModel.from_pretrained("bert-large-uncased", max_position_embeddings=1024)` the model won't load, because the checkpoint also...
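For reference, a minimal sketch of the situation (the `ignore_mismatched_sizes` workaround is an assumption on my part, not something tested on this checkpoint):

```python
from transformers import BertConfig, BertModel

# The bert-large-uncased checkpoint was trained with 512 position embeddings.
config = BertConfig.from_pretrained("bert-large-uncased")
print(config.max_position_embeddings)  # 512

# Overriding max_position_embeddings alone creates a shape mismatch with the
# stored position embedding weights, so loading fails. Ignoring the mismatch
# leaves the new, larger embedding matrix randomly initialized, so it would
# still need further training before being useful.
model = BertModel.from_pretrained(
    "bert-large-uncased",
    max_position_embeddings=1024,
    ignore_mismatched_sizes=True,
)
```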
Hi @Gimperion Can you share your transformers version and a snippet of the code you used?
I think I found the problem, @Gimperion. Something is wrong with the model and the tokenizer. The `` token has index 50264 while the model config states that "vocab_size":...
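A quick way to spot this kind of mismatch (the model path below is a placeholder, not the checkpoint from the issue):

```python
from transformers import AutoModel, AutoTokenizer

model_path = "roberta-base"  # placeholder, use the checkpoint from the issue
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)

# If a special token id (e.g. 50264) is >= the config vocab_size, the
# embedding lookup goes out of bounds at inference time.
print(len(tokenizer))           # tokenizer vocabulary size
print(model.config.vocab_size)  # embedding matrix size declared in the config
```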
Hi @shensmobile You can train the LSG model the same way as the other models. There are two ways to use it: 1. Fine-tune the base model, then convert it for the...
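As a rough sketch of the second option (paths are placeholders; LSG checkpoints rely on custom modeling code, hence `trust_remote_code=True`):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load an already converted LSG checkpoint and fine-tune it directly.
model = AutoModelForSequenceClassification.from_pretrained(
    "my_lsg_model", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("my_lsg_model", trust_remote_code=True)
```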
The first warning is ignorable. It should work out of the box with this code:
```python
from lsg_converter import LSGConverter

# To convert a model
model_path = "myroberta_model"  # or whatever model...
```
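A slightly fuller sketch of how the converter is typically called; treat the exact method name and arguments (`convert_from_pretrained`, `num_global_tokens`) as assumptions based on the lsg_converter documentation rather than verified code:

```python
from lsg_converter import LSGConverter

model_path = "myroberta_model"  # or whatever model you fine-tuned

# Convert the checkpoint to its LSG variant with a longer maximum sequence length.
converter = LSGConverter(max_sequence_length=4096)
model, tokenizer = converter.convert_from_pretrained(model_path, num_global_tokens=7)
```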
@shensmobile For a given token, the maximum context is equal to `3*block_size + 2*sparse_block_size*sparsity_factor`. It's better to use the same size for blocks and sparse blocks for efficiency reasons. Using...
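As a worked example with hypothetical sizes:

```python
# Hypothetical values, using equal block sizes as suggested above.
block_size = 128
sparse_block_size = 128
sparsity_factor = 4

max_context = 3 * block_size + 2 * sparse_block_size * sparsity_factor
print(max_context)  # 384 + 1024 = 1408 tokens of context around a given token
```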
You can also try using fp16 instead of fp32. Gradient accumulation is fine. Changing the optimizer can also reduce memory; SGD is lighter than Adam, but convergence is slower. If you...
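Those suggestions map roughly onto `TrainingArguments` like this (values are placeholders; `optim="sgd"` is only available in recent transformers releases, so check your version):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,                      # half precision instead of fp32
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size of 16
    optim="sgd",                    # lighter on memory than Adam, slower to converge
)
```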
The architecture in the config is `HF_ColBERT`, [see](https://huggingface.co/AdrienB134/ColBERTv1.0-bert-based-spanish-mmarcoES/blob/main/config.json). If this is a BERT model, change:
```
"architectures": ["HF_ColBERT"]
```
to
```
"architectures": ["bert"]
```
Or try:
```
model, tokenizer...
```
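If the underlying weights really are plain BERT, loading them with the BERT classes directly might also work (untested sketch):

```python
from transformers import BertModel, BertTokenizerFast

model_id = "AdrienB134/ColBERTv1.0-bert-based-spanish-mmarcoES"
model = BertModel.from_pretrained(model_id)
tokenizer = BertTokenizerFast.from_pretrained(model_id)
```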
Hi, I have no idea. If it can load HF models, it should be able to load lsg models.
@danielhanchen OK, this is something with sentencepiece. Some models are missing the tokenizer ".model" file, so the **fast** tokenizer can be loaded but not the **slow** one. And there is...
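A quick way to reproduce the difference (the model id is a placeholder for one of the affected repos):

```python
from transformers import AutoTokenizer

model_id = "some/affected-model"  # placeholder

# The fast tokenizer only needs tokenizer.json, so this works.
tok_fast = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# The slow tokenizer needs the original sentencepiece ".model" file,
# so this raises an error when that file is missing from the repo.
tok_slow = AutoTokenizer.from_pretrained(model_id, use_fast=False)
```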