Dmitry Labazkin
Hello @ahmedDaoudi-u, thank you for your response. But what is the purpose of slicing by `num_tokens` if it always equals `block_size` in this implementation (as the dimension 1...
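A minimal sketch of the situation being asked about. It assumes a causal mask is precomputed once at size `block_size` and then sliced to `num_tokens`, the way `mask[:num_tokens, :num_tokens]` would slice a tensor. The helpers `build_causal_mask` and `slice_mask` are hypothetical, and plain Python lists stand in for tensors. The slice is a no-op when the input fills the whole block and only matters when a shorter input is passed:

```python
def build_causal_mask(block_size):
    # mask[i][j] is True where query position i must NOT attend to key position j (j > i)
    return [[j > i for j in range(block_size)] for i in range(block_size)]


def slice_mask(mask, num_tokens):
    # Keep only the top-left num_tokens x num_tokens block, like mask[:num_tokens, :num_tokens]
    return [row[:num_tokens] for row in mask[:num_tokens]]


block_size = 4
mask = build_causal_mask(block_size)

full = slice_mask(mask, block_size)  # input as long as block_size: slicing changes nothing
short = slice_mask(mask, 2)          # shorter input: the mask has to shrink to 2 x 2

print(full == mask)  # True
print(short)         # [[False, True], [False, False]]
```

So if every batch always has exactly `block_size` tokens, the slice never does anything. It only becomes necessary once inputs shorter than `block_size` are allowed.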
@rasbt and @ahmedDaoudi-u, thank you for the explanations. I will probably come back with additional questions while exploring Chapter 5 :)
Thank you @rasbt, but then shouldn't the code in several notebook cells, like [50], also use `stride=max_length`? Currently it uses `max_length + 1`.
Hi @rasbt, I think that was the last place with this issue. I also listed the differences between the function names in my initial message (create_dataloader in the book vs create_dataloader_v1...
Thanks a lot for your explanation!
This [notebook](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/multihead-attention.ipynb) from Chapter 3 probably also has `stride = max_length + 1` in cell [1]:

```python
max_length = 4
# stride=5 equals max_length + 1
dataloader = create_dataloader(raw_text, batch_size=8, max_length=max_length, stride=5)
```
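To illustrate why the `+1` matters, here is a rough sketch of sliding-window chunking. It is modeled loosely on a dataset that pairs each input chunk with the same chunk shifted by one token as the target. The `sliding_windows` helper is hypothetical, and integer IDs stand in for real tokenizer output. With `stride=max_length` the input chunks tile the text without gaps, while `stride=max_length + 1` skips one token between consecutive chunks:

```python
def sliding_windows(token_ids, max_length, stride):
    # Each input chunk has max_length tokens; its target is the same chunk shifted by one
    pairs = []
    for i in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[i:i + max_length]
        targets = token_ids[i + 1:i + max_length + 1]
        pairs.append((inputs, targets))
    return pairs


token_ids = list(range(13))  # stand-in for tokenized text: 0, 1, ..., 12
max_length = 4

for stride in (max_length, max_length + 1):
    inputs = [inp for inp, _ in sliding_windows(token_ids, max_length, stride)]
    print(stride, inputs)

# 4 [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
# 5 [[0, 1, 2, 3], [5, 6, 7, 8]]
```

With `stride=5`, token 4 never starts or appears in an input chunk. It only shows up as the last target of the first chunk.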
Hello @bclavie, maybe this one is of interest: [ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval](https://arxiv.org/abs/2402.15059) https://github.com/ant-louis/xm-retrievers https://huggingface.co/antoinelouis/colbert-xm/