minbpe issues

Results 50 minbpe issues

Create setup.py

@karpathy, thank you so much for the implementation. I have added a `setup.py` to make installation easier (preferably with pip). Can we add this, please?

Biswajit2902

Automating testing using GitHub Actions

Automating testing by creating a workflow that tests on Windows, macOS, and Ubuntu with Python 3.9, 3.10, 3.11, and 3.12.

nobleaustine

Use `pyproject.toml`, `pdm` and `ruff` for improved reproducibility and cleaner code

@karpathy, thank you for another interesting educational project! This MR introduces a `pyproject.toml` file to handle project metadata and dependencies in accordance with [PEP-621](https://peps.python.org/pep-0621/)[^1]. Using it with [`pdm`](https://pdm-project.org/)[^2] and its...

nizhib

Need train_multi.py example to show use with multiple input files

Since there are significant concerns about handling ``, there should be an example program that shows how to properly prepare input text for that case and pass to the train...

gnp

Video2Post Generation Workflow

(will add to `minbpe-doc`)

```mermaid
graph TD;
    classDef success fill:#5CB85C,stroke:#fff,color:#fff;
    classDef progress fill:#428BCA,stroke:#fff,color:#fff;
    classDef pending fill:#F0AD4E,stroke:#fff,color:#fff;
    VideoScript[Video Scripts] --> OutlineGeneration[Generate Outline];
    subgraph IntegrationAndPreparation_Claude
        VideoScript --> PromptConstruction[Construct Prompt];
        OutlineGeneration --> PromptConstruction;
        ...
```

xihajun

Optimizing minbpe to also support video tokenization (extract low-dimensional latent patches from video frames)

Hi Mentor Karpathy, I was wondering if minbpe can be scaled to support tokenizing video frames into embedded patches, say as proposed in Sora's technical report and the ViT paper -...

Jaykef

Alternative to bpe

Maybe I am completely wrong, but to me using something like bpe to build an encoding for text feels stupid. Sure, it is a fairly easy way and it will...

marcov-dart

counting pairs is inaccurate for repeating tokens?

I just noticed that counting pairs might be slightly inaccurate for a lot of repeating tokens. For example in the sequence 1, 1, 1, 1 the pair (1, 1) gets...

JohannesVod
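
For readers skimming this entry, here is a minimal sketch of the effect being described. It assumes the concern is overlapping occurrences of a pair inside a run of identical tokens. The function names are hypothetical and are not taken from minbpe's code. Counting every adjacent pair in `1, 1, 1, 1` finds `(1, 1)` three times, while a left-to-right merge can only replace two of them.

```python
from collections import Counter


def count_pairs_overlapping(ids):
    """Count every adjacent pair, including overlapping occurrences."""
    return Counter(zip(ids, ids[1:]))


def count_pairs_non_overlapping(ids):
    """Count adjacent pairs, skipping an occurrence that overlaps the
    previously counted occurrence of the same pair (hypothetical variant)."""
    counts = Counter()
    last_pair, last_index = None, None
    for i in range(len(ids) - 1):
        pair = (ids[i], ids[i + 1])
        # An overlap can only happen with the occurrence counted one step back.
        if pair == last_pair and i == last_index + 1:
            continue
        counts[pair] += 1
        last_pair, last_index = pair, i
    return counts


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out
```

With `ids = [1, 1, 1, 1]`, the overlapping count for `(1, 1)` is 3, the non-overlapping count is 2, and `merge(ids, (1, 1), 256)` yields `[256, 256]`, that is, two actual replacements. Whether the difference matters in practice depends on how often long runs of one token appear in the training text.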

Handle error when running out of pairs to merge

I accidentally hit a `ValueError: max() arg is an empty sequence` when testing on a small piece of text with a (perhaps too) large `vocab_size`.

vinhdq842
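
One possible way to guard against this is sketched below. It is a simplified byte-level training loop written for illustration, not the code from this PR or from minbpe itself, and `train` and `merge` here are hypothetical stand-ins. The idea is to stop merging as soon as no adjacent pairs remain, instead of calling `max()` on an empty collection.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, vocab_size):
    """Learn up to `vocab_size - 256` merges, stopping early if the
    sequence has shrunk to fewer than two tokens."""
    assert vocab_size >= 256
    ids = list(text.encode("utf-8"))
    merges = {}
    for new_id in range(256, vocab_size):
        stats = Counter(zip(ids, ids[1:]))
        if not stats:
            # No pairs left: max() would raise ValueError here.
            break
        pair = max(stats, key=stats.get)
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges
```

For example, `train("aaaa", 300)` returns only two merges, `{(97, 97): 256, (256, 256): 257}`, because the text collapses to a single token long before 300 vocabulary entries are reached. Returning fewer merges than requested is one design choice; raising a clearer error would be another.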

updated self.vocab initialization and reuse self._build_vocab()

@karpathy First of all, thank you so much for sharing your knowledge. I updated the initialization of self.vocab because I don't feel we need to call self._build_vocab(). I also cleaned...

muerghq
