llama2.c

Suggestion: Is it possible to reorganize the file structure?

Open madroidmaq opened this issue 2 years ago • 5 comments

The project already contains a large number of source files at the top level (more than fit on one browser screen). Would it be possible to move the code related to model training into a separate train folder? The reasons are as follows:

  1. The project is mainly a C inference engine, and the rest of the code is secondary to it;
  2. The model training code is logically separate from the inference engine;
  3. A cleaner layout is friendlier to beginners and to people who are new to this project.

By the way, the same problem exists in the current README.md. Different groups of readers need a different structure: most first-time readers probably just want to try running inference in C, and only those who are interested will go on to train their own models.

The organization I suggest is roughly as follows:

├── assets
│   └── llama_cute.jpg
├── train
│   ├── export.py
│   ├── export_meta_llama_bin.py
│   ├── model.py
│   ├── sample.py
│   ├── test_all.py
│   ├── tinystories.py
│   ├── tokenizer.py
│   ├── train.py
│   └── train_vocab.sh
├── LICENSE
├── Makefile
├── README.md
├── build_msvc.bat
├── configurator.py
├── requirements.txt
├── run.c
├── run.ipynb
├── tokenizer.bin
├── tokenizer.model
├── win.c
└── win.h
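
As a rough illustration of the move proposed above, here is a minimal Python sketch that turns the list of training-related files from the tree into `git mv` commands. The helper name `plan_moves` and the constant `TRAIN_FILES` are hypothetical and not part of the repository. The script only prints the commands so the plan can be reviewed before anything is moved.

```python
from pathlib import PurePosixPath

# Files the proposal places under train/ (copied from the tree above).
TRAIN_FILES = {
    "export.py",
    "export_meta_llama_bin.py",
    "model.py",
    "sample.py",
    "test_all.py",
    "tinystories.py",
    "tokenizer.py",
    "train.py",
    "train_vocab.sh",
}


def plan_moves(top_level_files, train_dir="train"):
    """Return (old_path, new_path) pairs for files that would move into train_dir.

    Hypothetical helper: files not listed in TRAIN_FILES are left where they are.
    """
    moves = []
    for name in sorted(top_level_files):
        if name in TRAIN_FILES:
            moves.append((name, str(PurePosixPath(train_dir) / name)))
    return moves


if __name__ == "__main__":
    # Example subset of top-level files; a real run would list the repo root.
    repo_files = ["run.c", "train.py", "model.py", "README.md", "tokenizer.py"]
    for old, new in plan_moves(repo_files):
        # Print the command instead of executing it.
        print(f"git mv {old} {new}")
```

Files whose placement is questioned later in the thread (requirements.txt, configurator.py, the tokenizer files) are deliberately left out of `TRAIN_FILES`, since the discussion does not settle where they belong.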

madroidmaq, Aug 21 '23 14:08

you're not wrong...

karpathy, Aug 22 '23 02:08

Your suggestion isn't quite right either though, e.g. requirements.txt is left outside? and configurator, etc. And probably should hide away the tokenizer too

karpathy, Aug 22 '23 02:08

@karpathy You are right, some adjustments are not quite there. If no one makes this tweak by next week, I'll submit a PR over the weekend.

madroidmaq, Aug 22 '23 17:08

@karpathy, some major grouping should be done, I think. As it stands it's quite confusing.

rdentato, Aug 23 '23 17:08

@karpathy I have submitted a PR. If the change looks appropriate, it should be merged as soon as possible, otherwise it may run into a lot of git conflicts.

madroidmaq, Aug 26 '23 15:08