Charles Srisuwananukorn
As I mentioned in the PR, both the pretrained model and datasets can be quite large.

```
$ du -sh data/* pretrained/GPT-NeoX-20B/
172G    data/OIG
238M    data/OIG-moderation
38G     data/wikipedia-3sentence-level-retrieval-index
39G     pretrained/GPT-NeoX-20B/...
```
If you're trying to reproduce the `GPT-NeoXT-Chat-Base-20B` model, you can download the dataset by running `python data/OIG/prepare.py` from the root of the repository. We plan to add more documentation about...
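In the meantime, a quick sketch of what that looks like (hypothetical session; the ~172G figure is from the `du` output above, so your exact size may differ):

```
# From the root of the OpenChatKit repository:
$ python data/OIG/prepare.py

# Afterwards, the dataset directory should be roughly the size reported above:
$ du -sh data/OIG
172G    data/OIG
```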
Reproducing `GPT-NeoXT-Chat-Base-20B` requires quite a lot of resources.

1. You'll need around 1TB of disk space (a quick way to check free space is shown below). The datasets take about 200GB.

```
$ du -hs data/*
172G    data/OIG
238M...
```
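To confirm you have enough room before starting, a simple check (standard coreutils, nothing OpenChatKit-specific):

```
# Show free space on the filesystem holding the repo; look for ~1TB available.
$ df -h .
```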
> Would be nice if the README can add the prerequisites for setting up the environment.

I'll update the README. I believe these packages are only available on Linux. Windows...
Or it could also be your git configuration. Could you let me know if this command works (as @orangetin suggested)?

```
git clone https://huggingface.co/datasets/laion/OIG /www/wwwroot/OpenChatKit/data/OIG/files
```
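If the clone stalls or only pulls down small pointer files, Git LFS may not be set up; Hugging Face repositories store large files through LFS. A sketch of what I'd try first (package name may vary by distro):

```
# Install and initialize Git LFS before cloning (apt shown; use your package manager).
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/datasets/laion/OIG /www/wwwroot/OpenChatKit/data/OIG/files
```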
Can you tell me more about what this patch does?
Thanks for the PR! I'll be taking a look at this soon.
Thanks! Will look.
Thanks, @LorrinWWW. Let's add this to the training README?
> It'd be really cool if the minimum requirements of the model (size on disk for data set, vram requirements) on the readme, that would save a lot of people...