Charles Srisuwananukorn

56 comments by Charles Srisuwananukorn

As I mentioned in the PR, both the pretrained model and the datasets can be quite large.

```
$ du -sh data/* pretrained/GPT-NeoX-20B/
172G	data/OIG
238M	data/OIG-moderation
38G	data/wikipedia-3sentence-level-retrieval-index
39G	pretrained/GPT-NeoX-20B/
```
...

If you're trying to reproduce the `GPT-NeoXT-Chat-Base-20B` model, you can download the dataset by running `python data/OIG/prepare.py` from the root of the repository. We plan to add more documentation about...

Reproducing `GPT-NeoXT-Chat-Base-20B` requires quite a lot of resources.

1. You'll need around 1TB of disk space. The datasets take about 200GB.

```
$ du -hs data/*
172G	data/OIG
238M	...
```
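Before starting a download of that size, it can help to confirm the disk has room. A minimal sketch, assuming GNU coreutils `df` (the 1TB threshold comes from the estimate above and is adjustable):

```shell
# Pre-flight check: warn if free space on the current filesystem is below ~1TB.
REQUIRED_GB=1000
AVAIL_GB=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
if [ "$AVAIL_GB" -lt "$REQUIRED_GB" ]; then
  echo "warning: only ${AVAIL_GB}G free, ${REQUIRED_GB}G recommended"
else
  echo "disk check ok: ${AVAIL_GB}G free"
fi
```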

> Would be nice if the README can add the prerequisites for setting up the environment.

I'll update the README. I believe these packages are only available on Linux. Windows...

Or it could also be your git configuration. Could you let me know if this command works (as @orangetin suggested)?

```
git clone https://huggingface.co/datasets/laion/OIG /www/wwwroot/OpenChatKit/data/OIG/files
```
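If the clone fails partway through, a common culprit is a missing Git LFS setup, since Hugging Face datasets of this size store their large files via Git LFS. A quick sanity check (a sketch, assuming `git` and optionally `git-lfs` are on `PATH`):

```shell
# Sanity checks before cloning a large Hugging Face dataset.
git --version                                              # confirm git is installed
git lfs version || echo "git-lfs missing; install it before cloning"
git config --get http.postBuffer || true                   # show any custom HTTP buffer setting
```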

Can you tell me more about what this patch does?

Thanks for the PR! I'll be taking a look at this soon.

Thanks! Will look.

Thanks, @LorrinWWW. Let's add this to the training README?

> It'd be really cool if the minimum requirements of the model (size on disk for data set, vram requirements) on the readme, that would save a lot of people...