
Making nano ChatGPT

nebyu08 opened this issue 1 year ago • 8 comments

nebyu08 · Jan 26 '23

Even though it goes beyond the scope of nanoGPT, a nanoChatGPT also came to my mind: replace a website/blog search engine with a nanoChatGPT that answers questions based on a limited but sharply focused set of texts or facts.

I suggest we keep this thread/issue open so people can comment with links to "nano" ChatGPT-like projects.
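
For anyone who wants to experiment with that idea, here is a minimal sketch of preparing such a focused corpus in the train.bin/val.bin format that nanoGPT's training script reads, following the pattern of the repo's data/*/prepare.py scripts. The file name and the 90/10 split are just illustrative assumptions:

```python
# Illustrative sketch: turn a small domain-specific corpus into the
# train.bin/val.bin files that nanoGPT's train.py reads (following the
# pattern of the repo's data/*/prepare.py scripts). The file name
# "domain_corpus.txt" and the 90/10 split are assumptions.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

with open("domain_corpus.txt", "r", encoding="utf-8") as f:
    data = f.read()

split = int(0.9 * len(data))
train_ids = enc.encode_ordinary(data[:split])
val_ids = enc.encode_ordinary(data[split:])

# nanoGPT expects raw uint16 token ids on disk
np.array(train_ids, dtype=np.uint16).tofile("train.bin")
np.array(val_ids, dtype=np.uint16).tofile("val.bin")
```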

Spiritdude · Jan 30 '23

The ChatGPT training process is publicly documented, but the results depend heavily on the fine-tuning datasets that OpenAI uses, which are hard to reproduce.

I think we could try WebGPT, which is a more engineering-oriented solution:

https://arxiv.org/abs/2112.09332 https://openai.com/blog/webgpt/


Of course, these are far beyond the scope of the nanoGPT/makemore courses, but after-class exercises are really fun.
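
To make the idea concrete, here is a toy sketch of the "retrieve, then answer from the retrieved text" pattern behind WebGPT-style systems; the corpus, the word-overlap scoring, and the prompt format below are placeholders, not anything from the WebGPT paper:

```python
# Toy sketch of the "retrieve, then answer from retrieved text" pattern
# behind WebGPT-style systems. The corpus, the word-overlap scoring, and
# the prompt format are placeholders, not anything from the paper.
from collections import Counter

DOCS = [
    "nanoGPT is a small repository for training GPT-style models.",
    "WebGPT answers questions by browsing the web and citing sources.",
    "T5 is an encoder-decoder model trained with a span-corruption objective.",
]

def retrieve(question, docs, k=1):
    """Rank documents by naive word overlap with the question."""
    q_words = Counter(question.lower().split())
    scored = sorted(docs, key=lambda d: -sum(q_words[w] for w in d.lower().split()))
    return scored[:k]

def build_prompt(question, context):
    """Condition the language model on retrieved text before the question."""
    return "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"

question = "How does WebGPT answer questions?"
print(build_prompt(question, retrieve(question, DOCS)))
```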

zhzLuke96 · Feb 04 '23

Another effort is at https://open-assistent.io, with code at https://github.com/LAION-AI/Open-Assistant

Spiritdude · Feb 05 '23

I saw a ChatGPT-related project and felt compelled to share it: https://github.com/hpcaitech/ColossalAI

They have made some impressive optimizations to the training process. Reportedly, a small ChatGPT-style model can be fine-tuned on a single GPU, and the repository includes complete training code.
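
For a sense of the kind of tricks that make single-GPU fine-tuning feasible in general, here is a generic PyTorch illustration (not ColossalAI's actual API): mixed precision, gradient accumulation, and activation checkpointing all trade a bit of compute for a big drop in memory. The tiny model and random data are placeholders:

```python
# Generic illustration (not ColossalAI's actual API) of memory-saving tricks
# that make single-GPU fine-tuning feasible: mixed precision, gradient
# accumulation, and activation checkpointing. Model and data are placeholders.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyBlock(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        # recompute activations in the backward pass instead of storing them
        return x + checkpoint(self.net, x, use_reentrant=False)

model = nn.Sequential(*[TinyBlock() for _ in range(4)], nn.Linear(256, 10)).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()
accum = 8  # gradient accumulation: simulate a larger batch without the memory cost

for step in range(32):
    x = torch.randn(4, 256, device="cuda")
    y = torch.randint(0, 10, (4,), device="cuda")
    with torch.cuda.amp.autocast():                       # half-precision activations
        loss = nn.functional.cross_entropy(model(x), y) / accum
    scaler.scale(loss).backward()
    if (step + 1) % accum == 0:
        scaler.step(opt)
        scaler.update()
        opt.zero_grad(set_to_none=True)
```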

zhzLuke96 · Feb 18 '23

I've started building a nanoChatGPT as a fork of Karpathy's nanoGPT. I also introduce a new training idea: backpropagating through the reward function using the Gumbel-Softmax trick rather than a policy-gradient method (PPO).

It works for a basic example but is still very crude and far from useful at scale. Sharing it here in case anyone wants to try it:

https://github.com/sanjeevanahilan/nanoChatGPT
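
For anyone curious what that looks like mechanically, here is a minimal toy sketch of the idea (not the nanoChatGPT code itself): sample tokens with a straight-through Gumbel-Softmax so that a differentiable reward can be maximized by ordinary backprop. The one-layer "language model" and the hand-written reward are placeholders:

```python
# Minimal toy sketch (not the nanoChatGPT code) of backpropagating a reward
# through sampled tokens with the straight-through Gumbel-Softmax trick,
# instead of a policy-gradient method like PPO. The one-layer "language
# model" and the hand-written reward are placeholders.
import torch
import torch.nn.functional as F

vocab, d = 16, 32
lm_head = torch.nn.Linear(d, vocab)        # stand-in for a language model head
opt = torch.optim.Adam(lm_head.parameters(), lr=1e-2)

def reward(soft_tokens):
    # soft_tokens: (batch, vocab) one-hot samples; toy reward prefers token id 3
    return soft_tokens[:, 3].mean()

hidden = torch.randn(8, d)                 # pretend hidden states from a prompt
for _ in range(200):
    logits = lm_head(hidden)
    # hard=True: discrete one-hot samples in the forward pass, but gradients
    # flow through the underlying softmax in the backward pass
    soft_tokens = F.gumbel_softmax(logits, tau=1.0, hard=True)
    loss = -reward(soft_tokens)            # maximize reward via ordinary backprop
    opt.zero_grad()
    loss.backward()
    opt.step()
```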

sanjeevanahilan · Feb 23 '23

https://github.com/togethercomputer/OpenChatKit

Spiritdude · Mar 13 '23

We have released nanoT5 for pre-training and evaluating T5-style (Encoder-Decoder) models. You can use it to pre-train your own model in one day on a single GPU :).
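
For context on what "T5-style pre-training" means, here is a toy illustration (not nanoT5's actual code) of the span-corruption objective: random spans of the input are replaced by sentinel tokens, and the decoder learns to reconstruct them. The rates and sentinel names below are just examples:

```python
# Toy illustration (not nanoT5's actual code) of the span-corruption
# objective used to pre-train T5-style encoder-decoder models: random
# spans of the input are replaced by sentinel tokens, and the decoder
# learns to reconstruct them. Rates and sentinel names are just examples.
import random

def span_corrupt(tokens, corrupt_rate=0.15, mean_span=3):
    inputs, targets, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        if random.random() < corrupt_rate / mean_span:
            span = tokens[i:i + mean_span]
            inputs.append(f"<extra_id_{sentinel}>")            # sentinel replaces the span
            targets.extend([f"<extra_id_{sentinel}>", *span])  # span moves to the target
            sentinel += 1
            i += mean_span
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

enc_in, dec_target = span_corrupt("the quick brown fox jumps over the lazy dog".split())
print(enc_in)      # corrupted encoder input
print(dec_target)  # spans the decoder must reconstruct
```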

PiotrNawrot · Mar 17 '23

Huh, there's actually an issue with my project's name on it. I have a NanoChatGPT here: it has chat functionality with human, bot, and endOfText tokens, along with a conversational dataset and more. I'm working on crude RLHF-like functionality and would love contributors.
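
As a rough idea of how such special tokens can be used to flatten a conversation into a single training string (the exact token spellings below are assumptions, not necessarily the ones NanoChatGPT uses):

```python
# Rough sketch of flattening a conversation into one training string with
# role and end-of-text markers. The exact token spellings are assumptions,
# not necessarily the ones NanoChatGPT uses.
HUMAN, BOT, EOT = "<human>", "<bot>", "<endOfText>"

def format_dialogue(turns):
    """turns: list of (role, text) pairs, role in {'human', 'bot'}."""
    parts = []
    for role, text in turns:
        tag = HUMAN if role == "human" else BOT
        parts.append(f"{tag} {text}")
    return " ".join(parts) + f" {EOT}"

example = [
    ("human", "What is nanoGPT?"),
    ("bot", "A minimal repository for training GPT-style models."),
]
print(format_dialogue(example))
# -> <human> What is nanoGPT? <bot> A minimal repository ... <endOfText>
```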

VatsaDev · Aug 23 '23