PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM.
The follow-up research to PaLM, Flan-PaLM, switched to the encoder-decoder T5 architecture. Would it be possible to add an encoder to this implementation as well?
Hi, first of all thanks for your work. I will definitely give it a try. I was wondering if you could share some information about the training time and which...
I find the reward function to be the most important part of RLHF, because it is the part which mimics a human evaluator, providing instant feedback to the model. However,...
Logs train and val loss, as well as generated texts, by default, but only when wandb is available
huggingface -> Hugging Face
Is it possible to release code based on JAX?
Hi, I work at a company that wants to help. We have computational power and would like to talk more about it. Is that possible?
Hi, I've been planning to train this model. I have a TPU pod (v3-128) through TRC, which should equate to ~5 TB of RAM and 2 TB of VRAM, I...
Hi, are there any references for training this on my own data?