
Improve Documentation and Sample Code

Open ylassoued opened this issue 1 year ago • 2 comments

Thank you very much for making this public. I have been struggling to understand how to fine-tune ChatLLaMA as per the code snippet provided in the Readme file. It would be much appreciated if you could clarify the following points there.

  1. Are all the training steps (reward, actor, and RL training) required, in the order given in the code snippet, to train ChatLLaMA, or is it sufficient to train the actor only (ActorTrainer)?
  2. The structures of the training datasets (JSON files) are not clear. In ActorDataset, for example, "completion" is described as "the output of the user". What does this mean exactly? Does it mean the expected answer to "user_input"? The same applies to the other training datasets. Examples and more elaborate documentation would be much appreciated.
  3. How do I load the ChatLLaMA model weights (from a file/directory) for training? Any chance you could add this to the training code snippet?
  4. The Readme file mentions that "alternatively, you can generate your own dataset using LangChain's agents". Is this an alternative to the custom dataset? I tried to run generate_dataset.py, but it produced no output file and no errors either.

ylassoued avatar Mar 02 '23 09:03 ylassoued

Hello @ylassoued, thank you very much for reaching out.

Re 1: Yes, all the steps are required, since they reflect the three-stage training procedure used for ChatGPT (InstructGPT-like models):

  • The ActorTrainer fine-tunes the model on your data in a supervised way;
  • The RewardTrainer trains the reward model to simulate human feedback;
  • The RLTrainer performs RLHF on the LLaMA model.
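For readers landing here later, the sequencing of the three stages can be sketched with stand-in classes. This is only an illustration of the pipeline order described above: the real chatllama trainers take model and config arguments not shown here, and every name below is a placeholder, not the library's actual API.

```python
# Illustrative stubs only: sketches the InstructGPT-style order
# (supervised fine-tuning -> reward model -> RLHF), NOT the real
# chatllama constructor signatures.

class ActorTrainer:
    """Stage 1: supervised fine-tuning of the actor on (prompt, completion) pairs."""
    def train(self):
        return "actor_checkpoint"

class RewardTrainer:
    """Stage 2: train the reward model to score completions like a human rater."""
    def train(self):
        return "reward_checkpoint"

class RLTrainer:
    """Stage 3: RLHF, optimizing the actor against the frozen reward model."""
    def __init__(self, actor_ckpt, reward_ckpt):
        self.actor_ckpt = actor_ckpt
        self.reward_ckpt = reward_ckpt
    def train(self):
        return f"rlhf({self.actor_ckpt}, {self.reward_ckpt})"

actor_ckpt = ActorTrainer().train()                  # step 1: SFT
reward_ckpt = RewardTrainer().train()                # step 2: reward model
final = RLTrainer(actor_ckpt, reward_ckpt).train()   # step 3: RLHF
```

The key point is the data dependency: the RL stage consumes both earlier checkpoints, which is why none of the steps can be skipped.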

Re 2: Absolutely, you are right. We are currently refining the code and will provide more examples;
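Until those refined docs land, here is a minimal sketch of what an ActorDataset JSON file appears to look like based on this thread: a list of records where `completion` is the expected answer to `user_input`. The field names come from the discussion above; verify the exact schema against the current README before training.

```python
import json
import os
import tempfile

# Hypothetical ActorDataset file: a JSON list of records, where
# "completion" is the expected model answer to "user_input".
records = [
    {"user_input": "What is the capital of France?",
     "completion": "The capital of France is Paris."},
    {"user_input": "Summarize photosynthesis in one sentence.",
     "completion": "Plants convert light, water, and CO2 into sugar and oxygen."},
]

path = os.path.join(tempfile.gettempdir(), "actor_training_data.json")
with open(path, "w") as f:
    json.dump(records, f, indent=2)

# Reload and sanity-check the schema before pointing a trainer at it.
with open(path) as f:
    loaded = json.load(f)
assert all({"user_input", "completion"} <= r.keys() for r in loaded)
```

A quick round-trip check like this catches malformed JSON or missing keys before a long training run fails on them.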

Re 3: Sure, we are going to provide a config file ASAP;

Re 4: Yes, it is an alternative to the custom dataset. Thanks for pointing out the issue; I'll investigate the problem.

diegofiori avatar Mar 02 '23 09:03 diegofiori

Thank you very much @diegofiori for your prompt and positive reply! Much appreciated! Looking forward to the next "version" :-).

ylassoued avatar Mar 02 '23 10:03 ylassoued

Hi @ylassoued, examples of the config file and of dataset generation have been added to the new readme with #203 and #204. Feel free to open another issue if you have any suggestions for improving the docs or for getting further information about chatllama 😄

diegofiori avatar Mar 10 '23 16:03 diegofiori

This is brilliant! Thank you very much @diegofiori! It looks great! I will give it another try :-)

ylassoued avatar Mar 11 '23 10:03 ylassoued