GLM-130B
[Discussion] Can we align GLM-130B with humans like ChatGPT?
Certainly. Alignment for GLM-130B could be important, and we are doing a preliminary survey.
You could use the current glm-10b on Hugging Face with trl/trlx to build a model with RLHF.
What are trl/trlx? I am very interested in this use case. Why must the 10B-parameter model be used for RLHF?
I am actively working on this task and would be very interested in coordinating further development.
@smeyerhot Trl (Transformer Reinforcement Learning) is a library built by Hugging Face for training language models with PPO. Trlx is an extension of Trl built by CarperAI. Both cover the same use case: training models with reinforcement learning from human feedback. You can also build the same functionality with actor-critic PPO in PyTorch, although that would require more extensive domain knowledge. You do not have to use glm-10b, but it is publicly available on Hugging Face's model hub, unlike 130B, which requires you to apply for access. You can use any encoder-decoder or decoder-only model. This issue is about aligning GLM with human feedback, which is why I suggested the 10B-parameter one.
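To make that concrete, here is a minimal PPO sketch with trl. It is only a sketch: gpt2 stands in for any decoder-only model (loading glm-10b needs its own tokenizer/model setup and is not shown), the constant reward is a placeholder for a real reward model trained on human preferences, and exact argument names can differ across trl versions.

```python
import torch
from transformers import AutoTokenizer
from trl import PPOTrainer, PPOConfig, AutoModelForCausalLMWithValueHead

model_name = "gpt2"  # stand-in for any decoder-only model you want to align
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)  # frozen reference for the KL penalty
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step: generate a response to a prompt, score it, update the policy.
query = tokenizer("Explain RLHF in one sentence:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32,
                                pad_token_id=tokenizer.eos_token_id)
response = response.squeeze()[query.shape[0]:]  # keep only the newly generated tokens

# Placeholder reward; a real run would score the response with a reward model.
reward = torch.tensor(1.0)
stats = ppo_trainer.step([query], [response], [reward])
```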
ChatGPT can generate formatted text and images; this requires keeping the pretraining data in its original format.
Hi guys, I used BLOOM to implement PPO successfully.
But I found that the BLOOM model uses the AutoModelForCausalLM class,
whereas GLM uses the AutoModelForSeq2SeqLM class.
There is no LM counterpart for the AutoModelForSeq2SeqLM model, so do you know how to correct this?
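For what it's worth, one possible direction is trl's value-head wrapper for encoder-decoder models, AutoModelForSeq2SeqLMWithValueHead. This is only a sketch: t5-small is a stand-in checkpoint, and whether the same wrapper loads GLM through AutoModelForSeq2SeqLM has not been verified here.

```python
# Sketch only: trl also ships a value-head wrapper for encoder-decoder models.
# "t5-small" stands in for a seq2seq checkpoint; loading GLM this way is untested.
from trl import AutoModelForSeq2SeqLMWithValueHead

model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("t5-small")
ref_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("t5-small")
# The rest of the PPOTrainer setup can stay the same as in the decoder-only
# sketch above; PPOTrainer handles encoder-decoder models when the wrapped
# model reports is_encoder_decoder=True.
```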