[chatgpt] change critic input as state
Discussed in https://github.com/hpcaitech/ColossalAI/discussions/3031
Originally posted by wenjunyang March 7, 2023
I have a question about the Critic class in applications/ChatGPT/chatgpt/nn/critic.py. In the common A2C RL algorithm, the critic is a function of the current state $s_i$ and should be independent of the action $a_i$. But in both applications/ChatGPT/chatgpt/experience_maker/naive.py and applications/ChatGPT/chatgpt/trainer/ppo.py, action_mask is passed to the critic. That means the critic depends on both $s_i$ and $a_i$. Is that reasonable?
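
To make the distinction concrete, here is the textbook one-step form as I understand it. The reward $r_i$ and discount $\gamma$ are my own notation for illustration, not names taken from the code. In A2C the critic estimates a state value, and the action only enters through the reward and the next state:

$$A(s_i, a_i) \approx r_i + \gamma V(s_{i+1}) - V(s_i)$$

A critic that also takes the action as input would instead be estimating something closer to an action value $Q(s_i, a_i)$, which is what passing action_mask seems to suggest.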

We have updated a lot. Please check the latest code. This issue was closed due to inactivity. Thanks.