
[chatgpt] change critic input as state

Open ht-zhou opened this issue 2 years ago • 1 comments

Discussed in https://github.com/hpcaitech/ColossalAI/discussions/3031

Originally posted by wenjunyang March 7, 2023

I have a question about the class Critic in applications/ChatGPT/chatgpt/nn/critic.py. In the common A2C RL algorithm, the critic is a function of the current state $s_i$ only and should be independent of the action $a_i$. However, in both applications/ChatGPT/chatgpt/experience_maker/naive.py and applications/ChatGPT/chatgpt/trainer/ppo.py, action_mask is passed to it. That means the critic depends on both $s_i$ and $a_i$. Is this reasonable?
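To make the textbook A2C point concrete, here is a minimal sketch of a critic that takes only the state as input. It is not taken from the ColossalAI code: `linear_critic` and `one_step_advantages` are hypothetical names, and the linear value function and tuple-of-floats state are assumptions chosen to keep the example small. The action never reaches the critic; it only influences the advantage through the reward and the next state it leads to.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def linear_critic(weights: Sequence[float]) -> Callable[[State], float]:
    """Build a toy critic V(s) = w . s (hypothetical; the state is its only input)."""
    def value(state: State) -> float:
        return sum(w * x for w, x in zip(weights, state))
    return value


def one_step_advantages(
    value_fn: Callable[[State], float],
    states: List[State],
    rewards: List[float],
    gamma: float = 0.99,
) -> List[float]:
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `states` holds len(rewards) + 1 entries; the last one is used only
    to bootstrap the value of the final transition.
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("expected one more state than rewards")
    return [
        rewards[t] + gamma * value_fn(states[t + 1]) - value_fn(states[t])
        for t in range(len(rewards))
    ]


# Usage: the critic is queried with states only, never with actions.
critic = linear_critic([1.0, 0.0])
trajectory = [(1.0, 0.0), (2.0, 0.0), (0.0, 0.0)]
advantages = one_step_advantages(critic, trajectory, rewards=[1.0, 0.0], gamma=0.5)
```

In this formulation an action-dependent critic would correspond to a Q-function $Q(s_i, a_i)$ rather than a state-value function $V(s_i)$.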

ht-zhou avatar Mar 07 '23 10:03 ht-zhou

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


Title: [chatgpt] change critic input as state

Issues-translate-bot avatar Mar 07 '23 10:03 Issues-translate-bot

We have updated the code substantially since this issue was opened. Please check the latest version. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 27 '23 10:04 binmakeswell