hao cheng

3 comments by hao cheng

> Hi, do you mean using the average token reward as the score instead of the last token's reward? Here, the mean should be outside the sigmoid function since...
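The three scoring options in play here can be compared with a minimal sketch. The snippet is truncated, so reading "the mean outside the sigmoid" as `mean(sigmoid(r_t))` is an assumption. The per-token reward list and the helper names (`last_token_score`, `mean_outside_sigmoid`, `mean_inside_sigmoid`) are hypothetical and do not come from the thread or any library.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def last_token_score(token_rewards: list[float]) -> float:
    # Use only the reward at the final token of the sequence.
    return sigmoid(token_rewards[-1])

def mean_outside_sigmoid(token_rewards: list[float]) -> float:
    # Apply the sigmoid to each token's reward, then average: mean(sigmoid(r_t)).
    return sum(sigmoid(r) for r in token_rewards) / len(token_rewards)

def mean_inside_sigmoid(token_rewards: list[float]) -> float:
    # Average the raw rewards first, then apply the sigmoid once: sigmoid(mean(r_t)).
    return sigmoid(sum(token_rewards) / len(token_rewards))

rewards = [3.0, 0.0, 0.0]  # hypothetical per-token rewards for one sequence
print(last_token_score(rewards))      # sigmoid(0.0) = 0.5
print(mean_outside_sigmoid(rewards))  # about 0.651
print(mean_inside_sigmoid(rewards))   # sigmoid(1.0), about 0.731
```

Because the sigmoid is nonlinear, the two averaging orders generally give different scores. That is why it matters which side of the sigmoid the mean goes on.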

> In our case, it is also a scalar. The vector comes from the batch dimension, not the seq-length dimension. Thanks for the reply. I am still confused. I printed the...
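The point that each score is a scalar, and that the vector runs along the batch dimension, can be illustrated with a minimal sketch. The padded nested-list layout, the `lengths` input, and the name `last_token_scores` are hypothetical and chosen only for illustration.

```python
def last_token_scores(batch_token_rewards: list[list[float]], lengths: list[int]) -> list[float]:
    # batch_token_rewards: one row per sequence, padded to the same seq_len.
    # lengths: number of real (non-padding) tokens per row (hypothetical input).
    # Each sequence contributes one scalar, so the result is a vector of
    # length batch_size, indexed by the batch dimension, not by token position.
    return [row[n - 1] for row, n in zip(batch_token_rewards, lengths)]

batch = [
    [0.1, 0.5, 0.0],  # sequence 0: 2 real tokens + 1 padding
    [0.2, 0.3, 0.9],  # sequence 1: 3 real tokens
]
scores = last_token_scores(batch, [2, 3])
print(scores)  # [0.5, 0.9]
```

The output has one entry per sequence in the batch. Its length does not depend on `seq_len`.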

> > > In our case, it is also a scalar. The vector is from batch dimension instead of seq-length dimension.
> > >
> > > Thanks for the...