3 comments of hao cheng
> Hi, do you mean using the average token reward as the score instead of the reward at the last token? Here, the mean should be taken outside the sigmoid function since...
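The question contrasts two ways of turning per-token rewards into one sequence score, and notes that it matters where the mean sits relative to the sigmoid. The sketch below is only an illustration under stated assumptions. It takes a hypothetical list of per-token rewards and compares the last-token score with the mean-token score. It then shows that `sigmoid(mean(x))` and `mean(sigmoid(x))` generally differ, because the sigmoid is nonlinear. The helper names are hypothetical and are not taken from the thread's code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def last_token_score(token_rewards: list[float]) -> float:
    """Hypothetical helper: use only the reward at the final token."""
    return token_rewards[-1]


def mean_token_score(token_rewards: list[float]) -> float:
    """Hypothetical helper: average the reward over all tokens."""
    return sum(token_rewards) / len(token_rewards)


# Hypothetical per-token rewards for a single sequence.
token_rewards = [0.0, 2.0, -2.0, 4.0]

last = last_token_score(token_rewards)  # 4.0
mean = mean_token_score(token_rewards)  # 1.0

# Placement of the mean relative to the nonlinearity changes the result.
sigmoid_of_mean = sigmoid(mean)
mean_of_sigmoid = sum(sigmoid(r) for r in token_rewards) / len(token_rewards)

print(f"last={last}, mean={mean}")
print(f"sigmoid(mean)={sigmoid_of_mean:.4f}, mean(sigmoid)={mean_of_sigmoid:.4f}")
```

With these inputs, `sigmoid(mean)` is about 0.731 and `mean(sigmoid)` is about 0.621, so the two orderings cannot be swapped freely.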
> In our case, it is also a scalar. The vector comes from the batch dimension, not the sequence-length dimension. Thanks for the reply. I am still confused. I printed the...
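The reply says each sequence's score is a scalar and that the vector runs along the batch dimension. One way to picture this is a pairwise (Bradley-Terry style) reward-model loss. This loss form is an assumption; the thread does not confirm it. Each example contributes one scalar score for the chosen response and one for the rejected response. The loss applies log-sigmoid to each per-example difference and only then averages over the batch, so the mean sits outside the sigmoid. For comparison, the sketch also computes the mean-inside variant. Both function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_loss_mean_outside(chosen: list[float], rejected: list[float]) -> float:
    """Hypothetical: -mean_b log sigmoid(chosen_b - rejected_b).

    `chosen` and `rejected` are batch vectors holding one scalar score per sequence.
    """
    assert len(chosen) == len(rejected)
    diffs = [c - r for c, r in zip(chosen, rejected)]
    return -sum(log_sigmoid(d) for d in diffs) / len(diffs)


def pairwise_loss_mean_inside(chosen: list[float], rejected: list[float]) -> float:
    """Hypothetical contrast: average the differences first, then apply log-sigmoid."""
    assert len(chosen) == len(rejected)
    diffs = [c - r for c, r in zip(chosen, rejected)]
    return -log_sigmoid(sum(diffs) / len(diffs))


# Batch of two examples: one scalar score per sequence.
chosen = [1.0, 3.0]
rejected = [0.0, 0.0]

outside = pairwise_loss_mean_outside(chosen, rejected)
inside = pairwise_loss_mean_inside(chosen, rejected)
print(f"mean outside sigmoid: {outside:.4f}, mean inside sigmoid: {inside:.4f}")
```

Because `-log_sigmoid` is convex, the mean-outside loss is never smaller than the mean-inside one. The two are equal only when every per-example difference is the same.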
> > > In our case, it is also a scalar. The vector comes from the batch dimension, not the sequence-length dimension.
> >
> > Thanks for the...