trl
Add retrieval reward idea proof of concept
Here is a working proof of concept of my retrieval reward idea: it uses a sentence transformer to compare the generated answer with the ground-truth answer. W&B metrics: https://wandb.ai/alexwortega/trl/runs/cszh2ve8?workspace=user-alexwortega
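For readers skimming the thread, the idea can be sketched roughly as follows. This is not the code from the PR: the function names `retrieval_reward`, `cosine_similarity`, and `toy_encode` are hypothetical, and the reward is assumed to be the cosine similarity between the two embeddings. The encoder is passed in as a callable so the sketch runs offline with a toy bag-of-words encoder.

```python
import math
from typing import Callable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return float(dot / norm) if norm else 0.0


def retrieval_reward(generated: str, reference: str, encode: Callable) -> float:
    """Reward = similarity between the generated and ground-truth answers.

    `encode` is any callable mapping a list of strings to a list of vectors
    (assumption: the PR uses a sentence transformer for this step).
    """
    gen_vec, ref_vec = encode([generated, reference])
    return cosine_similarity(gen_vec, ref_vec)


def toy_encode(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in encoder: bag-of-words counts over a shared vocab."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return [[float(t.lower().split().count(w)) for w in vocab] for t in texts]


same = retrieval_reward("paris is the capital", "paris is the capital", toy_encode)
partial = retrieval_reward("paris is the capital", "the capital is london", toy_encode)
unrelated = retrieval_reward("cat", "dog", toy_encode)
```

To use a real sentence transformer instead of the toy encoder, one option is `encode = SentenceTransformer("all-MiniLM-L6-v2").encode` from the `sentence_transformers` package. The model name is an assumption; the thread does not say which model the PR uses.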
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@AlexWortega in your sft function I believe you are missing an optimizer.zero_grad()
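A minimal sketch of what this comment is pointing at, assuming a standard PyTorch training step. The `sft_step` function, the linear model, and the random data are hypothetical stand-ins, not the PR's actual sft function. PyTorch accumulates gradients across `backward()` calls, so without `optimizer.zero_grad()` each update would mix in gradients from earlier steps.

```python
import torch
from torch import nn


def sft_step(model, optimizer, inputs, targets, loss_fn):
    """One supervised step (hypothetical stand-in for the PR's sft function)."""
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()        # compute fresh gradients for this batch only
    optimizer.step()       # update the parameters
    return loss.item()


torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
x, y = torch.randn(8, 4), torch.randn(8, 1)
losses = [sft_step(model, optimizer, x, y, nn.MSELoss()) for _ in range(5)]
```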
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Closing for now since there is not a lot of activity right now. Feel free to reopen :)