trl

Add retrieval reward idea proof of concept

Open AlexWortega opened this issue 2 years ago • 3 comments

Here is a working proof of concept of my retrieval reward idea: it uses a sentence transformer to compare the generated answer with the ground-truth answer. W&B metrics: https://wandb.ai/alexwortega/trl/runs/cszh2ve8?workspace=user-alexwortega
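
For readers skimming the thread, here is a minimal sketch of the idea, not the code from the linked notebook: embed the generated answer and the reference answer, then use their cosine similarity as the reward. The names `toy_embed` and `retrieval_rewards` are hypothetical, and the bag-of-words embedder is only a stand-in so the example runs without downloading a model. In practice the `embed` argument would presumably wrap a sentence-transformer encoder.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedder: lowercase bag-of-words counts.

    Assumption: a real setup would replace this with a sentence-transformer
    encoder that maps each text to a dense vector.
    """
    return dict(Counter(text.lower().split()))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieval_rewards(
    generated: List[str],
    references: List[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[float]:
    """Return one similarity-based reward per (generated, reference) pair."""
    if len(generated) != len(references):
        raise ValueError("generated and references must have the same length")
    return [
        cosine_similarity(embed(gen), embed(ref))
        for gen, ref in zip(generated, references)
    ]


rewards = retrieval_rewards(
    ["the cat sat", "hello world"],
    ["the cat sat", "goodbye moon"],
)
```

An exact match scores close to 1 and an answer sharing no words with the reference scores 0 under the toy embedder; with dense sentence embeddings the scores would instead reflect semantic similarity.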

AlexWortega, Feb 15 '23 14:02

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@AlexWortega in your sft function I believe you are missing an optimizer.zero_grad()
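
To illustrate the point, here is a minimal sketch of where the call usually goes in a PyTorch training step. It is not the `sft` function from the notebook: `sft_step`, the tiny linear model, and the MSE loss are hypothetical stand-ins for the language model and its loss. PyTorch accumulates gradients across `backward()` calls, so without `optimizer.zero_grad()` each update would also include gradients from earlier steps.

```python
import torch
from torch import nn


def sft_step(model, optimizer, inputs, targets):
    """One supervised update (hypothetical helper, not the notebook's code)."""
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()        # populate .grad for the current batch only
    optimizer.step()       # apply the update
    return loss.item()


torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in for the fine-tuned model
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

# Repeating the step on the same batch should lower the loss.
losses = [sft_step(model, optimizer, inputs, targets) for _ in range(3)]
```

Calling `zero_grad()` at the start of the step rather than the end is a style choice; what matters is that it runs once between consecutive `backward()` calls unless gradient accumulation is intended.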

kashif, Feb 16 '23 13:02

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot], Jun 20 '23 15:06

Closing for now since there has not been much activity. Feel free to reopen :)

lvwerra, Jun 23 '23 13:06