torchgfn
Guidelines on finetuning LLMs as policy models
Hello,
Thank you for your effort in releasing such a great implementation of GFN! I am working on using GFN to finetune an LLM as a policy model (which I believe will be a popular use case) and would like to ask for some suggestions.
The main problem in this scenario is efficiently sampling from language models in parallel, which requires storing the Transformer's key-value cache to avoid re-computation. Do you have any suggestions on how to implement the State class so that it can store both the partial token sequence and the key-value cache?
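To make the constraint concrete, here is a toy, single-head sketch (not taken from any particular LLM implementation) of one attention decoding step with a key-value cache. It shows why the cache has to travel alongside the token sequence: each new step concatenates onto the cached keys and values instead of recomputing them from the full prefix.

```python
import torch
import torch.nn.functional as F

def attend_step(q, k_new, v_new, k_cache, v_cache):
    """One decoding step of single-head attention with a key-value cache.

    q, k_new, v_new: (batch, 1, dim) projections for the newest token.
    k_cache, v_cache: (batch, t, dim) keys/values for the t previous tokens.
    Returns the attention output and the updated caches.
    """
    # Extend the cache with the newest key/value instead of recomputing
    # projections for the whole prefix.
    k = torch.cat([k_cache, k_new], dim=1)  # (batch, t + 1, dim)
    v = torch.cat([v_cache, v_new], dim=1)

    # Scaled dot-product attention of the new query against all cached keys.
    scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5  # (batch, 1, t + 1)
    out = F.softmax(scores, dim=-1) @ v                  # (batch, 1, dim)
    return out, k, v
```

If a sampler only carries token indices between steps, `k_cache`/`v_cache` must be rebuilt from scratch every step, which is the re-computation cost the question is about.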
Hello,
Thanks for raising the issue. This is an important question we're currently trying to address: how to allow for more flexible state spaces, including, for instance, graphs.
As of now, states need to be represented as tensors, so the natural approach would be to use tensors that contain all the information you need to transition from one state to another. In this case, you could use some dimensions of the state to store the key-value cache, and other dimensions to store the decoded token indices.
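A minimal sketch of that idea, assuming a plain batched container rather than the actual torchgfn `States` API (the class name `LLMState`, its fields, and all shape choices below are hypothetical): the decoded token indices live in a long tensor, while the key-value cache lives in a parallel float tensor indexed by the same batch dimension, so the two always move together through a transition.

```python
import torch

class LLMState:
    """Hypothetical sketch of a batched LLM sampling state (not the
    torchgfn States API): partial token sequences plus their KV cache."""

    def __init__(self, batch_size, max_len, n_layers, n_heads, head_dim, pad=-1):
        # Decoded token indices, padded with `pad` beyond each sequence length.
        self.tokens = torch.full((batch_size, max_len), pad, dtype=torch.long)
        # Key-value cache: (batch, layers, 2 [key/value], heads, max_len, head_dim).
        self.kv_cache = torch.zeros(batch_size, n_layers, 2, n_heads, max_len, head_dim)
        # Current length of each partial sequence.
        self.lengths = torch.zeros(batch_size, dtype=torch.long)

    def append(self, new_tokens, new_kv):
        """Transition: append one token per sequence along with its cache slice.

        new_tokens: (batch,) long tensor of sampled token indices.
        new_kv: (batch, layers, 2, heads, head_dim) cache entries for that step.
        """
        b = torch.arange(new_tokens.shape[0])
        pos = self.lengths
        self.tokens[b, pos] = new_tokens
        # Write the new cache entries at each sequence's current position.
        self.kv_cache[b, :, :, :, pos] = new_kv
        self.lengths += 1
```

One practical caveat with packing everything into a single long tensor is that token indices are integers while the cache is floating point, so keeping them as two aligned tensors (as above) may be the simpler layout.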