[Ready For Review] Implement Fuyu
This is an early draft PR to indicate that I'm working on the Fuyu 8B implementation. A lot of work still needs to be done, including:
- [x] Finalization of the Fuyu architecture
- [x] Adaptation of the weights + validate the architecture
- [x] Change Tokenizer
- [x] Inference function
- [x] Unit Testing
I will update this PR and ping you as soon as more significant progress has been made, or if I encounter any blockers that require discussion. Thank you!
This bounty is stale because it has been opened for 7 days with no activity.
Hi, just a quick update: the weights now adapt nicely and the model works as intended. It took me a while to debug things due to my limited computational power, but everything should be quicker now! Next steps are cleaning and documenting the code, writing a proper inference function, and finally some unit testing, as listed in the original message. Thanks!
This bounty is stale because it has been opened for 7 days with no activity.
This bounty was closed because it has been inactive for 7 days since being marked as stale.
Hi, this PR is finally ready for review! A few notes about the implementation:

- The Hugging Face implementation of Persimmon/Fuyu doesn't use flash attention. To get exactly the same logits, the `is_optimized` argument in our model therefore needs to be set to `False` (see the configuration sketch after this list). Note that the model produces consistent answers regardless of the attention optimization state.
- The output logits match those of the Hugging Face model when both are set to float32. Although the test script fails with float16, the final answers remain consistent across both data types for any given (image, prompt) pair.
- Finally, Hugging Face caches the key and value states in each attention layer by default during generation to speed up decoding. I didn't implement this functionality, which could lead to minor variations in the final answer after several iterations. With `use_cache` set to `False`, the Hugging Face model outputs exactly the same final answer as this implementation (a parity-check sketch follows the usage example below). KV caching isn't mentioned in the Adept.ai blog post, although flash attention is.
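For exact logit parity, here is a minimal configuration sketch. Assumption: it treats `is_optimized` as a `Fuyu8b` config field and uses `.to(dtype=...)` for the dtype, which may not match the PR's actual API:

```python
import torch

from refiners.foundationals.fuyu.fuyu import Fuyu8b, create_fuyu

# Hypothetical placement of the flag: this sketch assumes is_optimized is a
# Fuyu8b config field, which may not be where it actually lives in the PR.
config = Fuyu8b(is_optimized=False)  # plain attention, as in the Hugging Face model
network = create_fuyu(config)

# Logits only match the Hugging Face reference in float32 (second note above).
network = network.to(dtype=torch.float32)
```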
To use the model:

```python
import requests
from PIL import Image

from refiners.fluxion.utils import load_from_safetensors
from refiners.foundationals.fuyu.fuyu import create_fuyu, Fuyu8b

# Build the model and load the converted weights.
config = Fuyu8b()
network = create_fuyu(config)
tensors = load_from_safetensors("/path/to/fuyu.safetensors")
network.load_state_dict(tensors)

# Fetch the example image from the Hugging Face model card.
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "Generate a coco-style caption.\n"
answer = network.generate([image], [prompt], max_len_generation=100)
```
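For completeness, here is a sketch of the parity check described in the notes above, reusing `image`, `prompt`, and `answer` from the snippet. The Hugging Face calls follow the adept/fuyu-8b model card; the decoding details are illustrative and not taken from this PR's test script.

```python
import torch
from transformers import FuyuForCausalLM, FuyuProcessor

# Load the reference in float32, the only dtype in which logits match exactly.
processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
reference = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", torch_dtype=torch.float32)

inputs = processor(text=prompt, images=image, return_tensors="pt")

# use_cache=False disables KV caching so the reference decodes the same way
# as this implementation.
output_ids = reference.generate(**inputs, max_new_tokens=100, use_cache=False)
reference_answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]

print("refiners :", answer)  # from the snippet above
print("reference:", reference_answer)
```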
I'm open to any feedback, especially on the caching functionality and whether it should eventually be implemented. Thank you!
Hi, just realized that I didn't pass the CI/CD because of Pyright. I'm aware of the issue and will correct every Pyright error this evening. Apologies for the setback.
Edit: Done
Thank you for the reviews! Concerning the test script, I now generate the references in a conftest.py before the different test_ functions run. I hope this solution is satisfactory :)
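For readers curious about the pattern, here is a minimal sketch of what generating references in a conftest.py can look like. The fixture name, prompt, and image are illustrative, not the PR's actual test code:

```python
# conftest.py: an illustrative sketch, not the PR's actual file
import pytest
import requests
import torch
from PIL import Image
from transformers import FuyuForCausalLM, FuyuProcessor


@pytest.fixture(scope="session")
def fuyu_reference_logits() -> torch.Tensor:
    """Run the Hugging Face model once per test session so every test_
    function can compare against the same reference logits."""
    processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
    model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", torch_dtype=torch.float32)

    url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
    image = Image.open(requests.get(url, stream=True).raw)
    inputs = processor(text="Generate a coco-style caption.\n", images=image, return_tensors="pt")

    with torch.no_grad():
        return model(**inputs).logits
```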
This bounty is stale because it has been opened for 7 days with no activity.
This bounty was closed because it has been inactive for 7 days since being marked as stale.
Hi @LouisRouss, sorry for the delay! We're very busy with other projects at the moment; I will keep the PR open.
This bounty is stale because it has been opened for 7 days with no activity.
This bounty was closed because it has been inactive for 7 days since being marked as stale.