
Dear Developers: we have some basic questions!

Open QiaoTuCodes opened this issue 11 months ago • 4 comments

Dear Developers:

Thank you to the BAAI team for open-sourcing the Bunny model. I've been actively exploring it these past few days and have a few questions about deploying the model that I hope the official BAAI technical team can answer. I'm very grateful in advance!

First question: what are the GPU requirements for the different versions of the model, for example the full-parameter Bunny-v1_0-3B version and the bunny-phi-2-siglip-lora version? Could you provide a comparison listing the officially recommended GPU models and VRAM sizes?

Second question: can the controller, Web-UI server, and Model Worker be started with a single bash command? Currently it seems that three separate bash commands need to be executed to start the controller, the WebUI, and model inference, presumably because of a "microservices architecture" or "distributed system architecture" design. Is my understanding correct? Also, if we deploy with Docker containers and use Kubernetes as the container management framework, could an official post be provided that explains the standard deployment process in more detail?

by Isaac Wei Ran
Guangzhou, China, 7th March 2024

QiaoTuCodes avatar Mar 06 '24 20:03 QiaoTuCodes

Thank you for your interest in our work! For the first question, we have successfully deployed it on A100, A5000 and V100 GPUs, but we haven't tried other GPU types yet. Sorry for not making this clear: Bunny-v1_0-3B and bunny-phi-2-siglip-lora are in fact the same version. bunny-phi-2-siglip-lora is the separate LoRA weight, while Bunny-v1_0-3B is the merged combination of SigLIP, Phi-2, and the LoRA weight. For the second question, given the demand, we will soon provide a Python file that can be used to deploy it directly, and we will also write a blog post about deployment for your convenience.
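(For readers wondering how the LoRA weight relates to the merged checkpoint, below is a minimal sketch of the general LoRA-merging pattern with the `peft` library. It is only an illustration of the idea: Bunny's own merge script should be used in practice, since it also handles the SigLIP vision tower and the multimodal projector, and the Hugging Face model IDs here are assumptions based on the names in this thread.)

```python
# Minimal sketch: fold a LoRA adapter into its base language model with peft.
# Illustration only; Bunny's repo provides its own merge script that also
# handles the vision tower and projector weights.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")           # base LLM
lora = PeftModel.from_pretrained(base, "BAAI/bunny-phi-2-siglip-lora")   # assumed repo ID
merged = lora.merge_and_unload()                                         # merge LoRA into base weights
merged.save_pretrained("./Bunny-v1_0-3B-merged")                         # save a standalone checkpoint
```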

LAW1223 avatar Mar 12 '24 01:03 LAW1223

How long did you train your models on A100?

Becomebright avatar Mar 12 '24 07:03 Becomebright

For Bunny-v1_0-3B, pretraining takes about 13 hours and fine-tuning about 12 hours.

LAW1223 avatar Mar 12 '24 07:03 LAW1223

Is it possible to add more multimodal data, not just text and images, but also some intermediate states of processes (that cannot be described with language or images)?

hxypqr avatar Apr 27 '24 09:04 hxypqr

@QiaoTuCodes Regarding the second question ("Can this model integrate the controller, Web-UI server, and Model Worker directly into one bash command?"), you may refer to the HuggingFace Space.
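(If you want to experiment with a single-command launch locally before an official script appears, a rough sketch like the one below could work. It assumes LLaVA-style entry points such as `bunny.serve.controller`, `bunny.serve.model_worker`, and `bunny.serve.gradio_web_server`; the module names, flags, ports, and model path are all assumptions, so check the Bunny repo for the real ones.)

```python
# Hypothetical one-shot launcher: starts the controller, a model worker, and the
# Gradio web server as subprocesses from a single command. Module names and
# flags are assumptions modeled on LLaVA-style serving stacks.
import subprocess
import sys
import time

procs = []
try:
    procs.append(subprocess.Popen(
        [sys.executable, "-m", "bunny.serve.controller",
         "--host", "0.0.0.0", "--port", "10000"]))
    time.sleep(3)  # give the controller a moment to come up
    procs.append(subprocess.Popen(
        [sys.executable, "-m", "bunny.serve.model_worker",
         "--controller", "http://localhost:10000",
         "--port", "40000",
         "--model-path", "BAAI/Bunny-v1_0-3B"]))
    procs.append(subprocess.Popen(
        [sys.executable, "-m", "bunny.serve.gradio_web_server",
         "--controller", "http://localhost:10000"]))
    for p in procs:
        p.wait()
except KeyboardInterrupt:
    for p in procs:
        p.terminate()
```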

Isaachhh avatar Jul 07 '24 10:07 Isaachhh

Is it possible to add more multimodal data, not just text and images, but also some intermediate states of processes (that cannot be described with language or images)?

@hxypqr I think it is possible.

We use a vision tower to encode the images and then map the vision embeddings into the LLM embedding space with an MLP. So you can incorporate another kind of data with a corresponding encoder and projector.
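(As a concrete illustration of that idea, here is a minimal PyTorch sketch of a projector that maps features from some extra-modality encoder into the LLM embedding space so they can be concatenated with the text token embeddings. The class name, encoder output, and dimensions are placeholders, not Bunny's actual configuration.)

```python
# Minimal sketch (PyTorch): project features from an arbitrary new-modality
# encoder into the LLM embedding space, mirroring how vision features are handled.
# Dimensions and names below are placeholders for illustration.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        # Two-layer MLP projector, the same shape of mapping used for vision features.
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, enc_features: torch.Tensor) -> torch.Tensor:
        # enc_features: (batch, num_tokens, enc_dim) from the new-modality encoder
        return self.proj(enc_features)  # -> (batch, num_tokens, llm_dim)

# Usage: encode the new modality, project it, then concatenate the result with
# the text token embeddings before feeding the sequence to the LLM.
projector = ModalityProjector(enc_dim=1024, llm_dim=2560)  # 2560 = Phi-2 hidden size
fake_features = torch.randn(1, 64, 1024)                   # stand-in encoder output
llm_ready = projector(fake_features)                        # (1, 64, 2560)
```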

Isaachhh avatar Jul 07 '24 10:07 Isaachhh

Closing the issue for now since there is no further discussion. Feel free to reopen it if there are any other questions.

Isaachhh avatar Jul 23 '24 01:07 Isaachhh