torchchat
Enable LLaVA on torchchat
This PR enables LLaVA 1.5 on torchchat, making it the first multimodal model torchchat supports.
How to try it?
Use the --prompt flag to pass the text input and the --image-prompt flag to pass the image input.
For example:
(torchchat) [ ~/torchchat (9e4350d7b)]$ python torchchat.py generate llava-1.5 --prompt "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image> What are the things I should be cautious about when I visit here? ASSISTANT:" --image-prompt ../view.jpg
Using device=cuda NVIDIA PG509-210
Loading model...
Time to load model: 5.16 seconds
-----------------------------------------------------------
Image prompts ['../view.jpg']
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image> What are the things I should be cautious about when I visit here? ASSISTANT: When visiting this vibrant and fascinating place, one should be cautious about the potential for the area to be crowded or filled with tourists. This might lead to overcrowding and a loss of personal space. Additionally, the vibrant design and colorful patterns might be visually stimulating, so it's essential to be cautious while taking photographs, ensuring not to bump into other people or unintentionally obstruct their views. It's also important to be aware of your surroundings and belongings, as the brightness of the colors and intricate patterns can make them harder to spot. Lastly, it's always a good idea to respect the cultural and artistic value of such places, by refraining from touching or interacting with the artwork without permission or being mindful of your surroundings.
2024-09-23:20:24:10,645 INFO [generate.py:1031]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 185 tokens
Time for inference 1: 11.4913 sec total
Time to first token: 0.6280 sec with parallel prefill.
Total throughput: 16.1861 tokens/sec, 0.0618 s/token
First token throughput: 1.5923 tokens/sec, 0.6280 s/token
Next token throughput: 17.0298 tokens/sec, 0.0587 s/token
2024-09-23:20:24:10,645 INFO [generate.py:1042]
Bandwidth achieved: 228.66 GB/s
2024-09-23:20:24:10,645 INFO [generate.py:1046] *** This first iteration will include cold start effects for dynamic import, hardware caches. ***
========================================
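The prompt in the command above follows a fixed pattern: a system sentence, then USER:, an optional <image> placeholder, the question, and ASSISTANT:. A minimal sketch of assembling that prompt and the matching argument list is shown here. The helper names build_llava_prompt and build_generate_argv are hypothetical and not part of torchchat; only the flags and the prompt text come from the example above.

```python
import shlex

# System sentence copied from the example command in this PR.
SYSTEM_PROMPT = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_llava_prompt(question, with_image=True):
    """Build the text prompt; hypothetical helper, not a torchchat API."""
    # The <image> placeholder is only included when an image is supplied.
    image_tag = "<image> " if with_image else ""
    return f"{SYSTEM_PROMPT} USER: {image_tag}{question} ASSISTANT:"


def build_generate_argv(question, image_path=None):
    """Build the argument list for the generate command shown above."""
    prompt = build_llava_prompt(question, with_image=image_path is not None)
    argv = ["python", "torchchat.py", "generate", "llava-1.5", "--prompt", prompt]
    if image_path is not None:
        # --image-prompt is the image flag introduced by this PR.
        argv += ["--image-prompt", image_path]
    return argv


if __name__ == "__main__":
    argv = build_generate_argv(
        "What are the things I should be cautious about when I visit here?",
        image_path="../view.jpg",
    )
    # shlex.join produces a shell-safe string for copy and paste.
    print(shlex.join(argv))
```

Passing the list to subprocess.run from the torchchat checkout would avoid shell quoting issues with the apostrophe in the prompt.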
It can also handle text-only prompts, with no image input:
(torchchat) [ ~/torchchat (9e4350d7b)]$ python torchchat.py generate llava-1.5 --prompt "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: What are the things I should be cautious about when I visit Canada? ASSISTANT:"
Using device=cuda NVIDIA PG509-210
Loading model...
Time to load model: 5.50 seconds
-----------------------------------------------------------
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: What are the things I should be cautious about when I visit Canada? ASSISTANT: There are several things you should be cautious about when visiting Canada, including:
1. Health and safety: Canada has a generally safe environment, but as with any country, you should be mindful of your surroundings and take precautions to stay safe. This includes being cautious of pickpockets in crowded areas, watching out for traffic when crossing the street, and avoiding potential hazards in public spaces.
2. Weather: Canada has a varied climate, with different regions experiencing different weather conditions. Be prepared for unexpected changes in weather and dress appropriately for the climate you will be visiting.
3. Customs regulations: When bringing items into Canada, you must declare any goods that are subject to customs duty or tax. There are also restrictions on bringing certain items into the country, such as food and plants.
4. Language: While English and French are the official languages of Canada, not all
2024-09-23:20:25:49,785 INFO [generate.py:1031]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 199 tokens
Time for inference 1: 13.8391 sec total
Time to first token: 0.5637 sec with parallel prefill.
Total throughput: 14.4518 tokens/sec, 0.0692 s/token
First token throughput: 1.7741 tokens/sec, 0.5637 s/token
Next token throughput: 14.9901 tokens/sec, 0.0667 s/token
2024-09-23:20:25:49,786 INFO [generate.py:1042]
Bandwidth achieved: 204.16 GB/s
2024-09-23:20:25:49,786 INFO [generate.py:1046] *** This first iteration will include cold start effects for dynamic import, hardware caches. ***
========================================
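For readers comparing the two runs, the throughput lines in the logs can be reproduced from three numbers: generated tokens, total inference time, and time to first token. The sketch below is inferred from the logged values only, not taken from generate.py. In particular, counting one extra token in the total throughput is an assumption that happens to match both logs.

```python
def throughput_metrics(generated_tokens, total_seconds, first_token_seconds):
    """Recompute the logged throughput figures (formulas inferred from the logs)."""
    decode_seconds = total_seconds - first_token_seconds
    return {
        # Assumption: the total counts the first token plus the generated tokens.
        "total_tok_per_s": (generated_tokens + 1) / total_seconds,
        # One token produced during the time to first token.
        "first_tok_per_s": 1 / first_token_seconds,
        # Remaining tokens over the time left after the first token.
        "next_tok_per_s": generated_tokens / decode_seconds,
    }


def implied_gb_per_token(bandwidth_gb_per_s, total_tok_per_s):
    """GB read per generated token implied by the logged bandwidth."""
    return bandwidth_gb_per_s / total_tok_per_s


if __name__ == "__main__":
    # Values copied from the two logs above.
    image_run = throughput_metrics(185, 11.4913, 0.6280)
    text_run = throughput_metrics(199, 13.8391, 0.5637)
    for name, run in (("image + text", image_run), ("text only", text_run)):
        print(name, {k: round(v, 4) for k, v in run.items()})
    print(round(implied_gb_per_token(228.66, 16.1861), 2))
    print(round(implied_gb_per_token(204.16, 14.4518), 2))
```

Both runs imply roughly the same number of gigabytes read per token, which is consistent with the bandwidth figure being derived from model size times total throughput, though the log itself does not state this.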