torchchat
`chat` and `generate` mode 3x+ speed difference
`chat` and `generate` for the same model should yield the same number of tokens per second, shouldn't they?
But right now there is a more than 3x difference, at least as observed on my M1 Pro:
% python3 torchchat.py generate llama2 --prompt "Where in the world is Carmen San Diego?"
Using device=mps
Loading model...
Time to load model: 39.65 seconds
Where in the world is Carmen San Diego?
Carmen San Diego is a fictional character in the Nickelodeon animated television series "CatDog". She is the main antagonist of the show and is a notorious catnapper who steals valuable objects and hides them around the world.
Carmen San Diego's whereabouts are not specified in the show, as she is a cunning and elusive criminal who is always on the move. She has been known to travel to various countries and locations around the world, often using disguises and deception to evade capture.
Some of the locations where Carmen San Diego has been known to operate include:
1. Brazil - Carmen has been seen in Brazil several times, where she has stolen valuable items such as the famous "Golden Idol" and the "Rio Formula".
2. Egypt - Carmen has also been to Egypt, where she stole the "
[Max Sequence Length Reached. Ending Conversation.]
---------------------------------------------------
Time for inference 1: 22.32 sec total, 8.96 tokens/sec
Bandwidth achieved: 120.77 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches. ***
========================================
Average tokens/sec: 8.96
Memory used: 0.00 GB
% python3 torchchat.py chat llama2
Using device=mps
Loading model...
Time to load model: 37.65 seconds
Entering Chat Mode. Will continue chatting back and forth with the language model until the models max context length of 8192 tokens is hit or until the user says /bye
Do you want to enter a system prompt? Enter y for yes and anything else for no.
User: Where in the world is Carmen San Diego?
Model: Carmen Sandiego is a fictional character in a series of educational computer games and other media. She is the main antagonist and the player's goal is to track her down and arrest her. The first game in the series, "Where in the World is Carmen Sandiego?", was released in 1985 and since then there have been several sequels and spin-offs.
Carmen Sandiego is a master thief and a member of the criminal organization V.I.L.E. (Villains' International League of Evil). She and her accomplices steal valuable objects and landmarks from around the world, and it's up to the player to use clues and deduction to track them down and foil their plans.
The character of Carmen Sandiego was created by Brøderbund Software and the games were originally designed to teach geography and history. The series has
Time for inference 1: 121.14 sec total, 1.75 tokens/sec
Bandwidth achieved: 23.59 GB/s
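A quick sanity check on the figures from the two logs above, as a minimal sketch. It assumes the reported tokens/sec is simply generated tokens divided by the total inference time, which the logs do not state explicitly. The `runs` dictionary and the helper names are made up for illustration and are not part of torchchat.

```python
# Figures copied from the "Time for inference" and "Bandwidth achieved" lines above.
runs = {
    "generate": {"seconds": 22.32, "tokens_per_sec": 8.96, "bandwidth_gbs": 120.77},
    "chat": {"seconds": 121.14, "tokens_per_sec": 1.75, "bandwidth_gbs": 23.59},
}


def approx_tokens(run):
    # Assumption: tokens/sec = tokens / total seconds, so tokens = seconds * rate.
    return run["seconds"] * run["tokens_per_sec"]


def ratio(key):
    # How many times larger the generate figure is than the chat figure.
    return runs["generate"][key] / runs["chat"][key]


for name, run in runs.items():
    print(f"{name}: ~{approx_tokens(run):.0f} tokens")
print(f"throughput ratio: {ratio('tokens_per_sec'):.2f}x")
print(f"bandwidth ratio: {ratio('bandwidth_gbs'):.2f}x")
```

Under that assumption both runs produced a similar amount of text (roughly 200 tokens for `generate` and 212 for `chat`), while throughput and achieved bandwidth both differ by about 5.1x, so the gap does not look like it comes from one mode generating far more tokens than the other.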