Julia Longtin
> > goes from a token every 0.18 seconds on mistral 7B instruct to a token every 0.82 seconds. > > Are you showing a performance regression? Or are the...
> > > goes from a token every 0.18 seconds on mistral 7B instruct to a token every 0.82 seconds. > > > > > > Are you showing a...
> > > > goes from a token every 0.18 seconds on mistral 7B instruct to a token every 0.82 seconds. > > > > > > > > >...
> > goes from 0.18 tokens per second on mistral 7B instruct (Q5K) to 0.82 tokens per second. > > How many threads is that with? Since Xeon Phi has...
now runs at 1.2 tokens per second.