nonnull

20 comments by nonnull

A lofty goal! Be aware that under the hood llama (and indeed most ANNs) use floating-point, and floating-point determinism is a rabbit hole with no bottom. Some particular issues that...
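For a concrete taste of why, float addition is not even associative, so any change in reduction order (thread count, SIMD width, a different BLAS kernel) can change results bit-for-bit. A minimal standalone C++ illustration:

```cpp
#include <cstdio>

// Regrouping the same three values gives two different answers: when
// 1.0f is added to -1e8f first, it is absorbed by rounding.
int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    std::printf("(a + b) + c = %.9g\n", (a + b) + c); // prints 1
    std::printf("a + (b + c) = %.9g\n", a + (b + c)); // prints 0
    return 0;
}
```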

You seem to be thinking that a transformer is a function `F(tokens[0..=n-1]) -> probs[n]`. It isn't. It's a function `F(tokens[0..=n-1], probs[0..=n-1]) -> probs[n]`: You need the output probabilities of the...
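To make the distinction concrete, here is a minimal sketch (all names hypothetical; `step` is a stub standing in for the model): position n cannot be evaluated without the outputs of positions 0..n-1, so the loop is inherently sequential.

```cpp
#include <cstddef>
#include <vector>

using Probs = std::vector<float>;

// Stub standing in for the model's evaluation of position n; the point
// is the third argument: the outputs already produced for 0..n-1.
static Probs step(const std::vector<int> &tokens, std::size_t n,
                  const std::vector<Probs> &history) {
    (void)tokens; (void)n; (void)history;
    return Probs(32000, 1.0f / 32000.0f); // dummy uniform distribution
}

// probs[n] = F(tokens[0..=n-1], probs[0..=n-1]): each position feeds
// the next, so positions must be evaluated in order.
std::vector<Probs> run(const std::vector<int> &tokens) {
    std::vector<Probs> history;
    for (std::size_t n = 0; n < tokens.size(); ++n)
        history.push_back(step(tokens, n, history));
    return history;
}
```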

I've been hitting this with the 65B model in oneshot mode with a 2048-token context. So it's not just interactive sessions that are affected. This is fairly easily reproducible with...

> Increasing batch size also makes llama.cpp run out of memory, so any solution that only considers the context size and not the batch size is likely wrong.

Ah. That...
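As a hedged back-of-envelope (shapes assumed for illustration, not llama.cpp's actual bookkeeping): the attention-score scratch alone holds one row of length n_ctx per head per batch token, so memory grows with both parameters, and a limit keyed to n_ctx alone under-reserves as soon as the batch grows.

```cpp
#include <cstddef>

// Rough KQ scratch estimate (assumed shapes, for illustration only):
// n_head * n_batch rows, each n_ctx elements wide.
std::size_t attn_scratch_bytes(std::size_t n_ctx, std::size_t n_batch,
                               std::size_t n_head, std::size_t elem_size) {
    return n_head * n_batch * n_ctx * elem_size;
}
// e.g. 32 heads, batch 512, ctx 2048, f32: 32 * 512 * 2048 * 4 = 128 MiB
```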

I haven't looked; how does LLaMA handle prompts that are smaller than the context size? E.g. a 1024-token prompt with a 2048-token context size. Does it just truncate to an effective context...

I was debugging an issue with the same symptoms today. As it turns out, the problem was that I was returning a Gradio component itself from an event listener instead...

> Reading the comments above - yeah, if we can efficiently implement a lookup table `int8/int4->float16` using AVX/NEON, then it might be really worth trying the non-uniform approach.

For the...

Basic idea for an int4->f16 lookup table on AVX2:

Low nibbles:

1. Mask out high nibbles of each input byte, because `VPSHUFB` uses the high bit to clear bytes to...
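A sketch of that low-nibble path, assuming `tbl_lo` and `tbl_hi` hold the low and high bytes of the 16 fp16 table entries, each broadcast to both 128-bit lanes (`VPSHUFB` shuffles within each lane). This is just the idea, not llama.cpp's actual code:

```cpp
#include <immintrin.h>
#include <cstdint>

// Expand 32 int4 indices (one per byte of `idx`) into 32 fp16 values via
// two VPSHUFB byte-table lookups.
static inline void lut_int4_to_f16(__m256i idx, __m256i tbl_lo, __m256i tbl_hi,
                                   uint16_t *out /* 32 values */) {
    // Step 1: clear the high nibble of every byte. VPSHUFB zeroes the
    // output byte when the index's high bit is set, so keep indices 0..15.
    idx = _mm256_and_si256(idx, _mm256_set1_epi8(0x0F));
    // Step 2: one byte-table lookup per half of the fp16 value.
    __m256i lo = _mm256_shuffle_epi8(tbl_lo, idx);
    __m256i hi = _mm256_shuffle_epi8(tbl_hi, idx);
    // Step 3: interleave the byte halves into 16-bit values. Unpacking is
    // per-lane, so the output is lane-permuted relative to `idx`; a real
    // kernel would permute here or just store in a blocked order.
    _mm256_storeu_si256((__m256i *)(out +  0), _mm256_unpacklo_epi8(lo, hi));
    _mm256_storeu_si256((__m256i *)(out + 16), _mm256_unpackhi_epi8(lo, hi));
}
```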

> > The bins are now much more evenly utilized.
>
> I am wondering how we could update the algorithm so also the first bin is utilized, currently it...
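For context, the unused first bin falls out of the scheme itself; a hedged sketch of a symmetric round-to-nearest int4 quantizer (Q4_0-style, details assumed) shows bin 0 can never be hit:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// With scale d = amax / 7 and offset +8, round(x / d) lands in [-7, 7],
// so quantized values occupy bins 1..15 and bin 0 (value -8) stays empty.
std::array<int, 16> bin_histogram(const std::vector<float> &w) {
    std::array<int, 16> hist{};
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    const float d = amax > 0.0f ? amax / 7.0f : 1.0f;
    for (float x : w) {
        int q = (int)std::lround(x / d) + 8; // -7..7 -> 1..15
        hist[std::clamp(q, 0, 15)]++;
    }
    return hist; // hist[0] is always 0
}
```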

> Perhaps Posit arithmetic could be valuable? http://www.johngustafson.net/pdfs/BeatingFloatingPoint.pdf

Posits offer more dynamic range, at the expense of less accuracy for large numbers. If the largest weights matter the most, and...