turboderp

180 comments of turboderp

Well, like I said, you're not compressing the _content_ of the context. It's not like it has a fuzzier recollection of tokens when their positional embeddings are closer together. It's...
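
(For reference, a minimal sketch of what that position compression amounts to — the names and the factor here are illustrative, not ExLlama's actual code. Only the rotary angles are scaled; the token content is left alone:)

```python
import torch

def rope_angles(positions, head_dim, base=10000.0, compress=4.0):
    # Linear position interpolation: divide positions by the compression factor
    # before building the rotary angles. The token embeddings themselves are
    # untouched -- only the positional phase gets "squeezed".
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    pos = positions.float() / compress        # e.g. position 8000 behaves like 2000
    return torch.outer(pos, inv_freq)         # (seq_len, head_dim / 2) rotation angles
```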

> There is a flag for gptq/torch called `use_cuda_fp16 = False` that gives a massive speed boost -- is it possible to do something similar in exllama?

Well, it would give...

I finished some more thorough tests now, and it's actually kind of promising. Perhaps @kaiokendev would be interested as well:

![superhot_test](https://github.com/turboderp/exllama/assets/11859846/0b08e754-0f01-4a33-85f8-876c16bee68a)

This is running a perplexity test on a number...
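
(The test itself is nothing exotic — roughly the sketch below, assuming a model callable that returns logits over fixed-length chunks; this is illustrative, not the actual test script:)

```python
import math
import torch

def perplexity(model, tokens, seq_len=8192, device="cuda"):
    # tokens: 1D LongTensor of token ids for the test set (hypothetical input)
    total_nll, total_count = 0.0, 0
    for i in range(0, tokens.numel() - 1, seq_len):
        chunk = tokens[i : i + seq_len + 1].unsqueeze(0).to(device)
        if chunk.shape[1] < 2:
            break
        with torch.no_grad():
            logits = model(chunk[:, :-1])                      # (1, n, vocab)
        logprobs = torch.log_softmax(logits.float(), dim=-1)
        targets = chunk[:, 1:]
        nll = -logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
        total_nll += nll.sum().item()
        total_count += targets.numel()
    return math.exp(total_nll / total_count)                   # lower is better
```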

I'm curious how you're configuring the model in this case? If you're running with `max_seq_len = 8192` in all cases, then the model is correctly allocating the full cache in...
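
(Roughly how I'd expect it to be set up — a sketch against the current ExLlama scripts; paths are made up and exact attribute names may differ between versions:)

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig

config = ExLlamaConfig("models/superhot-13b/config.json")    # hypothetical paths
config.model_path = "models/superhot-13b/model.safetensors"
config.max_seq_len = 8192         # cache is sized for this, regardless of prompt length
config.compress_pos_emb = 4       # 4x position compression to match the 8k fine-tune

model = ExLlama(config)
cache = ExLlamaCache(model)       # allocates the full 8192-token K/V cache up front
```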

Yep. It's necessary to avoid memory fragmentation, but it also makes more sense to me to allocate up front what you can predict you're eventually going to need anyway. But...
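
(In other words, something like this rather than growing the cache with torch.cat every token — a generic sketch with made-up shapes, not ExLlama's internals:)

```python
import torch

max_seq_len, n_heads, head_dim = 8192, 40, 128

# One up-front allocation at max_seq_len, reused for the whole session.
k_cache = torch.zeros(1, max_seq_len, n_heads, head_dim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)

def append_kv(step, k_new, v_new):
    # Each new token's keys/values are written into the preallocated slot instead
    # of concatenating onto a growing tensor, which is what fragments VRAM.
    k_cache[:, step] = k_new
    v_cache[:, step] = v_new
```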

Nah, it's fine. It explains it well enough, since it looks like they're a little behind with their packaging of ExLlama as a library.

@alkeryn I think it's a little premature to start demanding that the model understand multiple scales before there's anything to suggest it needs more than one scale. @kaiokendev I noticed...

@QM60 I'm not really having trouble running 8k contexts for 13B. But for 33B, yes, it's going to be trickier. I do have a second 24 GB GPU, luckily. So...
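
(The back-of-the-envelope math, assuming standard Llama dims and an fp16 cache:)

```python
# bytes ≈ 2 (K and V) * n_layers * seq_len * hidden_dim * 2 bytes per fp16 element
def kv_cache_gib(n_layers, hidden_dim, seq_len=8192):
    return 2 * n_layers * seq_len * hidden_dim * 2 / 1024**3

print(kv_cache_gib(40, 5120))   # 13B: ~6.3 GiB on top of the 4-bit weights
print(kv_cache_gib(60, 6656))   # 33B: ~12.2 GiB, next to ~17 GB of weights -> over 24 GB
```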

@Jeduh You're still teaching the model two different behaviors that have to coexist. Much harder than just modifying one existing behavior. And you need some kind of rationale anyway. What...