turboderp
>Maybe it was a silly try, but self.weight = tensors[key].half() did not work.

That would cast the packed q4 weight data to half types without dequantizing it first, so that definitely wouldn't...
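To illustrate the distinction (a generic sketch, not exllama's actual tensor layout; the scale and zero point here are hypothetical), q4 weights are packed integer words plus separate quantization parameters, so casting the packed tensor just converts the raw words to floats:

```python
import torch

# Generic illustration, not exllama's actual storage format: q4 weights
# live as packed integer words. .half() numerically casts those words to
# float16 -- it does not unpack or dequantize the 4-bit values inside.
packed = torch.tensor([0x76543210, 0x0FEDCBA9], dtype=torch.int32)

wrong = packed.half()   # float16 versions of the packed words: garbage

# A real dequantization has to unpack the nibbles and apply scale/zero:
nibbles = torch.stack([(packed >> (4 * i)) & 0xF for i in range(8)], dim=-1)
scale, zero = 0.01, 8   # hypothetical per-group quantization parameters
weights = (nibbles.float() - zero) * scale
```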
That's a new one. An internal error in SentencePiece would suggest either a corrupted tokenizer.model or perhaps the wrong version of SentencePiece installed? I'm using 0.1.97, if that...
I can't think of anything else at the moment, really. Failing that, try a different model, or try downloading the tokenizer.model file again.
>I'm unclear of how both CPU and GPU could be saturated at the same time.

PyTorch waits in a busy loop whenever it synchronizes a CUDA stream, as far as...
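A minimal sketch of the pattern: kernel launches return to Python immediately, and the explicit synchronize spins the host thread until the stream drains, so a CPU core looks fully busy even though it's only waiting:

```python
import torch

# Minimal sketch: GPU kernels are launched asynchronously, and the
# explicit synchronize below blocks the host in what is effectively a
# busy-wait until the stream drains. The GPU is saturated by the matmuls
# while a CPU core sits at ~100% just spinning inside the sync call.
x = torch.randn(8192, 8192, device="cuda")
for _ in range(100):
    x = x @ x                 # async launch; returns to Python immediately
torch.cuda.synchronize()      # host spins here until the GPU finishes
```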
Having read up on it a bit, good performance on P40 might be a ways off, unfortunately. Apparently its FP16 performance is 1/64 of its FP32 performance. I guess it's...
Yep, it converts everything to FP32 on the fly. It's hard to get to 160 tokens/second that way, and hard to run a 30B model at full context length when...
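As a rough sketch of what that fallback looks like (an assumption for illustration, not the actual code): upcast the FP16 operands, do the math in full precision, downcast the result. Every matmul pays for two extra casts and runs at FP32 rate:

```python
import torch

# Rough sketch of an FP32 fallback path (an assumption for illustration,
# not the actual implementation): upcast FP16 operands, multiply in full
# precision, downcast the result. The extra casts plus FP32-rate math are
# what make a target like 160 tokens/second hard to reach this way.
def matmul_fp32_fallback(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (a.float() @ b.float()).half()
```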
I think I'd need to know for sure exactly when half2 support is provided by CUDA and when it isn't. Because there's still a half2 path that needs to compile,...
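One way to probe it at runtime (a sketch using PyTorch's device-capability query, not a substitute for the compile-time check): the native half/half2 arithmetic intrinsics require compute capability 5.3 or higher, though compiling is not the same as being fast:

```python
import torch

# Sketch of a runtime gate for a half2 fast path: native half/half2
# arithmetic intrinsics target compute capability 5.3 and up. Note that
# compiling is not the same as being fast -- Pascal's sm_61 (P40) runs
# FP16 at a small fraction of its FP32 rate even though it compiles.
def device_supports_half2(device: int = 0) -> bool:
    major, minor = torch.cuda.get_device_capability(device)
    return (major, minor) >= (5, 3)
```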
The FP16 problem remains, but INT8 would present problems of its own. It's an integer type, after all, not a drop-in replacement for floats.
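A toy sketch of what INT8 drags in (symmetric per-tensor quantization, for illustration only): values have to be scaled into the integer range and dequantized afterwards, losing precision to rounding and clipping along the way:

```python
import torch

# Toy symmetric per-tensor INT8 quantization, for illustration only:
# floats must be mapped into [-128, 127] with a scale factor, and mapped
# back out after the integer math. Rounding and clipping lose precision,
# which is why INT8 is not a drop-in substitute for FP16/FP32.
def quantize_int8(x: torch.Tensor):
    scale = x.abs().max() / 127.0
    q = torch.clamp((x / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale

x = torch.randn(1024)
q, s = quantize_int8(x)
err = (dequantize_int8(q, s) - x).abs().max()   # nonzero quantization error
```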
Could you elaborate? There are various more-or-less hacky ways to force shorter or longer replies from a language model, but no standard way of doing it. Is there a particular...
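For instance, one common hack (a generic sketch, not a feature of any particular library) is to bias the end-of-sequence token's logit before sampling: pushing it up cuts replies short, pushing it down stretches them out:

```python
import torch

# Generic sketch of one hacky length control, not any library's API:
# nudge the EOS logit before sampling. A positive bias makes the model
# end sooner (shorter replies); a negative bias delays the ending.
def bias_eos_logit(logits: torch.Tensor, eos_token_id: int,
                   bias: float) -> torch.Tensor:
    logits = logits.clone()
    logits[eos_token_id] += bias
    return logits
```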
Here are the last pieces in the SentencePiece model:
```
92530 [UNUSED_TOKEN_133]
92531 [UNUSED_TOKEN_134]
92532 [UNUSED_TOKEN_135]
92533 [UNUSED_TOKEN_136]
92534 [UNUSED_TOKEN_137]
92535 [UNUSED_TOKEN_138]
92536 [UNUSED_TOKEN_139]
92537 [UNUSED_TOKEN_140]
92538 [UNUSED_TOKEN_141]
92539 [UNUSED_TOKEN_142]
...
```