Douglas Hanley
I'm seeing similar issues here with `uint8` → `float16` (or `float32`). Using nightly with an A6000. The application is quantized matrix multiplication. I've found that basically only a block size...
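For context, block-wise quantization stores each group of weights as small integers plus one scale per block, and the matmul kernel dequantizes on the fly. The sketch below is purely illustrative: the symmetric zero-point of 128 and the `block_size=32` default are assumptions for demonstration, not the actual parameters of the kernel in question.

```python
def dequantize_blocks(qvals, scales, block_size=32):
    # qvals: flat list of uint8 quantized values (0..255)
    # scales: one float scale per block of `block_size` values
    # Assumes a symmetric scheme with zero-point 128 (illustrative only):
    # each element is reconstructed as (q - 128) * scale.
    out = []
    for i, q in enumerate(qvals):
        scale = scales[i // block_size]
        out.append((q - 128) * scale)
    return out
```

The dtype of the scales (here plain Python floats standing in for `float16`/`float32`) is exactly where conversion issues like the one above can surface.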
Yeah, right now we don't support getting token-level embeddings. So generative models like llama-2 that lack pooling layers won't work. Are you looking for token-level embeddings or sequence...
@r3v1 Is it still raising an error, or is it just that it's returning token level embeddings as a list of lists? Generative models like these don't do pooling intrinsically...
Yeah, the langchain interop code is unfortunately broken right now for getting embeddings from generative models. For it to work in this case, we'd need to implement manual pooling somewhere....
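The manual pooling mentioned above could look something like this: collapse the per-token vectors a generative model returns into one sequence embedding by averaging. This is just a sketch of mean pooling in plain Python; the function name and the optional mask argument are my own, not part of any existing API.

```python
def mean_pool(token_embeddings, attention_mask=None):
    # token_embeddings: list of per-token vectors (list of lists of floats),
    # i.e. the "list of lists" a generative model returns per sequence.
    # attention_mask: optional list of 0/1 flags marking real (non-pad) tokens.
    if attention_mask is None:
        attention_mask = [1] * len(token_embeddings)
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    # average only over the unmasked tokens
    return [x / count for x in total]
```

Other pooling choices (last-token, max) drop in the same way; mean pooling is just the most common default for sentence embeddings.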
Thanks for the comments @abetlen! Yeah, so I think that this is basically a superset of (1) right now. If you call `create_completion_parallel(n*[prompt])` you'll get back `n` independent responses for...
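The calling convention described above can be sketched as follows. Note that `create_completion_parallel` is the API under discussion here, not a released function, so its call is left commented out; the runnable part just shows how `n*[prompt]` builds the batch.

```python
# Hypothetical sketch of the proposed batched call: passing the same prompt
# n times should yield n independent completions, one per list entry.
n = 4
prompt = "Once upon a time"
prompts = n * [prompt]  # list of n identical prompts
# responses = llm.create_completion_parallel(prompts)  # proposed API, one response per prompt
```

With distinct prompts in the list you'd get the more general batched case, which is why this reads as a superset of (1).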