setzer22 comments

Results: 131 comments of setzer22

Hi, is there any plans for word embeddings?

Hi @hlhr202! :wave: Thanks for bringing this to our attention. The code here doesn't look hard at all to port! We will add it to the repo since it makes...

Hi, is there any plans for word embeddings?

Please check out #72. I implemented some code to extract embeddings, but we still need to validate if the results are correct, and how to best expose this to our...

Add a hotkey to stop --repl generation

This sounds like a good idea. But using signal interceptors like Ctrl-C or Ctrl-D and changing their semantics feels like a bit of a hack. I wonder if we could...

Embedding extraction

I already addressed the review feedback and removed the ad-hoc test code. So I take it a good plan now would be to merge this as-is and have embedding extraction...

Embedding extraction

> Is it a lot of data?

It is quite a lot of data for comfortably printing to stdout. It's 4096 floats per token. Not that it wouldn't work, but...

Embedding extraction

I'm open to adding a way for the CLI to output embeddings if people find this is an interesting use case. The main blocker here is that the use case...

Copy v_transposed like llama.cpp

I'm a bit confused about this change. Does it increase quality? Because from what you're reporting, it seems to increase memory use *and* increase inference time. Probably needs some more...

Copy v_transposed like llama.cpp

> As far as I could tell, it was deterministic with a seed specified from the testing I did

Yup, we're just being a bit careful with promising determinism overall,...

Copy v_transposed like llama.cpp

I've been doing some tests, but it's hard to measure if inference speed has gotten slower with the code change because different prompts can make inference speed vary by up...

Copy v_transposed like llama.cpp

I couldn't notice any performance differences in my tests either, so I'd say we can merge as is. No need to put it behind a flag.

> I guess it's...