Andy Salerno

Results: 13 comments by Andy Salerno

Not lying, I have been checking in on this PR periodically for a week now to see if it has merged :D Looking forward to trying 2-bit Mixtral on my...

Is it really that beneficial to use the *exact* context used in the training prompt? Asking out of genuine curiosity. In my anecdotal experience, I started out by always trying...

I've also been bitten by this. I'm not an expert, but would it make sense to implement "token healing", as described by the Guidance project here: https://github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb I have a...

Quick note: I just realized that in the recording it is called "streaming_api," but before making the final PR I renamed it to "api_streaming" so it shows up next to "api"...

I'll take a look at cleaning this up tomorrow if I have time: updating the parameters and combining it with the non-streaming API, as suggested.

I made some progress today, which you can see in the latest commits. I see there are conflicts; I'll try to resolve them later this evening. But there's still one...

(Somehow the PR got closed while I was fixing merge conflicts, so I reopened it.) OK, I think I've solved the problem with streaming and cloudflared. The short version is, when...

> The public API URL seems to have a small bug where it's only generated for the streaming server. Just pushed two commits fixing the above. The line "Starting API...

+1, I would love this as well!

@jon-tow on this topic, do you expect these models to quantize well down to 4 bits (or lower) via GPTQ and/or other quantization strategies? I don't see why not, since GPTQ...