Simon Mo

Results 313 comments of Simon Mo

This method is interesting and, I believe, pretty effective overall (see also https://research.character.ai/optimizing-inference/). However, it seems that it currently requires a model trained for this technique. We would love to see...

You are correct. The change will be small because we just need to enable cross-layer sharing and prevent writes to the cache in later layers. However, I would like to...
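To make the two pieces of that change concrete, here is a minimal toy sketch in plain Python. It is not vLLM code: the `SharedKVCache` class, its `share_from` mapping, and the scalar key/value entries are all hypothetical names invented for illustration. It only shows the idea that a later layer reads an earlier layer's cache and that its own writes are skipped.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[float, float]  # toy (key, value) pair; real caches hold tensors


class SharedKVCache:
    """Toy per-layer KV cache where some layers reuse an earlier layer's entries.

    Hypothetical illustration only; this is not how any real engine is structured.
    """

    def __init__(self, num_layers: int, share_from: Optional[Dict[int, int]] = None):
        # share_from maps a later layer to the earlier layer whose cache it reads.
        self.share_from: Dict[int, int] = dict(share_from or {})
        for layer, owner in self.share_from.items():
            if not (0 <= owner < layer < num_layers):
                raise ValueError(f"layer {layer} must share from an earlier layer")
            if owner in self.share_from:
                raise ValueError(f"layer {owner} does not own a cache to share")
        self.store: List[List[Entry]] = [[] for _ in range(num_layers)]

    def write(self, layer: int, key: float, value: float) -> bool:
        # Sharing layers never write; the owning layer already stored this token.
        if layer in self.share_from:
            return False
        self.store[layer].append((key, value))
        return True

    def read(self, layer: int) -> List[Entry]:
        # Sharing layers read through to the layer that owns the entries.
        return self.store[self.share_from.get(layer, layer)]


cache = SharedKVCache(num_layers=4, share_from={2: 0, 3: 1})
for layer in range(4):
    cache.write(layer, key=float(layer), value=float(layer) * 10)

print(cache.read(2))  # the entries written by layer 0
```

In this sketch the memory saving comes from layers 2 and 3 never allocating entries of their own; whether a given model tolerates that depends on how it was trained.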

Now that we have added support for LLaVA, this is welcome!

Unfortunately, we don't have a way to fall back gracefully because these can only be detected per request. There is no engine parameter like `--enable-per-request-logits-processors`. Definitely a lesson learned. Sorry about...

Good feature request! I would imagine the draft model would be especially useful to compile.

Hi @mgoin @tlrmchlsmth, what are the remaining blockers for this PR (other than #14306)?

It would be great to get this in by tomorrow so we can make it part of the v0.8.0 release.

I'm in favor of all these! Please also make sure it is well documented.

From Rob:
> openai tool use --> looks like a numerics issue. We are comparing against a golden string. I will post a PR in the AM to update the...

Can this just be done/generated by `torch.compile`?