Tianqi Chen

637 comments by Tianqi Chen

This is now supported through the full OpenAI-compatible (OAI) API.

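To make that concrete, here is a minimal sketch of calling it from TypeScript, assuming the @mlc-ai/web-llm package and its CreateMLCEngine entry point (exact export names and the model id below vary by version and are illustrative):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Downloads (or reuses cached) weights and compiles the model in-browser.
  const engine = await CreateMLCEngine("Llama-2-7b-chat-hf-q4f16_1"); // illustrative model id

  // The request/response shape mirrors the OpenAI chat completions API.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(reply.choices[0].message.content);
}

main();
```
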
Thanks for asking! We have a standalone inference engine that combines WASM and JavaScript.

Each wasm accounts for one model type (i.e., all Llama2 variants that fit within a certain context length and vocab limit). We are working on revamping a framework that also makes...

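A hedged sketch of that idea: multiple weight sets can point at one shared wasm kernel library, as long as they agree on context length and vocab size. The field names and ids here are illustrative, not the exact WebLLM config schema:

```ts
// Hypothetical model list: two Llama2-7B weight sets share one compiled wasm.
const modelList = [
  {
    model_id: "Llama-2-7b-chat-hf-q4f16_1",
    model_lib: "llama-2-7b-q4f16_1-ctx4k.wasm", // shared kernel library
  },
  {
    model_id: "my-llama2-7b-finetune-q4f16_1",  // hypothetical fine-tune
    model_lib: "llama-2-7b-q4f16_1-ctx4k.wasm", // same wasm, different weights
  },
];
```
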
The model is cached in the browser cache. It runs fully on the frontend, without backend support.

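An illustrative sketch of the general mechanism (the cache name is hypothetical, not WebLLM's actual internals): fetch each weight shard once over the network, then serve later page loads from the browser Cache API.

```ts
// Return the shard bytes, hitting the network only on the first visit.
async function fetchWithCache(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open("model-weights"); // hypothetical cache name
  let resp = await cache.match(url);
  if (!resp) {
    resp = await fetch(url);            // first load: download the shard
    await cache.put(url, resp.clone()); // persist it for future visits
  }
  return resp.arrayBuffer();
}
```
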
This is now fixed, and all conversation templates are standardized.

Thanks @beaufortfrancois for bringing it up! Unfortunately this is a limit we would need help from the Chrome side to lift, mainly because the model itself does require...

Thanks for the note! Let us look into it a bit and see if it is possible to get a variant of a small model that fits into this limit. In...

That model depends on the shader-f16 feature, which only exists in Chrome Canary (and not Chrome stable AFAIK), so I am not sure whether it works on Android. If it is possible to...

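A quick way to feature-detect this before picking an f16 model build; this is plain WebGPU API, with no WebLLM specifics assumed:

```ts
// True iff the adapter advertises the shader-f16 WebGPU feature.
async function hasShaderF16(): Promise<boolean> {
  const adapter = await navigator.gpu?.requestAdapter();
  return adapter?.features.has("shader-f16") ?? false;
}
```
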
It is possible that the model crashes on Llama: we can hit the VRAM limit with Llama2 models (which go beyond 4GB) even when using 4-bit quantization, and on iOS we had...

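A rough back-of-envelope for why 4GB is tight, using assumed round numbers rather than measured figures:

```ts
// Llama2-7B at 4-bit: ~0.5 byte per parameter for the weights alone,
// before the KV cache and activations are even counted.
const params = 7e9;
const bytesPerParam = 0.5;                             // 4-bit quantization
const weightsGiB = (params * bytesPerParam) / 2 ** 30;
console.log(`~${weightsGiB.toFixed(2)} GiB of weights`); // ~3.26 GiB
```

With the KV cache and runtime buffers on top of that, a 4GB budget is easy to exceed.
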
Glad that the 3B model works; this is the first running example of a WebGPU-native LLM on a mobile phone AFAIK. Thank you @beaufortfrancois for pushing this. Love to share this with...