Junru Shao
Update: I opened up a repo (https://github.com/junrushao/llm-perf-bench) of Dockerfiles to help reproduce CUDA performance numbers. The takeaway is: MLC LLM is around 30% faster than Exllama. I’m not a Docker...
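For context on how numbers like these are typically produced, here is a minimal sketch of a decode-throughput measurement (tokens per second); `generate` is a hypothetical stand-in for whichever backend (MLC LLM or Exllama) is being benchmarked, not an API from the repo:

```python
import time

def decode_tokens_per_second(generate, prompt, num_tokens=256):
    """Rough throughput: generated tokens divided by wall-clock seconds.

    `generate` is a hypothetical callable wrapping the backend under test;
    a careful benchmark would also warm up first and separate prefill
    from decode.
    """
    start = time.perf_counter()
    generate(prompt, max_new_tokens=num_tokens)
    return num_tokens / (time.perf_counter() - start)
```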
WSL support for Vulkan is not there yet AFAIK, so please use CMD instead if you want to use Vulkan on Windows.
> dial tcp: lookup

Would you mind checking the network connection?
It’s up! https://twitter.com/bohanhou1998/status/1655772690760994818
`tune_relax` isn't something we are currently using to tune LLMs, because it only supports static-shape workloads. Instead, we are using a mixed strategy that allows dynamic-shape workloads as...
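To illustrate the distinction (a sketch with hypothetical names, not the actual TVM API): a statically tuned kernel is specialized to one concrete shape, while a dynamic-shape kernel treats dimensions such as sequence length symbolically, which matters because the sequence grows at every decode step.

```python
import numpy as np

def matmul_static(a, b, tuned_seq_len=128):
    # Static-shape tuning bakes one concrete shape into the schedule,
    # so the kernel is only valid for the length it was tuned on.
    assert a.shape[0] == tuned_seq_len, "re-tune for every new sequence length"
    return a @ b

def matmul_dynamic(a, b):
    # A dynamic-shape schedule is parameterized by a symbolic length n,
    # so one tuned kernel covers every decode step as the context grows.
    return a @ b
```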
MacBooks in recent years all ship with M1/M2 GPUs, which are fairly capable of running LLM workloads.
Agreed that "tuning" is a pretty overloaded term - in this particular case, I am referring to an "auto-tuning compiler", which is the key to GPU performance. With TVM Unity auto...
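The core idea of an auto-tuning compiler, reduced to a toy loop (a hypothetical helper, not TVM Unity's actual interface): enumerate candidate schedules, measure each one on the real device, and keep the fastest.

```python
import time

def autotune(candidates, args, repeats=10):
    """Pick the fastest kernel variant by measuring it on the target.

    `candidates` is a hypothetical list of compiled variants (e.g., the
    same matmul with different tile sizes); real tuners search far larger
    spaces and use cost models instead of exhaustive measurement.
    """
    best, best_time = None, float("inf")
    for variant in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            variant(*args)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best, best_time
```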
MacBooks that ship with M1/M2 GPUs are quite capable of running those LLM workloads, but the Intel integrated GPUs in earlier MacBooks are indeed far behind.
I'm a native Chinese speaker, but have to admit that I don't fully understand non-ASCII encoding... @spectrometerHBH is an expert in this!

> To avoid garbled text in your CMD command...
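For anyone curious what the garbling looks like, here is a small illustration (cp936/GBK is chosen here only as an example of a legacy CMD code page; `chcp 65001` switches CMD to UTF-8):

```python
# UTF-8 bytes of a Chinese string, decoded under a legacy code page,
# come out as mojibake.
text = "你好"
garbled = text.encode("utf-8").decode("gbk")
print(garbled)  # something like 浣犲ソ
```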
RedPajama-3b should work