Junru Shao

179 comments by Junru Shao

Update: I opened up a repo (https://github.com/junrushao/llm-perf-bench) of Dockerfiles to help reproduce CUDA performance numbers. The takeaway is: MLC LLM is around 30% faster than Exllama. I'm not a docker...

WSL support for Vulkan is not there yet, AFAIK, so please use CMD instead if you want to use Vulkan on Windows.

> dial tcp: lookup

Would you mind checking the network connection?

It’s up! https://twitter.com/bohanhou1998/status/1655772690760994818

`tune_relax` isn't something we are currently using to tune LLMs, because it only supports static shape workloads. Instead we are using a mixed strategy that allows dynamic shape workloads as...
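To illustrate the limitation mentioned above, here is a toy sketch (not TVM's actual API; `tune_static`, `candidate_configs`, and `benchmark` are all hypothetical names) of why a static-shape tuner falls short for LLM workloads: it records the best config per *concrete* shape, so a workload whose shapes vary at runtime (e.g. a growing KV-cache length) misses the tuned table for shapes never seen during tuning.

```python
# Toy illustration of static-shape tuning; all names are hypothetical,
# and the "cost model" is a stand-in, not a real measurement.

def candidate_configs():
    # Hypothetical schedule knobs: tile sizes to try.
    return [8, 16, 32]

def benchmark(shape, tile):
    # Stand-in cost model: pretend a tile that evenly divides the
    # shape is cheapest.
    return 0.0 if shape % tile == 0 else float(shape % tile)

def tune_static(shapes):
    # Exhaustively tune each concrete shape seen at tuning time.
    return {s: min(candidate_configs(), key=lambda t: benchmark(s, t))
            for s in shapes}

table = tune_static([64, 128])
print(table)        # best tile per tuned shape
print(96 in table)  # False: a runtime shape unseen during tuning has no entry
```

A dynamic-shape strategy instead produces schedules parameterized over symbolic dimensions, so one tuned kernel covers the whole family of shapes.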

MacBooks in recent years all ship with M1/M2 GPUs, which are fairly capable of running LLM workloads.

Agreed that "tuning" is a pretty overloaded term - in this particular case, I am referring to "auto-tuning compiler", which is the key to GPU performance. With TVM Unity auto...

MacBooks that ship with M1/M2 GPUs are quite capable of running those LLM workloads, but the Intel integrated GPUs in earlier MacBooks are indeed far behind.

I'm a native Chinese speaker, but I have to admit that I don't fully understand non-ASCII encodings... @spectrometerHBH is an expert in this!

> To avoid garbled text in your CMD command...
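A small illustration of where the garbled text comes from: Chinese output encoded as UTF-8 but decoded with the legacy Windows code page GBK (cp936, the CMD default on Chinese Windows) turns into mojibake. The bytes themselves are fine; only the decode is wrong, which is why switching the console to UTF-8 (e.g. `chcp 65001` in CMD) fixes it.

```python
text = "你好"               # "hello" in Chinese
raw = text.encode("utf-8")  # bytes as a program would emit them
garbled = raw.decode("gbk") # wrong decode: what a GBK console displays
print(garbled)              # mojibake instead of the original text

# Reversing the wrong decode recovers the original, proving the bytes
# were never corrupted:
roundtrip = garbled.encode("gbk").decode("utf-8")
print(roundtrip == text)    # True
```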

RedPajama-3b should work