Ben Vanik comments

Results: 416 comments by Ben Vanik

HIP runtime memory issue for Llama 3.1 70B F16.

132GB of allocated device memory is a lot - just because you have that much physical memory does not mean that all of it can be allocated. We never even...

HIP runtime memory issue for Llama 3.1 70B F16.

The problem when running up against physical memory limits is that it's not something you can reason about as a sum: you can almost never use all of the physical...

HIP runtime memory issue for Llama 3.1 70B F16.

heh, yeah, that'll be a problem :P I'm going to bet that it's some hoisted initializers that are transposing every single parameter or something ridiculous (260=2*130, probably two copies of...

HIP runtime memory issue for Llama 3.1 70B F16.

yeah, we suballocate, produce a max value, and then allocate that - if you pass --mlir-print-ir-before=iree-stream-schedule-allocation / --mlir-print-ir-after=iree-stream-schedule-allocation it'll be easier to see what's mapping to what
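
To illustrate the "suballocate, produce a max value, then allocate that" idea in the comment above, here is a minimal sketch. It is not IREE's actual algorithm: the `Transient` type, the `pack` helper, the greedy first-fit strategy, and the 64-byte alignment are all assumptions made for illustration. It assigns each value an offset within one slab, lets values with non-overlapping lifetimes share offsets, and reports the single slab size that would then be allocated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transient:
    """Hypothetical transient value: size in bytes, live over [start, end]."""
    name: str
    size: int
    start: int
    end: int


def pack(transients, alignment=64):
    """Greedy first-fit packing (illustrative only).

    Returns ({name: offset}, slab_size), where slab_size is the one
    allocation that would back every transient.
    """
    def align(n):
        return (n + alignment - 1) // alignment * alignment

    placed = []   # (offset, size, transient) for values already assigned
    offsets = {}
    # Place the largest values first; ties keep their input order.
    for t in sorted(transients, key=lambda t: -t.size):
        # Byte ranges held by values whose lifetimes overlap with t.
        busy = sorted(
            (o, o + s) for o, s, p in placed
            if not (p.end < t.start or t.end < p.start)
        )
        offset = 0
        for lo, hi in busy:
            if offset + t.size <= lo:
                break  # t fits in the gap before this range
            offset = max(offset, align(hi))
        placed.append((offset, t.size, t))
        offsets[t.name] = offset
    # The "max value": the highest byte any suballocation reaches.
    slab_size = max((o + s for o, s, _ in placed), default=0)
    return offsets, align(slab_size)
```

For example, with `a` (100 bytes, live over steps 0-1), `b` (100 bytes, steps 2-3), and `c` (50 bytes, steps 0-3), `a` and `b` share offset 0 because they are never live at the same time, and the slab comes out smaller than the 250-byte sum of the sizes.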

HIP runtime memory issue for Llama 3.1 70B F16.

Nice, you've found it - that's what I suspected. As you note, when models get this big (though I'd argue for anything deployed, of any size) we need to be...

HIP runtime memory issue for Llama 3.1 70B F16.

That's great news :) Thinking ahead to when cases worse than this arise, something we should do is have some analysis that forces stream partitioning to min-peak-memory when execution is...

[ROCM] OPT Pass plugin test refactoring and organization

(closing as stale)

[compiler] Removing copies of duplicate globals

isStructurallyEquivalentTo with the cache is what you'll want to use. Currently it has the #3996 TODO about symbols that would make it not work for this, but that could be...

[compiler] Removing copies of duplicate globals

Responding to your question first, then I'll take a look at the new code! > From what we want to do in this pass, it seems the best place to run...

Warn when `--iree-llvmcpu-target-cpu` defaults to "generic".

generic is useful for people who want to run something on a machine that is not their own - that's why it's the default for things like clang/gcc - host...