Benoit Jacob

Results: 95 comments by Benoit Jacob

Wow, the "improved latencies" report here shows ~3x improvements on ViT, which is exactly the motivation for this in #15399! @mariecwhite The size and dispatch regressions on the same model are...

Approved. Note: I think it's OK to skip expanding the e2e test, as the e2e logic is essentially shared between the two cases. However, if you already have the...

@benvanik, the first report is about this line: https://github.com/openxla/iree/blob/76cbaaca2cf44ac0e4f58cbce4134c253e38f758/runtime/src/iree/io/memory_stream.c#L267 And the code path for loading an unaligned uint16 goes here: https://github.com/openxla/iree/blob/76cbaaca2cf44ac0e4f58cbce4134c253e38f758/runtime/src/iree/base/alignment.h#L364-L366 It is always undefined behavior to have a C or...
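For context, the usual portable way to read a uint16 from a possibly misaligned address in C is to go through `memcpy` (or assemble bytes) instead of dereferencing a cast pointer such as `*(const uint16_t*)p`, which is undefined behavior when `p` is misaligned. The sketch below is illustrative only: `load_u16_unaligned` and `load_u16_le` are hypothetical names, not IREE's actual helpers in `alignment.h`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint16_t in native byte order from an address
   that may not be 2-byte aligned. memcpy has no alignment requirement, and
   compilers typically lower this fixed-size copy to a single load on targets
   whose load instructions tolerate misalignment. */
static inline uint16_t load_u16_unaligned(const void* p) {
  uint16_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: reads a little-endian uint16_t byte by byte, which is
   both alignment-safe and independent of the host's byte order. */
static inline uint16_t load_u16_le(const uint8_t* p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}
```

Either form avoids forming a misaligned `uint16_t*`, so sanitizers stop flagging the access while the generated code on permissive targets usually stays a plain load.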

The gguf_parser.c one is less obvious for me to read. Hopefully @benvanik can pick up from here.

Yes, typical. This undefined behavior could translate into anything in theory, but when the target's load/store instructions don't have any alignment requirement, it would "just work". When the target load/store...

That's exciting! I'd love to chat over video and discuss the diff you have, and generally better understand what you've been doing, as it sounds like you were way ahead...

Interesting. This is a 5-thread benchmark across 2 tiers of cores (taskset 1f0). How does it look on 1 thread, on a taskset that selects 1 tier of cores (e.g. taskset...
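To see why `taskset 1f0` gives 5 threads: the argument is a hex bitmask, and 0x1f0 is binary 1 1111 0000, so bits 4 through 8 are set, selecting CPUs 4, 5, 6, 7 and 8. The small decoder below is a hypothetical illustration, not part of any tool; which of those CPUs belong to which core tier depends on the device's topology, which this snippet cannot know.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses a hex affinity mask as passed to `taskset`
   (e.g. "1f0" or "0x1f0"), prints the selected CPU indices separated by
   commas, and returns how many CPUs the mask selects. */
static int print_cpus_in_mask(const char* hex_mask) {
  unsigned long long mask = strtoull(hex_mask, NULL, 16);
  int count = 0;
  for (int cpu = 0; cpu < 64; ++cpu) {
    if (mask & (1ULL << cpu)) {
      printf("%s%d", count ? "," : "", cpu);
      ++count;
    }
  }
  printf("\n");
  return count;
}
```

Calling `print_cpus_in_mask("1f0")` prints `4,5,6,7,8` and returns 5, matching the 5-thread setup.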

> If looking for a 2x, ignore the inefficient 4% thing for now? Yes. Since we are looking for a 2x and both the XNNPACK and the non-XNNPACK profile show...
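One way to make the "ignore the 4% thing" reasoning concrete, assuming the 4% is a fraction $p$ of total runtime, is Amdahl's law: even eliminating that part entirely bounds the overall speedup well below 2x, whereas a 2x requires attacking at least half of the runtime.

$$
S_{\max} = \frac{1}{1-p}, \qquad p = 0.04 \;\Rightarrow\; S_{\max} = \frac{1}{0.96} \approx 1.042,
\qquad
S_{\max} \ge 2 \iff p \ge 0.5 .
$$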