Olivia Lee

17 comments by Olivia Lee

If I am right, is there a way to define the behavior of certain ops during shape inference?

Thanks for the response. I think every op that takes a shape as input, and whose output shape depends on that shape input, needs such consideration, and dynamic...

A shape tracer may be needed in which both the dynamic symbolic shape and the shape of the shape tensor are propagated; the dynamic symbolic shape is propagated and transformed in the exact...

> BTW mistral will need a `SlidingWindowCache` based on the implementation of `RecurrentGemma`! From my understanding, `SlidingWindowCache` is for memory efficiency. Actually, I wonder how `SlidingWindowCache` would address the issue...

Hi Arthur, I have added support for Sliding Window Cache. Please take a look at its implementation and also the `_update_causal_mask` implementation; I have added my thoughts as comments.
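
To illustrate the memory-efficiency idea behind a sliding-window cache, here is a minimal sketch in plain Python. It is not the implementation from the PR or from `transformers`: the class name `SlidingWindowBuffer`, its `update` method, and the integer "tokens" are all hypothetical, chosen only to show that the cache never holds more than `window` entries.

```python
from collections import deque


class SlidingWindowBuffer:
    """Hypothetical fixed-size cache that keeps only the most recent `window` entries."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        # A deque with maxlen discards the oldest entry once it is full.
        self._items = deque(maxlen=window)

    def update(self, new_items):
        """Append new entries and return the entries still visible in the window."""
        self._items.extend(new_items)
        return list(self._items)

    def __len__(self):
        return len(self._items)


cache = SlidingWindowBuffer(window=4)
cache.update([0, 1, 2])            # cache is not full yet
visible = cache.update([3, 4, 5])  # the two oldest entries (0 and 1) are evicted
print(visible, len(cache))
```

Memory stays bounded by `window` no matter how many entries are appended; a real key/value cache would store tensors per layer rather than integers, and the attention mask would have to agree with what the cache still holds.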

> Good work > > * for all the copied from that were removed, we need to use one of the models as the new base (mixtral for example) >...

> @ArthurZucker Don't forget our `run-slow` feature 🙏 > > @zhenglongjiepheonix Could you push an empty commit with message `[run-slow] mistral`? Thank you 🤗 The slow CI is failing on...

Currently there are some issues with the Mistral tests. Since my development setup is based on an A100, I ran these tests on a Colab T4 using the current main branch, @ArthurZucker @ydshieh...

> If, except for the 4 mentioned failing tests, all others are passing with this PR + `test_compile_static_cache` is passing on an A10 with torch 2.3, it's OK from my side...

> > > If, except for the 4 mentioned failing tests, all others are passing with this PR + `test_compile_static_cache` is passing on an A10 with torch 2.3, it's OK from...