Olivia Lee
If I understand correctly, is there a way to define the behavior of a certain op during shape inference?
Thanks for the response. I think every op that may take a shape as input, where the output shape depends on that shape input, needs such consideration, and dynamic...
A shape tracer may be needed where both the dynamic symbolic shape and the value carried by a shape tensor are propagated; the dynamic symbolic shape is propagated and transformed in the exact...
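The tracer idea above could be sketched roughly as follows. Everything here is hypothetical, assumed names (`TensorInfo`, `trace_shape_op`, `trace_reshape`), not any real framework's API; the point is only that a tensor carries both its own symbolic shape and, when it is a shape tensor, the shape value it holds:

```python
# Hypothetical sketch of a shape tracer; class and function names are
# illustrative, not from any real shape-inference framework.
from dataclasses import dataclass
from typing import List, Optional, Union

Dim = Union[int, str]  # a str stands for a symbolic dimension, e.g. "batch"

@dataclass
class TensorInfo:
    shape: List[Dim]                         # symbolic shape of the tensor itself
    shape_value: Optional[List[Dim]] = None  # if this tensor holds a shape, its value

def trace_shape_op(x: TensorInfo) -> TensorInfo:
    # A Shape-like op: the output is a 1-D tensor whose *value* is x's shape,
    # so we propagate that value alongside the (static) 1-D shape.
    return TensorInfo(shape=[len(x.shape)], shape_value=list(x.shape))

def trace_reshape(data: TensorInfo, new_shape: TensorInfo) -> TensorInfo:
    # A Reshape-like op consumes the propagated shape *value*, not merely
    # the shape of the shape tensor.
    assert new_shape.shape_value is not None, "need a propagated shape value"
    return TensorInfo(shape=list(new_shape.shape_value))

x = TensorInfo(shape=["batch", 8, 16])
s = trace_shape_op(x)       # s.shape_value == ["batch", 8, 16]
y = trace_reshape(x, s)
print(y.shape)              # ['batch', 8, 16]
```

With both channels propagated, the symbolic dim `"batch"` survives through the Shape → Reshape chain instead of collapsing to an unknown dimension.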
> BTW mistral will need a `SlidingWindowCache` based on the implementation of `RecurrentGemma`! From my understanding, SlidingWindowCache is for memory efficiency; actually I wonder how SlidingWindowCache would address the issue...
Hi Arthur, I have added support for SlidingWindowCache; please take a look at its implementation and also the `_update_causal_mask` implementation. I have added my thoughts as comments.
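For readers unfamiliar with the idea, here is a minimal sketch of what a sliding-window KV cache does; this is an illustrative toy (the class name and methods are invented here), not the actual transformers implementation. A fixed-size ring buffer keeps only the last `window` entries, which bounds memory and keeps shapes static, which is what a compiled static-cache path needs:

```python
# Toy sliding-window cache: a fixed-size ring buffer holding the most
# recent `window` entries. Names are illustrative, not the real API.
class SlidingWindowCacheSketch:
    def __init__(self, window: int):
        self.window = window
        self.buf = [None] * window  # fixed size -> static shape for compilation
        self.seen = 0               # total entries observed so far

    def update(self, key):
        # Write the new entry over the oldest slot.
        self.buf[self.seen % self.window] = key
        self.seen += 1
        if self.seen <= self.window:
            return self.buf[:self.seen]
        # Buffer full: the oldest entry sits at index seen % window,
        # so rotate the buffer to return entries oldest-to-newest.
        start = self.seen % self.window
        return self.buf[start:] + self.buf[:start]

cache = SlidingWindowCacheSketch(window=3)
for k in range(5):
    visible = cache.update(k)
print(visible)  # [2, 3, 4] -- only the last `window` entries remain
```

The real implementation operates on key/value tensors per layer and interacts with the causal mask, but the ring-buffer indexing is the core trick.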
> Good work > > * for all the `Copied from` comments that were removed, we need to use one of the models as the new base (Mixtral, for example) >...
> @ArthurZucker Don't forget our `run-slow` feature 🙏 > > @zhenglongjiepheonix Could you push an empty commit with message `[run-slow] mistral`? Thank you 🤗 The slow CI is failing on...
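The empty commit requested above would look like this (branch state and remote are assumed to be the contributor's PR branch):

```shell
# Trigger the slow CI for mistral via an empty commit with the magic message
git commit --allow-empty -m "[run-slow] mistral"
git push
```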
Currently there are some issues related to the Mistral tests. Since my dev environment is based on an A100, I ran these tests on a Colab T4 using the current main branch, @ArthurZucker @ydshieh...
> If except the 4 mentioned failing tests, all other are passing with this PR + `test_compile_static_cache` is passing on a A10 with torch 2.3, it's OK from my side...
> > > If except the 4 mentioned failing tests, all other are passing with this PR + `test_compile_static_cache` is passing on a A10 with torch 2.3, it's OK from...