Olivia Lee

Results: 10 issues of Olivia Lee

![1](https://user-images.githubusercontent.com/32992656/98470475-cf785600-2220-11eb-8e9e-6c9d3de29a97.png) This is the training loss of the last step.

I think some ops should propagate their result values rather than just the shapes of their results, so that the following ops can work properly during shape inference. For example, consider the...
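
A toy sketch of the idea (hypothetical helper names, not the framework's actual inference code): a `Shape` op feeding a `Reshape`. If only the shape of the `Shape` result (an `int64` vector) is propagated, the downstream `Reshape` cannot infer its output shape; propagating the value makes it inferable.

```python
import numpy as np

def infer_shape_op(input_static_shape):
    # The Shape op's result *is* the input's shape, so propagate the value,
    # not merely "int64 tensor of length N".
    return np.asarray(input_static_shape, dtype=np.int64)

def infer_reshape_op(data_static_shape, target_shape_value):
    # Needs the upstream *value*; the upstream result's shape alone
    # would leave this output shape unknown.
    assert int(np.prod(data_static_shape)) == int(np.prod(target_shape_value))
    return tuple(int(d) for d in target_shape_value)

target = infer_shape_op((4, 30))          # value produced by the Shape op
print(infer_reshape_op((8, 15), target))  # -> (4, 30)
```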

bug
shape inference
no-issue-activity

As suggested by the title, this PR attempts to add `torch.compile` support for Mistral. It is a not-ready-to-merge PR; it tries to replicate what has been done in Llama...
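
For context, a minimal usage sketch of what such support aims to enable (the checkpoint name and compile settings below are illustrative, not part of this PR):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Compile the forward pass; compiling without graph breaks is the goal.
model.forward = torch.compile(model.forward, fullgraph=True)

inputs = tokenizer("Hello", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
```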

Hey, I am using `model.save` as you mentioned so that I could get a .pb file, but it turns out that I only get a file without any suffix, which is...
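
For reference, a hedged sketch of the two TF2-era Keras save paths (the paths and model below are illustrative; newer Keras 3 versions also use a `.keras` extension): saving to a plain directory path produces a SavedModel folder that contains `saved_model.pb`, while an `.h5` path produces a single HDF5 file.

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(4)])

# SavedModel format: writes a directory containing saved_model.pb
# plus variables/ (the directory itself has no suffix).
model.save("exported_model")

# HDF5 format: a single file, selected via the .h5 extension.
model.save("exported_model.h5")
```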

This PR is a work in progress; it tries to add `torch.compile` support for Mixtral. It currently also contains changes from #30642 because there is some common ground shared...

`torch.compile` support for Mamba! Closes #31246

run-slow

This PR fixes a scenario where we want to use Dynamo tracing in training mode. The current attention-mask-ignore logic creates a problem where the data-dependent branch condition `torch.all(attn_mask==1)` will...
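
A minimal reproduction of the kind of failure described (illustrative only, not the actual modeling code): a Python `if` on a tensor-valued condition forces Dynamo to resolve data it cannot know at trace time, which typically shows up as a graph break, or an error when `fullgraph=True`.

```python
import torch

def attend(x, attn_mask):
    # Data-dependent branch: the outcome depends on tensor *values*,
    # which Dynamo cannot resolve while tracing.
    if torch.all(attn_mask == 1):
        attn_mask = None
    return x if attn_mask is None else x * attn_mask

compiled = torch.compile(attend, fullgraph=True)

try:
    compiled(torch.randn(2, 4), torch.ones(2, 4))
except Exception as e:  # expected: a data-dependent-branch / graph-break error
    print(type(e).__name__, e)
```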

run-slow

The parameter cache instance is needed to handle recompilation, where we need to make sure the parameters created in the first run are reused. Currently the use case does...
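
A hypothetical sketch of the pattern (the class and method names are illustrative, not the PR's actual API): cache parameters by key so that a recompilation hands back the objects created on the first run instead of allocating new ones.

```python
import torch
from torch import nn

class ParameterCache:
    """Return the same nn.Parameter for the same key across runs."""

    def __init__(self):
        self._params = {}

    def get_or_create(self, key, factory):
        if key not in self._params:
            self._params[key] = nn.Parameter(factory())
        return self._params[key]

cache = ParameterCache()

def build_layer(cache):
    # First call creates the weight; later calls (e.g. after recompilation)
    # return the identical Parameter instance.
    return cache.get_or_create("layer0.weight", lambda: torch.randn(4, 4))

w1 = build_layer(cache)
w2 = build_layer(cache)  # simulated recompilation
assert w1 is w2
```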

As per the title, this PR tries a more general approach rather than relying purely on human heuristics. Basically, it uses the following steps to search for a possible parallelization strategy for...

# What does this PR do?

- [x] add backend abstraction
- [x] refactor the original pipeline flow to accommodate potential needs of different backends
- [x] modify API so...