Cody Yu

Results: 161 comments by Cody Yu

I have the same requirement, so I wrote a custom policy for OPT and it seems to be working (i.e., the inference engine was initialized successfully and I could see some `nvcc`...

Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi

Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items: 1. The local padding pass will be moved to TIR transformation, meaning that local padding becomes an...

It's easier to reproduce the first one based on this PR. You could change the function `load_multi_executable_params_dis_array` to make it call `load_params_dis_array` just once and return the same params for...

Hmm looks like the cache shared by executables wasn't correctly used. I'll try to fix it next week.

The correctness problem has been resolved. There are two places in the OPT model that do not consider prompt>1: 1. The attention bias (fixed in #608). 2. Input position IDs...

Update: The input_sharding_spec has been added. We now apply the input_sharding_spec from executable-1 to all executables. This PR is now based on #619 and #620. I'll rebase this PR after the above PRs...

This PR is ready for review and merge. Meanwhile, I'm not sure what we should do with #623.

@masahi it looks like the path issue on Windows?

> I ran with and without tophub on a selection of models in `tvm.relay.testing`:

Thanks for the experiments. What's your target device?