Cody Yu comments

Results: 161 comments by Cody Yu

DeepSpeed Inference support for OPT

I have the same requirement, so I wrote a custom policy for OPT and it seems to be working (i.e., the inference engine was initialized successfully and I could see some `nvcc`...

[DietCode] Local Padding

Also cc @Hzfengsy @vinx13 @spectrometerHBH @masahi

Per offline discussion with @junrushao1994 and @ArmageddonKnight, here are the current action items: 1. The local padding pass will be moved to TIR transformation, meaning that local padding becomes an...

[Feature] Support multiple executables in OPT serving

It's easier to reproduce the first one based on this PR. You could change the function `load_multi_executable_params_dis_array` to make it call `load_params_dis_array` just once and return the same params for...

[Feature] Support multiple executables in OPT serving

Hmm, it looks like the cache shared by executables wasn't used correctly. I'll try to fix it next week.

[Feature] Support multiple executables in OPT serving

The correctness problem has been resolved. There are two places in the OPT model that do not consider prompt>1: 1. The attention bias (fixed in #608). 2. Input position IDs...

[Feature] Support multiple executables in OPT serving

Update: The `input_sharding_spec` is added. Now we apply `input_sharding_spec` from executable-1 to all executables. This PR is now based on #619 and #620. I'll rebase this PR after the above PRs...

[Feature] Support multiple executables in OPT serving

This PR is ready for review and merge. Meanwhile, I'm not sure what we should do with #623.

"FileNotFoundError" when use vscode in windows

@masahi this looks like a path issue on Windows?

Remove tophub from being the default configuration when compiling

> I ran with and without tophub on a selection of models in `tvm.relay.testing`:

Thanks for the experiments. What's your target device?