Robert Knight comments

Results: 727 comments of Robert Knight

Convert quantized models

Can you provide some details on the models you are testing with (e.g. is it a well-known public model or based on one?) and the tools/processes you are using...

Convert quantized models

Before doing any work in rten itself, I think the starting point will be to pick a couple of test cases of different kinds of model (simple CNN and a...

Convert quantized models

> Should this be a separate repository or should this be added to the rten repository?

I think a separate repository would be easiest. The structure doesn't matter too much,...

Convert quantized models

This is awesome, thank you so much :) - A Whisper demo is super useful to have.

Convert quantized models

I did some benchmarks on my 2020 Intel MacBook Pro (Intel i5-1038NG7 @ 2.00GHz) to get a sense of where performance is at. To summarize for others reading this, the...

Convert quantized models

> btw I have a fp16_quant script, it also reduces weights's sizes.

Thanks, I saw that. I notice that the whisper-py demo (using onnxruntime) is much slower with the fp16...

Convert quantized models

> I tried fp16 on new macbook's ARM processor and it also works slower than fp32.

This is surprising. I would have expected native fp16 to be faster. I wonder...

I started looking into this again in the last couple of days. There is now an [initial branch](https://github.com/robertknight/rten/tree/quant-mvp) which can convert and run the quantized Whisper models ... but _very...

Convert quantized models

> Cool, thank you. I guess I can close my PR https://github.com/robertknight/rten/pull/314 since it's not needed anymore?

Yes, for the moment.

> Maybe can I help with something?

One thing...

Convert quantized models

> I can try to add an example to the [repository](https://github.com/igor-yusupov/comparisons-rten), similar to whisper example. I think to start with a model from your examples.

👍 The Q-prefixed operators are...