Arash Bakhtiari
To create an LCE Lite package for iOS, we first need to create an LCE C API library similar to the [TF Lite C API](https://github.com/tensorflow/tensorflow/blob/1210b521aa2226b01ee1bd9528a8f247b7283efb/tensorflow/lite/c/BUILD#L55). Then the LCE C API will...
This boils down to implementing a fast binary matrix-vector multiplication.
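For readers unfamiliar with the idea, here is a minimal sketch of a binary matrix-vector product using the common XOR-and-popcount trick. It assumes all entries are +1 or -1 and packs them into Python integers. The helper names (`pack_bits`, `binary_dot`, `binary_matvec`) are hypothetical and are not taken from LCE; a real kernel would do the same thing on fixed-width machine words with hardware popcount instructions.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given in packed form.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is n - 2 * (number of differing bits).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount of the XOR
    return n - 2 * mismatches


def binary_matvec(matrix, vector):
    """Multiply a +1/-1 matrix (list of rows) by a +1/-1 vector."""
    n = len(vector)
    v_bits = pack_bits(vector)
    return [binary_dot(pack_bits(row), v_bits, n) for row in matrix]


if __name__ == "__main__":
    matrix = [[1, -1, 1, 1], [-1, -1, 1, -1]]
    vector = [1, -1, -1, 1]
    print(binary_matvec(matrix, vector))  # [2, -2]
```

The point of the packed form is that one XOR plus one popcount replaces a whole word's worth of multiply-accumulate operations.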
@yingapple can you please confirm whether this is related only to the OPT model, or whether you have observed a similar issue with other models as well? Could you also please provide us with...
@s-jse thanks for reporting this issue! Currently, the DeepSpeed-FastGen [fused bias and activation kernel](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation.py) requires the number of channels to be divisible by 8 because it takes advantage of vectorized instructions...
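To make the divisibility constraint concrete, the sketch below checks whether a channel count is a multiple of 8 and computes the next multiple of 8 above it. Both helpers and the `VECTOR_WIDTH` constant are hypothetical illustrations, not part of the DeepSpeed API, and the sketch only describes the constraint; it does not claim that padding is how the kernel or the maintainers handle unsupported sizes.

```python
VECTOR_WIDTH = 8  # assumption taken from the comment: channels must be divisible by 8


def is_supported_channel_count(channels, width=VECTOR_WIDTH):
    """Return True when the channel count is a positive multiple of the vector width."""
    return channels > 0 and channels % width == 0


def round_up_to_multiple(channels, width=VECTOR_WIDTH):
    """Return the smallest multiple of width that is >= channels."""
    return ((channels + width - 1) // width) * width


if __name__ == "__main__":
    for channels in (4096, 4100):
        print(channels, is_supported_channel_count(channels), round_up_to_multiple(channels))
```

A model whose hidden size fails this check would hit the limitation described above.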