Nick Smith
Awesome, thank you for the quick response! This is a good insight; we're going to investigate what it takes to invoke the GSPMD -> Shardy conversion pass.
Yes, we are using that flag:
```python
import os

os.environ["PJRT_DEVICE"] = "TT"
os.environ["XLA_STABLEHLO_COMPILE"] = "1"
```
I think we also tried the following, without luck:
```python
from jax import config

config.update("jax_use_shardy_partitioner", True)
```
@cmaryanTT yes we're aligned. The gotcha we ran into is that the LLK is implemented as `(input * scale_recip) + zero_point` instead of `(input / scale) + zero_point` (since SFPU...
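The two forms are not bit-for-bit interchangeable. Here is a minimal Python sketch showing why. The function names `quant_div` and `quant_recip` are hypothetical, and Python doubles stand in for whatever precision the hardware actually uses. Multiplying by a precomputed reciprocal rounds twice: once for `1 / scale`, and again for the product. That double rounding can land one ulp away from the true quotient:

```python
def quant_div(x: float, scale: float, zero_point: int) -> float:
    """Reference form: (x / scale) + zero_point, before rounding and saturation."""
    return (x / scale) + zero_point


def quant_recip(x: float, scale: float, zero_point: int) -> float:
    """Reciprocal form: (x * scale_recip) + zero_point, reciprocal precomputed."""
    scale_recip = 1.0 / scale               # first rounding step
    return (x * scale_recip) + zero_point   # second rounding step


# 49 is a well-known case where n * (1 / n) != 1.0 in IEEE 754 double precision.
print(quant_div(49.0, 49.0, 0))    # 1.0
print(quant_recip(49.0, 49.0, 0))  # 0.9999999999999999

# After rounding to the nearest integer both forms agree here.
print(round(quant_div(49.0, 49.0, 0)), round(quant_recip(49.0, 49.0, 0)))  # 1 1
```

The one-ulp gap only changes the quantized integer when the exact quotient sits on or right next to a rounding boundary. Comparing the two forms therefore needs a tolerance, or test inputs that avoid `.5` ties.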
@wenbinlyuTT, Buda requant recalculates the scale on the fly. I think we're in the same boat as quant: if the scale factors are scalars, we can do this computation on...
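If both scale factors are scalars (per-tensor quantization), the input and output scales collapse into a single constant ratio. That ratio can be computed once instead of on the fly.

Below is a minimal Python sketch under these assumptions:
- ONNX-style round-half-to-even rounding.
- int8 saturation.
- `requantize` and its parameter names are hypothetical, not Buda's API.

```python
def requantize(q_in, scale_in, zp_in, scale_out, zp_out, qmin=-128, qmax=127):
    """Requantize ints from (scale_in, zp_in) to (scale_out, zp_out).

    With scalar scales, scale_in / scale_out is one constant, computed once
    rather than recalculated for every element.
    """
    ratio = scale_in / scale_out  # hoisted out of the per-element loop
    out = []
    for q in q_in:
        # Dequantize + quantize collapsed into one multiply.
        # Python's round() on a float rounds half to even.
        y = round((q - zp_in) * ratio) + zp_out
        out.append(min(max(y, qmin), qmax))  # saturate to the int8 range
    return out


print(requantize([0, 10, 20, 100], 0.5, 0, 0.25, 0))  # [0, 20, 40, 127]
```

This is mathematically equal to dequantizing with `(q - zp_in) * scale_in` and then quantizing with `round(r / scale_out) + zp_out`. In floating point, the single-ratio form can differ in the last bit, which is the same kind of trade-off as multiplying by a precomputed reciprocal.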
@wenbinlyuTT, it seems like [Buda cheated](https://github.com/tenstorrent/tt-buda/blob/1e949f50075591b7c66e565951bf68c4ff1b0a69/pybuda/pybuda/op/eval/pybuda/quantize.py#L103) (it only supports an input zero point of 0 :) which is honestly a viable approach for now. But just thinking out loud, it seems possible?...
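Following the same line of thought, a nonzero input zero point looks foldable. The reason is that `(q - zp_in) * ratio` is linear, so it expands to `q * ratio - zp_in * ratio`.

Below is a minimal Python sketch of that idea, under these assumptions:
- Per-tensor scalar scales.
- ONNX-style ties-to-even rounding.
- int8 output.
- Both function names are hypothetical.

```python
def requantize_reference(q_in, scale_in, zp_in, scale_out, zp_out, qmin=-128, qmax=127):
    """Unfolded form: saturate(round((q - zp_in) * ratio) + zp_out)."""
    ratio = scale_in / scale_out
    return [min(max(round((q - zp_in) * ratio) + zp_out, qmin), qmax) for q in q_in]


def requantize_folded(q_in, scale_in, zp_in, scale_out, zp_out, qmin=-128, qmax=127):
    """Folded form: the input zero point becomes a constant bias inside round()."""
    ratio = scale_in / scale_out
    # (q - zp_in) * ratio == q * ratio - zp_in * ratio, so a nonzero zp_in only
    # adds one precomputed bias; nothing is moved across round() or saturation.
    bias = -zp_in * ratio
    out = []
    for q in q_in:
        # zp_out stays outside round(): with ties-to-even rounding,
        # round(a + k) can differ from round(a) + k when k is an odd integer.
        y = round(q * ratio + bias) + zp_out
        out.append(min(max(y, qmin), qmax))
    return out


q = [10, 20, 70, -60]
print(requantize_reference(q, 0.5, 4, 0.25, -3))  # [9, 29, 127, -128]
print(requantize_folded(q, 0.5, 4, 0.25, -3))     # [9, 29, 127, -128]
print(round(0.5 + 1), round(0.5) + 1)             # 2 1
```

The two functions match exactly here only because the ratio (2.0) is exact. With arbitrary scales, the folded form is mathematically equal to the reference but can differ by an ulp before rounding. The last line shows why `zp_out` cannot also be folded into the bias.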
Just to sanity check the ONNX definition: *Also wanted to be sure my terms above don't move past any non-linearities like saturate* ### [Quant](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear) ``` y = saturate((x / y_scale) +...
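Here is a per-tensor Python sketch of the two ONNX operators, following the formulas in the spec. It assumes int8 output and a scalar scale and zero point. The spec also allows uint8 and per-axis parameters, which this sketch ignores.

The sketch makes the ordering explicit:
- Rounding applies to `x / y_scale`.
- The zero point is added after rounding.
- Saturation is applied last.

Any term folded into the expression has to stay on the correct side of those non-linear steps.

```python
def quantize_linear(x: float, y_scale: float, y_zero_point: int,
                    qmin: int = -128, qmax: int = 127) -> int:
    """Per-tensor QuantizeLinear: saturate(round(x / y_scale) + y_zero_point)."""
    # Python's round() on a float rounds half to even, as the ONNX spec requires.
    y = round(x / y_scale) + y_zero_point
    return min(max(y, qmin), qmax)  # saturation is the final, non-linear step


def dequantize_linear(x: int, x_scale: float, x_zero_point: int) -> float:
    """Per-tensor DequantizeLinear: (x - x_zero_point) * x_scale."""
    return (x - x_zero_point) * x_scale


print(quantize_linear(0.25, 0.5, 0))    # 0    (0.5 ties to even)
print(quantize_linear(0.75, 0.5, 0))    # 2    (1.5 ties to even)
print(quantize_linear(1000.0, 1.0, 0))  # 127  (saturated)
print(dequantize_linear(2, 0.5, 0))     # 1.0
```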
@dbaileychess can I get a review of this?
> I am concerned this would be a breaking change -- if there is other code that depends on the lower case output to json, this would cause issues. Similarly,...
> Sadly this is a bit of a breaking change, since the dialect of "FlatBuffers JSON" has been using these C++ based `inf` / `nan` values since forever. > >...
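To make the compatibility concern concrete: whichever spelling is emitted, some consumer is probably keyed to it. The small Python check below uses the standard `json` module purely as a stand-in for an arbitrary downstream parser. That is an assumption; it says nothing about FlatBuffers' own parser. The check shows that the C++-style `inf`/`nan` tokens and the JavaScript-style `Infinity`/`NaN` tokens are not interchangeable:

```python
import json


def parses_with_stdlib_json(text: str) -> bool:
    """Return True if Python's json module accepts `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# C++-style spellings (the historical "FlatBuffers JSON" dialect per the comment).
print(parses_with_stdlib_json('{"x": inf}'))       # False
print(parses_with_stdlib_json('{"x": nan}'))       # False

# JavaScript-style spellings, which Python's json module accepts as an extension.
print(parses_with_stdlib_json('{"x": Infinity}'))  # True
print(parses_with_stdlib_json('{"x": NaN}'))       # True

# Python's own serializer emits the JavaScript-style spelling.
print(json.dumps(float("inf")), json.dumps(float("nan")))  # Infinity NaN
```

Any consumer like this one, tuned to one spelling, breaks silently when the output switches to the other. That is the core of the breaking-change argument.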