Benjamin Maxwell
Update: 1. The bad hoisting of reads/writes is real, so we may want to consider disabling this (for the CPU backend?) or at least having it off by default. However,...
So before this change we'd get: %15 = scf.for %arg4 = %c0 to %c352 step %c1 iter_args(%arg5 = %14) -> (tensor
Had a quick look at 2.: The `insert_slice(transfer_write)` does not apply because the transfer_write is masked. So just looking at those two ops it may not be a legal replacement....
Btw, I forgot to mention but when I took a look at the folds I spotted at least one upstream bug, which I reported here: [llvm/llvm-project#101708](https://github.com/llvm/llvm-project/issues/101708)
@hanhanW Are there any aarch64 IREE benchmarks now? ([benchmarks:android-cpu](https://github.com/iree-org/iree/labels/benchmarks%3Aandroid-cpu) seems to no longer function)
The context for this change is I discovered locally that if the tile size of `8, 16, 1` actually gets used (and not resized), the backend ends up running out...
Closing this as we have an alternate solution that does not have a negative performance impact :slightly_smiling_face:
cc @joker-eph
Not got a fix, but the issue is in `Painter::for_each_line_segment_on_cubic_bezier_curve`. It's asked for the line segments of this curve, `c0 [30303.033,-23565770000], c1 [37037.04,-23565756000], point [28282.83,-23565780000]` (and appends them all to...
These really large values do not play nicely with the path-splitting error computation: the distance between adjacent `float` values is _really_ large here (2048, I think), which just means things go...
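
For reference, the 2048 figure can be checked with a small sketch. It assumes the coordinates are stored as IEEE 754 single-precision floats; `float32_spacing` is a hypothetical helper written for this illustration, not part of the painter code.

```python
import struct

def float32_spacing(x: float) -> float:
    """Gap between x (rounded to float32) and the next float32 away from zero."""
    # Round x to float32 and reinterpret its bit pattern as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Adding 1 to the bit pattern yields the next float32 of larger magnitude.
    there = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return abs(there - here)

print(float32_spacing(-23565770000.0))  # y coordinate from the curve: 2048.0
print(float32_spacing(30303.033))       # x coordinate from the curve: 0.001953125
```

The y coordinates of the three points differ by only about 10000 to 24000, so at a spacing of 2048 there are only a handful of representable values between them.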