Jeremy Kun
I was able to get a local cibuildwheel working with https://github.com/google/heir/pull/2406
Good first step: force a layout conversion of the operands so that the slots are aligned per ciphertext, and so that linalg.reduce (for arith.addi/f) can be implemented just by summing...
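A minimal sketch of why slot alignment makes the reduction cheap, under a simplifying assumption: each "ciphertext" is modeled as a plain list of slot values, and each row of a 2D tensor sits in its own ciphertext, so slot `j` of every ciphertext holds column `j`. Under that assumption, a `linalg.reduce` over dimension 0 with `arith.addi` needs no rotations and is just a slot-wise sum of the ciphertexts. The helper names (`pack_rows`, `add_ciphertexts`, `reduce_sum_dim0`) are hypothetical and not HEIR APIs.

```python
from typing import List

# Hypothetical model: a "ciphertext" is just its vector of slot values.
Ciphertext = List[int]


def pack_rows(matrix: List[List[int]]) -> List[Ciphertext]:
    """Put each row in its own ciphertext, so slot j of every ciphertext
    holds column j, i.e. the slots are aligned across ciphertexts."""
    return [list(row) for row in matrix]


def add_ciphertexts(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    """Slot-wise addition, standing in for a homomorphic add."""
    return [x + y for x, y in zip(a, b)]


def reduce_sum_dim0(cts: List[Ciphertext]) -> Ciphertext:
    """Reduce over dimension 0 with addition: since the slots are aligned,
    this is a running sum of ciphertexts with no rotations."""
    acc = cts[0]
    for ct in cts[1:]:
        acc = add_ciphertexts(acc, ct)
    return acc


matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(reduce_sum_dim0(pack_rows(matrix)))  # [12, 15, 18]
```

If the operands arrived in some other layout, the layout conversion would have to happen first; this sketch only shows the cheap case after that conversion.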
I think this aligns with the idea in https://github.com/google/heir/issues/2254#issuecomment-3452260868, though if the data tensor is 3-dimensional this will not suffice, even with a row-major layout (this is part of why...
We have this as pass-postprocessing at the moment. Should we do more here?
Looks like we'll need to keep it for a while longer: most of our tests that came from HECO use naive loops, and they don't work when the HECO passes are...
We could potentially migrate all those tests to use linalg ops and then ensure the linalg ops have kernels, but then the frontend tests that involve naive loops would also...
That is a good point, I hadn't considered something like encoding i64 in i16 as part of this, but it would be required to enable the full unpacking/decoding in the...
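One way "encoding i64 in i16" could look is a little-endian limb decomposition: split the 64-bit two's-complement pattern into four 16-bit limbs and reassemble them when decoding. The sketch below assumes exactly that scheme; it is only an illustration of the round trip that full unpacking/decoding would need, not how HEIR does or would do it, and the function names are hypothetical.

```python
from typing import List

LIMB_BITS = 16
NUM_LIMBS = 64 // LIMB_BITS  # four i16 limbs per i64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_i16(u: int) -> int:
    """Reinterpret an unsigned 16-bit pattern as a signed i16."""
    return u - (1 << 16) if u >= (1 << 15) else u


def encode_i64_as_i16(x: int) -> List[int]:
    """Split a signed 64-bit integer into i16 limbs, least significant first."""
    assert -(1 << 63) <= x < (1 << 63), "value must fit in i64"
    u = x & ((1 << 64) - 1)  # two's-complement bit pattern of x
    return [to_i16((u >> (LIMB_BITS * i)) & LIMB_MASK) for i in range(NUM_LIMBS)]


def decode_i16_as_i64(limbs: List[int]) -> int:
    """Reassemble the i16 limbs into the original signed 64-bit integer."""
    u = 0
    for i, limb in enumerate(limbs):
        u |= (limb & LIMB_MASK) << (LIMB_BITS * i)
    return u - (1 << 64) if u >= (1 << 63) else u


print(encode_i64_as_i16(0x0001000200030004))  # [4, 3, 2, 1]
print(decode_i16_as_i64(encode_i64_as_i16(-123456789)))  # -123456789
```

Each limb is a genuine signed i16, so arithmetic on limbs would additionally need carry handling; the sketch covers only encoding and decoding.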
Experimenting with the OpenFHE benchmark suite (outside of HEIR), I can get similar timing numbers with

```
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_NTL=ON -DWITH_TCM=OFF -DMATHBACKEND=6 -DWITH_NATIVEOPT=ON -DNATIVE_SIZE=64 -DBUILD_BENCHMARKS=ON -DBUILD_UNITTESTS=OFF -DBUILD_EXAMPLES=OFF ..
./bin/benchmarks/lib-benchmark
...
```
In https://github.com/google/heir/pull/2425 I ported the OpenFHE benchmark script to compare it with the bazel build, and it, surprisingly, has good runtime:

```
bazel run -c opt //benchmark:openfhe_benchmark
...
CKKSrns_Add 28.0...
```
The crypto config seems to be part of the problem here. The [benchmark script](https://github.com/openfheorg/openfhe-development/blob/main/benchmark/src/lib-benchmark.cpp#L78-L80) uses only 8 slots via the `SetBatchSize` config (not sure if this implies there is replication...
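For a sense of scale: in CKKS a plaintext with ring dimension `N` holds at most `N/2` slots, so a batch size of 8 uses a tiny fraction of the available slots. The sketch below just does that arithmetic; the ring dimension `N = 2**16` is a hypothetical value chosen for illustration, since the benchmark's actual ring dimension is not stated here, and the function names are made up.

```python
def ckks_max_slots(ring_dim: int) -> int:
    """CKKS packs at most N/2 values into a plaintext of ring dimension N."""
    assert ring_dim > 0 and ring_dim & (ring_dim - 1) == 0, "N must be a power of two"
    return ring_dim // 2


def slot_utilization(ring_dim: int, batch_size: int) -> float:
    """Fraction of the available CKKS slots that a given batch size occupies."""
    max_slots = ckks_max_slots(ring_dim)
    assert 0 < batch_size <= max_slots
    return batch_size / max_slots


N = 1 << 16  # hypothetical ring dimension, for illustration only
print(ckks_max_slots(N))       # 32768
print(slot_utilization(N, 8))  # 0.000244140625
```

Whether the unused slots end up holding replicated copies of the 8 values is a separate question that this arithmetic does not answer.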