How to link custom ops?
Hi!
I'm trying to integrate some quantized MatMul C++ kernels into ExecuTorch and I'm having a bad time: the documentation is very vague about exactly what I need to include/link for ATen to pick up my ops.
I would greatly appreciate any help in trying to make it work.
Overview:
The source code for the dynamic library containing the ops consists of three files: lut_kernel.h, lut_kernel.cpp, and lut_kernel_pytorch.cpp. They contain roughly the following code:
// lut_kernel.h
#pragma once

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

Tensor& code2x8_lut_matmat_out(
    RuntimeContext& ctx,
    const Tensor& input,
    const Tensor& codes,
    const Tensor& codebooks,
    const Tensor& scales,
    const optional<Tensor>& bias,
    Tensor& out
);

} // namespace native
} // namespace executor
} // namespace torch
// lut_kernel.cpp
#include "lut_kernel.h"

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

namespace torch {
namespace executor {
namespace native {

Tensor& code2x8_lut_matmat_out(
    RuntimeContext& ctx,
    const Tensor& input,
    const Tensor& codes,
    const Tensor& codebooks,
    const Tensor& scales,
    const optional<Tensor>& bias,
    Tensor& out
) {
  // CALCULATIONS
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch

// Registers the out-variant kernel with the ExecuTorch runtime under the aqlm namespace.
EXECUTORCH_LIBRARY(aqlm, "code2x8_lut_matmat.out", torch::executor::native::code2x8_lut_matmat_out);
// lut_kernel_pytorch.cpp
#include "lut_kernel.h"

#include <executorch/extension/aten_util/make_aten_functor_from_et_functor.h>
#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

#include <torch/library.h>

namespace torch {
namespace executor {
namespace native {

// Adapter that supplies a RuntimeContext, so the ExecuTorch kernel can be
// called from ATen code (which has no context to pass).
Tensor& code2x8_lut_matmat_out_no_context(
    ...
    Tensor& output
) {
  void* memory_pool = malloc(10000000 * sizeof(uint8_t));
  MemoryAllocator allocator(10000000, (uint8_t*)memory_pool);
  exec_aten::RuntimeContext context{nullptr, &allocator};
  return torch::executor::native::code2x8_lut_matmat_out(
      context,
      ...,
      output
  );
}

// Functional variant: allocates the output tensor and forwards to the out variant.
at::Tensor code2x8_lut_matmat(
    ...
) {
  auto sizes = input.sizes().vec();
  sizes[sizes.size() - 1] = codes.size(1) * codebooks.size(2);
  auto out = at::empty(
      sizes,
      at::TensorOptions()
          .dtype(input.dtype())
          .device(input.device()));
  // WRAP_TO_ATEN's second argument is the index of the out tensor (here argument 5).
  WRAP_TO_ATEN(code2x8_lut_matmat_out_no_context, 5)(
      ...,
      out
  );
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch

// Declare the op schemas so PyTorch (and torch.export) know about them.
TORCH_LIBRARY(aqlm, m) {
  m.def(
      "code2x8_lut_matmat(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None) -> Tensor");
  m.def(
      "code2x8_lut_matmat.out(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None, *, Tensor(c!) out) -> Tensor(c!)");
}

// Register the ATen-side implementations used for eager execution.
TORCH_LIBRARY_IMPL(aqlm, CompositeExplicitAutograd, m) {
  m.impl(
      "code2x8_lut_matmat", torch::executor::native::code2x8_lut_matmat);
  m.impl(
      "code2x8_lut_matmat.out",
      WRAP_TO_ATEN(torch::executor::native::code2x8_lut_matmat_out_no_context, 5));
}
This closely follows the ExecuTorch custom SDPA example code.
I build these as two standalone dynamic libraries: one from lut_kernel.cpp, depending only on executorch, and one from lut_kernel_pytorch.cpp, with an additional torch dependency. I load the latter into PyTorch with torch.ops.load_library(f"../libaqlm_bindings.dylib").
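For reference, this is roughly how I load and inspect the op from Python (a minimal sketch; the dylib path is from my local build):

import torch

# Loading the dylib runs its static TORCH_LIBRARY registration,
# which makes the aqlm ops visible to PyTorch.
torch.ops.load_library("../libaqlm_bindings.dylib")

# After loading, the op is available under the aqlm namespace:
print(torch.ops.aqlm.code2x8_lut_matmat)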
The problem:
I wrote a small nn.Module that basically just calls the op; a rough sketch of it is included after the dump below. In PyTorch it works well. The aten_dialect for it looks like this:
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_codes: "i8[3072, 128, 2]", p_codebooks: "f32[2, 256, 1, 8]", p_scales: "f32[3072, 1, 1, 1]", p_bias: "f32[3072]", input: "f32[s0, s1, 1024]"):
            input_1 = input

            # File: /Users/blacksamorez/reps/AQLM/inference_lib/src/aqlm/inference.py:74 in forward, code: return torch.ops.aqlm.code2x8_lut_matmat(
            code2x8_lut_matmat: "f32[s0, s1, 1024]" = torch.ops.aqlm.code2x8_lut_matmat.default(input_1, p_codes, p_codebooks, p_scales, p_bias); input_1 = p_codes = p_codebooks = p_scales = p_bias = None
            return (code2x8_lut_matmat,)
Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codes'), target='codes', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codebooks'), target='codebooks', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_scales'), target='scales', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_bias'), target='bias', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='input'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='code2x8_lut_matmat'), target=None)])
Range constraints: {s0: VR[1, 9223372036854775806], s1: VR[1, 9223372036854775806]}
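For reference, the module is roughly the following (a minimal sketch, not my exact code; the class name is made up and the parameter shapes are taken from the dump above):

import torch

class AqlmMatmat(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, codes, codebooks, scales, bias):
        super().__init__()
        # Registered as parameters, so they show up as p_* inputs in the ExportedProgram.
        self.codes = torch.nn.Parameter(codes, requires_grad=False)
        self.codebooks = torch.nn.Parameter(codebooks, requires_grad=False)
        self.scales = torch.nn.Parameter(scales, requires_grad=False)
        self.bias = torch.nn.Parameter(bias, requires_grad=False)

    def forward(self, input):
        return torch.ops.aqlm.code2x8_lut_matmat(
            input, self.codes, self.codebooks, self.scales, self.bias
        )

module = AqlmMatmat(
    torch.zeros(3072, 128, 2, dtype=torch.int8),
    torch.randn(2, 256, 1, 8),
    torch.randn(3072, 1, 1, 1),
    torch.randn(3072),
)
# Dynamic batch and sequence dims, matching the range constraints above.
aten_dialect = torch.export.export(
    module,
    (torch.randn(2, 2, 1024),),
    dynamic_shapes={"input": {0: torch.export.Dim("s0"), 1: torch.export.Dim("s1")}},
)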
But when calling to_edge I get an error saying that Operator torch._ops.aqlm.code2x8_lut_matmat.default is not Aten Canonical.
I don't conceptually understand how the EXECUTORCH_LIBRARY macro from lut_kernel.cpp is supposed to make it Aten Canonical. Should I somehow recompile ExecuTorch to include my kernel?
Thank you!
I added compile_config=EdgeCompileConfig(_check_ir_validity=False) to to_edge and it appears to be exporting now.
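Concretely, the lowering step now looks roughly like this (a sketch; the .pte filename is arbitrary):

from executorch.exir import EdgeCompileConfig, to_edge

# Skip the ATen-canonical IR check so the custom op passes through to_edge.
edge_program = to_edge(
    aten_dialect,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
)

exec_program = edge_program.to_executorch()
with open("aqlm_model.pte", "wb") as f:
    f.write(exec_program.buffer)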
After linking libaqlm.dylib into executor_runner (and replacing executorch with executorch_no_prim_ops in its linked libraries), I'm able to compile it.
However, when running it, I encounter an error like this:
E 00:00:00.001621 executorch:method.cpp:536] Missing operator: [0] aqlm::code2x8_lut_matmat.out
E 00:00:00.001623 executorch:method.cpp:724] There are 1 instructions don't have corresponding operator registered. See logs for details
I'm on executorch v0.3.0.
@larryliu0820 any suggestions?
@digantdesai Hi! Thanks for the reply. I think we shifted the discussion to #4719. In light of that, I'm closing this issue.