AITemplate
Is there a C++ API provided?
Thanks for your project! We want to deploy our model in a C++ project. Is there a C++ API for that? We could not find one. If you could provide a C++ API, we would appreciate it.
Yes. The Python runtime is just a simple wrapper around the C++ runtime, so you can use the generated model through the C API directly. This note is helpful: https://github.com/facebookincubator/AITemplate/tree/main/static
The Model class shows how to use the C API from Python, and it can serve as a reference for using it from C++:
https://github.com/facebookincubator/AITemplate/blob/main/python/aitemplate/compiler/model.py
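For readers unfamiliar with the wrapper pattern mentioned above, here is a minimal sketch of how a Python class can call into a C API with ctypes: load a shared library, look up an exported function, and declare its signature. It is not AITemplate code. It binds `strlen` from the C standard library so that it runs anywhere, and the helpers `load_library` and `bind` are hypothetical names introduced for illustration. For a generated model you would load the compiled .so instead and bind the functions that model.py calls; in C++ the equivalent steps would be `dlopen` and `dlsym`.

```python
import ctypes
import ctypes.util


def load_library(path=None):
    """Load a shared library with ctypes.

    Assumption for this sketch: when no path is given, fall back to the
    C standard library so the example is runnable without a compiled model.
    """
    if path is None:
        path = ctypes.util.find_library("c")
    return ctypes.CDLL(path)


def bind(lib, name, restype, argtypes):
    """Look up an exported C function and declare its signature."""
    fn = getattr(lib, name)
    fn.restype = restype
    fn.argtypes = argtypes
    return fn


# Stand-in for a function exported by the model library.
libc = load_library()
strlen = bind(libc, "strlen", ctypes.c_size_t, [ctypes.c_char_p])

print(strlen(b"AITemplate"))
```

Declaring `restype` and `argtypes` explicitly matters here: without them ctypes assumes `int` arguments and return values, which silently truncates pointers and `size_t` values on 64-bit platforms.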
Thank you! I tried it and found that using the C++ runtime differs a lot from using the Python runtime. If you could provide a C++ runtime example, we would appreciate it.
Besides, when I tried to test a multi-instance model, I changed the code in "AITemplate/examples/01_resnet-50/benchmark_ait.py" but got an error. The changes I made:
Line 76: mod = Model(os.path.join("./tmp", model_name, "test.so"), 2)  # use two runtimes
Lines 89 and 97: added num_threads=2, use_unique_stream_per_thread=True
The error I get:
[./tmp/resnet50_64/conv2d_bias_relu_373.cu] Got cutlass error: Error Internal at: 154
[16:56:40] ./tmp/resnet50_64/model_interface.cu:158: Error: GraphInstantiate(&graph_exec, graph) API call failed: operation not permitted when stream is capturing at ./tmp/resnet50_64/model-generated.h, line3009
Traceback (most recent call last):
File "benchmark_ait_test.py", line 136, in
How should we test a multi-instance model?
cc @mikeiovine
This looks like a known issue with graph mode that should be fixed in the next release. For now I recommend setting graph_mode=False to test this case.
Hey @wanghuihhh, I thought this was that separate known issue, but it looks like something else entirely. The problem is that we were starting our stream captures with cudaStreamCaptureModeGlobal instead of cudaStreamCaptureModeThreadLocal, which causes failures in the multi-threaded tests. I'll have a patch out to fix this particular issue soon.
As for the C++ runtime docs: I'll add a proper example. Thanks for the suggestion!
cc @antinucleon