flankedge

Results: 6 issues for flankedge

```
Traceback (most recent call last):
  File "/usr/local/bin/paddle2onnx", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/paddle2onnx/command.py", line 142, in main
    enable_onnx_checker=args.enable_onnx_checker)
  File "/usr/local/lib/python3.6/dist-packages/paddle2onnx/command.py", line 114, in program2onnx
    enable_onnx_checker=enable_onnx_checker)
  File "/usr/local/lib/python3.6/dist-packages/paddle2onnx/convert.py", line 77,...
```
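
The snippet cuts off before the actual error. For context, a CLI invocation of this shape exercises the `program2onnx` path shown in the traceback; the model paths and filenames below are placeholders, not taken from the report:

```bash
# Hypothetical paddle2onnx call; directory and file names are placeholders.
paddle2onnx --model_dir ./inference_model \
            --model_filename model.pdmodel \
            --params_filename model.pdiparams \
            --save_file model.onnx \
            --enable_onnx_checker True
```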

Enhancement

I am doing benchmark tests for UNet with AIT on A100/A10/T4, etc. The tests on T4 have finished and it works well. However, on A100 the build process stopped within profile...
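
For readers unfamiliar with AITemplate's build flow: per-op kernel profiling runs inside `compile_model`, which is the step the report says stalls on A100. A minimal sketch of that step, using a stand-in GEMM graph rather than the actual UNet:

```python
# Sketch of an AITemplate compile, where profiling happens; the tiny GEMM
# graph here is a stand-in for the UNet in the report.
from aitemplate.compiler import compile_model, ops
from aitemplate.frontend import Tensor
from aitemplate.testing import detect_target

X = Tensor(shape=[1, 64], dtype="float16", name="X", is_input=True)
W = Tensor(shape=[64, 64], dtype="float16", name="W", is_input=True)
Y = ops.gemm_rrr()(X, W)  # profiling picks a kernel config per target GPU
Y._attrs["name"] = "Y"
Y._attrs["is_output"] = True

target = detect_target()  # e.g. SM80 on A100, SM75 on T4
module = compile_model(Y, target, "./tmp", "gemm_demo")
```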

I am using trtllm 0.8.0 (with MoE support added following llama's implementation). We serve models with trtllm_backend (Docker image triton-trtllm-24.02). [qwen2-moe-57B-A14B](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct) runs well on a single `nvidia-A800`. But if we run...
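
The snippet cuts off before describing the multi-GPU failure. For context, a tensor-parallel launch with the tensorrtllm_backend repo's helper script looks roughly like this; the world size and model-repo path are assumptions, not the reporter's setup:

```bash
# Hypothetical launch from a checkout of the tensorrtllm_backend repo,
# run inside the triton-trtllm-24.02 container; paths are placeholders.
python3 scripts/launch_triton_server.py \
    --world_size=2 \
    --model_repo=/path/to/triton_model_repo
```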

functionality issue

Firstly, thanks for open-sourcing your great model. The Kontext pipeline's image-editing ability is amazing. But the pipeline runs too slowly and is very memory-intensive. Of course, you provided...
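
The snippet ends before the workarounds the issue goes on to mention. For anyone hitting the same memory pressure, diffusers exposes generic offload knobs that apply to this pipeline as well; the model id and dtype below are assumptions, and this is a sketch of the standard API rather than the authors' recommended settings:

```python
# Minimal sketch of common diffusers memory knobs on the Kontext pipeline;
# model id and dtype are assumptions, not taken from the report.
import torch
from diffusers import FluxKontextPipeline

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()        # trades some latency for much lower VRAM
# pipe.enable_sequential_cpu_offload() # even lower VRAM, much slower
```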

**Description**
As the title describes.

**Triton Information**
I'm using NGC `tritonserver-25.10`.

**To Reproduce**
Model: resnet50, tensorrt_backend:

```protobuf
name: "resnet_50"
backend: "tensorrt"
max_batch_size: 8
model_warmup: {
  name: "sample"
  batch_size: 1
  inputs:...
```
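
The config is truncated at `inputs:`. For reference, a complete `model_warmup` stanza in `config.pbtxt` spells out each input's dtype, dims, and a data source; the tensor name and shape below are assumed for a typical resnet50 engine, not copied from the report:

```protobuf
model_warmup [
  {
    name: "sample"
    batch_size: 1
    inputs {
      key: "input"            # hypothetical tensor name
      value {
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
        random_data: true     # zero_data or input_data_file also work
      }
    }
  }
]
```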

**Description**
When serving a TensorRT engine with CUDA graph optimization enabled, we encountered a weird phenomenon. We send requests sequentially, following the `AAAAABBBBBAAAABBBB` pattern. During each A (or B) request period, the...
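
For context, CUDA graph capture in Triton's TensorRT backend is enabled through the model's optimization policy in `config.pbtxt`; the batch size below is an assumed example, not the reporter's configuration:

```protobuf
optimization {
  cuda {
    graphs: true
    graph_spec [
      {
        batch_size: 8   # capture a graph for this shape
      }
    ]
  }
}
```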

bug