Bangsheng Tang
## Issue

When running FasterTransformer with multiple threads, where each thread corresponds to a different GPU, and use_trt_kernels=true, the kernel launch throws CUDA_ERROR_INVALID_HANDLE. There will always be 1 out...
Summary:

* Glow lowering
  * loadSlice: allow GlowIValueNode and 4-argument version
  * loadGetItem: newly added to support aten::__getitem__
  * loadSub: allow two GlowIValueNodes as inputs
  * loadLen: to support aten::len...
## 📌 Description

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items...