Runyu Lu
#### Algorithm

The heap-based `partial_sort` works as follows. Given an array of size n, we want the top k largest (smallest) elements.

* Build a min- (max-) heap of size k from the first k elements of the array, using a custom `heapify()` function.
* Walk the remaining n-k elements and compare each one with the heap top. If it is larger (smaller) than the heap top, swap the two and restore the heap. After the scan, the heap holds the top k largest (smallest) elements, though not yet in strict order.
* Since the heap top is always the smallest (largest) of the k kept elements, run an ordinary heap sort on these k elements to obtain the top k in strictly descending (ascending) order (sketched in the code below).

*All of the above is done in place.*

#### Complexity

* Time: the earlier bubble-sort based version was `O(nk)`; the heap-based version is `O((n-k)logk)`.
* Space: unchanged; both versions are `inplace` and use no extra memory.
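Below is a minimal, self-contained C++ sketch of the three steps above for the "top k largest, descending" case. The function names (`sift_down`, `heap_partial_sort_topk`) and the hand-rolled heapify are illustrative assumptions, not the actual implementation.

```cpp
#include <utility>

// Sift the element at `root` down a min-heap stored in data[0..k)
// (a hand-rolled stand-in for the custom heapify() mentioned above).
static void sift_down(float* data, int k, int root)
{
    while (true)
    {
        int smallest = root;
        int l = 2 * root + 1;
        int r = 2 * root + 2;
        if (l < k && data[l] < data[smallest]) smallest = l;
        if (r < k && data[r] < data[smallest]) smallest = r;
        if (smallest == root) break;
        std::swap(data[root], data[smallest]);
        root = smallest;
    }
}

// In-place top-k largest; afterwards data[0..k) holds them in descending order.
static void heap_partial_sort_topk(float* data, int n, int k)
{
    // Step 1: build a size-k min-heap out of the first k elements, O(k).
    for (int i = k / 2 - 1; i >= 0; i--)
        sift_down(data, k, i);

    // Step 2: scan the remaining n-k elements; anything larger than the heap
    // top (the current k-th largest) replaces it and the heap is repaired.
    for (int i = k; i < n; i++)
    {
        if (data[i] > data[0])
        {
            std::swap(data[i], data[0]);
            sift_down(data, k, 0);
        }
    }

    // Step 3: ordinary heap sort on the k kept elements; repeatedly move the
    // heap top (the minimum) to the back, leaving data[0..k) in descending order.
    for (int end = k - 1; end > 0; end--)
    {
        std::swap(data[0], data[end]);
        sift_down(data, end, 0);
    }
}
```

Flipping the comparisons in `sift_down` and in step 2 gives the "top k smallest, ascending" variant under the same in-place structure.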
* Finish the merge of multi-elempack
* Add some test samples for coverage
* Please check the instancenorm:
* #4062
* Add the avx512/avx/sse intrinsics for instancenorm
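The intrinsic paths essentially vectorize the per-channel mean/variance computation and the normalization pass. Here is a minimal SSE sketch of that idea for a single channel; this is not the ncnn implementation, and the function name, signature, and two-pass structure are illustrative assumptions.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: instance-normalize one channel of `size` floats in place,
// processing 4 lanes at a time with SSE.
static void instance_norm_channel_sse(float* ptr, size_t size,
                                      float gamma, float beta, float eps)
{
    // Pass 1: accumulate sum and sum of squares.
    __m128 vsum = _mm_setzero_ps();
    __m128 vsqsum = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= size; i += 4)
    {
        __m128 v = _mm_loadu_ps(ptr + i);
        vsum = _mm_add_ps(vsum, v);
        vsqsum = _mm_add_ps(vsqsum, _mm_mul_ps(v, v));
    }
    float sum[4], sqsum[4];
    _mm_storeu_ps(sum, vsum);
    _mm_storeu_ps(sqsum, vsqsum);
    float s = sum[0] + sum[1] + sum[2] + sum[3];
    float sq = sqsum[0] + sqsum[1] + sqsum[2] + sqsum[3];
    for (; i < size; i++) // scalar tail
    {
        s += ptr[i];
        sq += ptr[i] * ptr[i];
    }

    // Fold mean/var/gamma/beta into a single affine transform y = a*x + b.
    float mean = s / size;
    float var = sq / size - mean * mean;
    float a = gamma / std::sqrt(var + eps);
    float b = beta - mean * a;

    // Pass 2: apply the affine transform with SSE, scalar tail for leftovers.
    __m128 va = _mm_set1_ps(a);
    __m128 vb = _mm_set1_ps(b);
    i = 0;
    for (; i + 4 <= size; i += 4)
    {
        __m128 v = _mm_loadu_ps(ptr + i);
        _mm_storeu_ps(ptr + i, _mm_add_ps(_mm_mul_ps(v, va), vb));
    }
    for (; i < size; i++)
        ptr[i] = a * ptr[i] + b;
}
```

The AVX and AVX-512 variants follow the same two-pass pattern with 8 or 16 lanes per iteration.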
## 📌 Checklist before creating the PR

- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
Hi, onnxsim is a wonderful ONNX graph optimizer. I think it may be useful for deploying TopFormer! It can remove some unnecessary ops from the ONNX graph and sometimes...
## 📌 Checklist before creating the PR

- [ ] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
1. Optimize the data path: from `List->CPU Tensor->List->rpc_param->GPU Tensor` to `List->rpc_param->GPU Tensor`
2. Wrap the async forward only once
3. Only rank0 Worker runs the sampler and returns the return...