mengllm
Hi, mllm-qnn works on my device, an OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB RAM). However, the prefill speed for Qwen1.5-1.8B is approximately 4-6 tokens per second, which significantly diverges from the...
@and-ivanov @benrothen Hi, regardless of whether I use `generated.bin` or `extracted.bin`, and whether I use `checksum_kernel` or `checksum_kernel_from_data`, the device’s checksum result changes with each execution of `cuLaunchKernel`. Only the...
@and-ivanov @benrothen Hi, the verification succeeds in the ‘test_generated with SMC’ test case, but it always fails in the ‘run_generated with SMC’ test case. My test hardware environment includes an...