leiwen83
@hwu36 Hi, could you help review this patch and let me know whether the single-stage patch can be merged? Thanks, Lei
> @leiwen83 can you provide performance results to justify this PR? It seems to me there is no performance gain from adding this single-stage conv. So maybe we could...
V1.0.4 also has the problem, but it only hits the error after several op tests.

```
star:/data/cnn # ./mace_cc_benchmark
Benchmark                                     Time(ns)  Iterations  Input(MB/s)  GMACPS
--------------------------------------------------------------------------------------------------------------
MACE_BM_ADDN_2_1_128_128_32_float_CPU          1321442         720      3174.04    0.00
MACE_BM_ADDN_2_1_256_256_32_float_CPU          5026605...
```
> @leiwen83 Could you use the master branch? We now work on master, thank you.

First, the master branch cannot even run. The first comment was tested on the master branch...
Hi @daadaada, I see that Triton currently uses inline PTX assembly for codegen. Since your previous work on gas and turingas proves that the SASS level is more effective,...
Could we switch to another repo like https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B? meta-llama is a gated repo, so I think many people may not have permission to access it, including me...
> Why was it not covered by existing tests?

It is because the current ngram path still uses the draft model (set as the target model) to get some info like the vocab size. In...
cc @cadedaniel
> will take a look Monday. btw, how is this different from the system efficiency metric? (boost ratio == (num_spec_tokens+1) * system efficiency?)

The new boost_ratio gives a more accurate expression...
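As a minimal sketch of the relationship asked about in the quote (not the actual vLLM metric code), if `system_efficiency` is the fraction of the per-step token slots that end up accepted, the proposed identity is just a rescaling by the number of slots, i.e. the `num_spec_tokens` speculative slots plus one bonus token:

```python
# Hypothetical illustration of the quoted identity:
#   boost_ratio == (num_spec_tokens + 1) * system_efficiency
# Names and the exact definition of system_efficiency are assumptions here,
# not taken from the vLLM source.
def boost_ratio(num_spec_tokens: int, system_efficiency: float) -> float:
    # Each scoring step has num_spec_tokens speculative slots plus 1 bonus slot;
    # multiplying by the acceptance fraction yields tokens emitted per step.
    return (num_spec_tokens + 1) * system_efficiency

# With 4 speculative tokens and 60% efficiency, 5 slots * 0.6 -> 3.0 tokens/step.
print(boost_ratio(4, 0.6))
```

Under this reading, the two metrics would carry the same information up to the constant factor `(num_spec_tokens + 1)`, which is presumably what the question is probing.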
@cadedaniel @robertgshaw2-neuralmagic Any comments on the latest PR changes? :)