JIMMY ZHAO
Would you mind specifying the details?
> Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.

What can I do now? I...
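Before going further, it may help to confirm which CUDA runtime the environment actually provides; a minimal sketch, assuming PyTorch is installed in the same environment:

```python
# Minimal sketch: check whether the local CUDA runtime matches what the
# CUDA 11.x RAPIDS pip wheels expect. Assumes PyTorch is available.
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime PyTorch was built with:", torch.version.cuda)  # e.g. "11.8" or "12.1"

# A version outside the 11.x series would explain the install/runtime failure.
if torch.version.cuda is not None and not torch.version.cuda.startswith("11."):
    print("CUDA", torch.version.cuda, "detected; the CUDA 11.x wheels may not work here.")
```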
> ```python
> # Clear any leftover memory from previous models
> torch.cuda.set_device('cuda:3')
> torch.cuda.empty_cache()
> ```
>
> Try removing this code and run in a fresh new...
> export CUDA_VISIBLE_DEVICES=5

If I set `export CUDA_VISIBLE_DEVICES=5` and use the following code:

```
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the...
```
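For reference, a minimal self-contained version of that script follows the usual vLLM offline-inference pattern; the model name, sampling settings, and second prompt below are illustrative assumptions, not taken from the original report:

```python
# Minimal sketch of the standard vLLM offline-inference flow.
# Model name and sampling parameters are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",  # illustrative second prompt
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# With CUDA_VISIBLE_DEVICES=5 exported before launch, vLLM only sees that one
# GPU, which it addresses internally as device 0.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```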
> Your system driver or some other program is taking too much memory:
>
> > Device 0 [NVIDIA A100 80GB PCIe] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s...
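One quick way to confirm this from Python is to query free vs. total memory on each visible GPU before loading the model; a minimal sketch, assuming PyTorch is available:

```python
# Minimal sketch: report free vs. total memory per visible GPU, so memory held
# by the driver or other processes is easy to spot. Assumes PyTorch is installed.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # values in bytes
    print(f"GPU {i}: {free / 1024**3:.1f} GiB free of {total / 1024**3:.1f} GiB")
```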
It mysteriously works now.
> Hello, I'd like to ask: why does the gradient descent algorithm in the book output the average of the iterates over $T$ rounds? In practice, isn't $\omega_T$ taken as the final result?

Thanks for the question @pppooo332. The reason is that, for gradient descent on a convex function, the step size $\eta$ we set is heuristic, so the $\omega'$ produced at each iteration is not guaranteed to be a local optimum. By the conclusion of Theorem 7.1, the average of the $\omega$ iterates over $T$ rounds enjoys a sublinear convergence rate, whereas we cannot prove that the last iterate $\omega_T$ attains a comparable rate. In short, returning the average of the iterates may add some computational cost, but it guarantees a stable convergence rate. The same idea also appears in the gradient descent algorithms of Sections 7.3.1 and 7.3.2.

By contrast, in the gradient descent algorithm for strongly convex functions in Section 7.2.2, we output only the last iterate $\omega_T$. This is because, under strong convexity, each gradient update has a closed-form solution: $\omega_{t+1}=\omega_t-\frac{1}{\gamma}\nabla f(\omega_t)$. Each iteration reaches the optimum of its local neighborhood without any heuristic, which is also why this algorithm enjoys the faster (linear) convergence rate. Hence there is no need to return the average of the historical $\omega$ values.
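As a rough illustration of the two output rules (not the book's pseudocode; the toy quadratic objective, step sizes, and iteration count below are assumptions made just for this sketch):

```python
# Rough sketch contrasting the two output rules discussed above on a toy
# strongly convex quadratic f(w) = 0.5 * gamma * ||w||^2. The objective and
# hyperparameters are illustrative assumptions, not the book's setup.
import numpy as np

gamma = 2.0                      # curvature parameter of the toy objective
grad = lambda w: gamma * w       # gradient of f(w) = 0.5 * gamma * ||w||^2

def gd_average(w0, T, eta):
    """Convex-case rule: heuristic step size eta, return the average of all iterates."""
    w, iterates = w0.copy(), []
    for _ in range(T):
        w = w - eta * grad(w)
        iterates.append(w.copy())
    return np.mean(iterates, axis=0)

def gd_last(w0, T):
    """Strongly convex rule: closed-form step 1/gamma, return only the last iterate w_T."""
    w = w0.copy()
    for _ in range(T):
        w = w - (1.0 / gamma) * grad(w)
    return w

w0 = np.array([5.0, -3.0])
print("average of iterates:", gd_average(w0, T=100, eta=0.1))
print("last iterate       :", gd_last(w0, T=100))
```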
The two figures in this chapter would best be replaced with higher-resolution versions that are free of copyright disputes.
The three figures in this chapter would best be replaced with higher-resolution versions that are free of copyright disputes.
> Hi [axsaucedo](https://github.com/axsaucedo) and [zhimin-z](https://github.com/zhimin-z). I just want to give some updates regarding the RecSys section. Here is the list of frameworks that could potentially be included in such a...