thinksee
use local tokenizer model
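A minimal sketch of loading a tokenizer from a local directory instead of pulling it from the hub, assuming the tokenizer follows the Hugging Face `AutoTokenizer` interface and the directory already contains the tokenizer files; the path below is hypothetical:

```python
# Minimal sketch (assumptions: Hugging Face AutoTokenizer interface,
# tokenizer files already downloaded; the path is illustrative).
from transformers import AutoTokenizer

local_dir = "/path/to/local/tokenizer"  # holds tokenizer_config.json, vocab/model files, etc.
tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True)

print(tokenizer("hello from a local tokenizer")["input_ids"])
```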
Finetune error
Traceback (most recent call last):
  File "finetune_visualglm.py", line 170, in <module>
    args = get_args(args_list)
  File "/root/miniconda3/lib/python3.8/site-packages/sat/arguments.py", line 417, in get_args
    initialize_distributed(args)
  File "/root/miniconda3/lib/python3.8/site-packages/sat/arguments.py", line 500, in initialize_distributed
    deepspeed.init_distributed(
TypeError: init_distributed() got...
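The truncated `TypeError: init_distributed() got...` usually means the installed deepspeed does not accept a keyword argument that SwissArmyTransformer (sat) passes, i.e. a version mismatch. A small diagnostic sketch, assuming deepspeed is importable in the same environment:

```python
# Diagnostic sketch: print the installed deepspeed version and the keyword
# arguments its init_distributed() actually accepts, to compare against
# what sat/arguments.py passes at line 500.
import inspect
import deepspeed

print(deepspeed.__version__)
print(inspect.signature(deepspeed.init_distributed))
```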
Reading images is too slow
Only about one image per second.
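A sketch of one common fix, under the assumption that images are currently opened and decoded one at a time in the main training loop: move the decode into torch DataLoader worker processes so disk I/O and JPEG decoding overlap with the rest of the step. Paths, sizes, and worker counts below are illustrative.

```python
# Sketch: parallel image loading with DataLoader workers (assumptions:
# a flat folder of JPEGs, PIL decoding; names and paths are illustrative).
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class ImageFolderDataset(Dataset):
    def __init__(self, root):
        self.paths = sorted(Path(root).glob("*.jpg"))
        self.tf = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Decoding happens inside the worker process, not the main loop.
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.tf(img)

loader = DataLoader(
    ImageFolderDataset("/path/to/images"),  # hypothetical directory
    batch_size=32,
    num_workers=8,      # several workers overlap disk I/O and JPEG decode
    pin_memory=True,
)

for batch in loader:
    pass  # feed `batch` to the model here
```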
### Is your feature request related to a problem? Please describe.
1. QLoRA is a fine-tuning method that makes it possible to fine-tune a 65B-parameter model on a single 48 GB GPU while preserving full 16-bit fine-tuning task performance. It works by back-propagating gradients through a frozen, int4-quantized pretrained language model into low-rank adapters (LoRA).
2. The model currently provided is 6B, so to some extent this optimization can be skipped, but the larger models planned later will need it. A related open question is how models larger than 6B actually perform; is there any comparison of results?

### Solutions
1. https://arxiv.org/pdf/2305.14314.pdf
2. https://github.com/feihuamantian/qlora
3. https://github.com/huggingface/transformers/pull/23479

### Additional context
_No response_
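For reference, a minimal sketch of the QLoRA recipe from the paper linked above, written against the Hugging Face transformers/peft/bitsandbytes stack rather than this repo's sat stack; the checkpoint name and target module names are illustrative assumptions, not this project's actual configuration.

```python
# Sketch of the QLoRA recipe: 4-bit frozen base model + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # frozen base weights stored in 4 bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # double quantization of the quant constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute and backprop in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-6b-model",               # hypothetical checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],     # adjust to the model's attention layer names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```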
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/root/miniconda3/lib/python3.8/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
Traceback...
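The first line of the log already hints at the fix: RANK, WORLD_SIZE and LOCAL_RANK have to be set when using torch.distributed or loading model-parallel checkpoints. A minimal single-GPU sketch, to run before the initialization code; the values are illustrative:

```python
# Sketch: provide the distributed environment variables for a single-process,
# single-GPU run before any sat / torch.distributed initialization executes.
# MASTER_ADDR / MASTER_PORT are the usual extra pair torch.distributed's
# env:// rendezvous also expects; the port value is arbitrary.
import os

os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("LOCAL_RANK", "0")
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
```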
In https://github.com/shenweichen/DeepCTR/blob/e8f4d818f9b46608bc95bb60ef0bb0633606b2f2/deepctr/models/sequence/din.py#L83, why is the `mask` parameter not passed in?