### 🐛 Describe the bug

```python
with strategy.model_init_context():
    if args.model == 'gpt2':
        actor = GPTActor().cuda()
        critic = GPTCritic().cuda()
```

### Environment

_No response_
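For context, here is a minimal runnable sketch of the snippet above. It assumes the Coati-style RLHF API from ColossalAI, where `GPTActor`/`GPTCritic` live in `coati.models.gpt` and strategies expose `model_init_context()`; the strategy choice and argument parsing are assumptions, not taken from the report:

```python
# A minimal sketch, not the repo's exact script. Assumes the Coati-style API:
# GPTActor/GPTCritic in coati.models.gpt and a strategy exposing
# model_init_context(). Adjust imports to match your installed version.
import argparse

from coati.models.gpt import GPTActor, GPTCritic
from coati.trainer.strategies import NaiveStrategy

parser = argparse.ArgumentParser()
parser.add_argument('--model', default='gpt2')
args = parser.parse_args()

strategy = NaiveStrategy()

# Building the models inside the strategy's init context lets ZeRO/Gemini
# placement hooks intercept parameter allocation at creation time.
with strategy.model_init_context():
    if args.model == 'gpt2':
        actor = GPTActor().cuda()
        critic = GPTCritic().cuda()
```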
```
Traceback (most recent call last):
  File "chinese_abstract.py", line 27, in <module>
    model_revision='v1.0.1',
  File "/data/huap/software/miniconda3/envs/ms/lib/python3.7/site-packages/modelscope/pipelines/builder.py", line 141, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/data/huap/software/miniconda3/envs/ms/lib/python3.7/site-packages/modelscope/pipelines/builder.py", line 55, in build_pipeline
    cfg, PIPELINES, group_key=task_name, default_args=default_args)...
```
In version 1.3.0, the ROM semantic relevance (ROM语义相关性) model's predictions are random: the same input yields different results across runs.
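A hedged repro sketch for checking the nondeterminism: the task name, model id, and input schema below are all assumptions, not taken from the report; only running the same call twice and comparing scores is the point.

```python
# Hedged repro sketch: task name, model id, and input schema are assumptions.
# A deterministic model should return identical scores for identical calls;
# pin model_revision when comparing behavior across library versions.
from modelscope.pipelines import pipeline

ranker = pipeline(
    task='text-ranking',                                 # assumed task name
    model='damo/nlp_rom_passage-ranking_chinese-base',   # assumed model id
)
query = {'source_sentence': ['如何使用ROM模型'],
         'sentences_to_compare': ['ROM模型使用说明', '今天天气不错']}
print(ranker(query))
print(ranker(query))  # differing scores here would confirm nondeterminism
```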
Is the Chinese LLaMA (中文LLaMA) trained via LoRA?
Megatron-style training log excerpt reporting a NaN gradient norm:

```
... | grad norm: nan | actual seqlen: 2048 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1.886 | TFLOPs: 78.46 | iteration 5426/...
```
### 🐛 Describe the bug

```
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
  return self._call_impl(*args, **kwargs)
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
  return forward_call(*args, **kwargs)
File "/data/llmodel/huap/ColossalAI/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py", line 133, in attention_forward
  cos,...
```
### Describe the feature

Can somebody give an example of the pretraining data format?
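A hedged illustration of one common pretraining data layout (plain-text jsonl, one document per line); the field name `text` is an assumption rather than this repo's confirmed schema, so check the dataset loader for the exact field names:

```python
# Hedged example of a common pretraining data format: jsonl with one
# document per line. The "text" field name is an assumption, not the
# repo's confirmed schema.
import json

samples = [
    {"text": "First pretraining document ..."},
    {"text": "Second pretraining document ..."},
]
with open("pretrain_sample.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```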
Some entries in instruct_chat_50k.json contain "继续" ("continue"); how should these be understood and used?
Is the pre-tokenized dataset openchat_v3.2_super.train.parquet tokenized with the Llama2 tokenizer or the Mistral tokenizer?
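One hedged way to answer this empirically: decode a few rows with each candidate tokenizer and see which produces coherent text. The column name and checkpoint ids below are assumptions (and the Llama-2 checkpoint is gated on the Hugging Face Hub):

```python
# Hedged diagnostic: decode the same token ids with each candidate tokenizer;
# the one that yields coherent text is the one the dataset was built with.
# The column name "tokens" and the checkpoint ids are assumptions.
import pyarrow.parquet as pq
from transformers import AutoTokenizer

table = pq.read_table("openchat_v3.2_super.train.parquet")
ids = table.column("tokens")[0].as_py()  # first row; assumed column name

for name in ("meta-llama/Llama-2-7b-hf", "mistralai/Mistral-7B-v0.1"):
    tok = AutoTokenizer.from_pretrained(name)
    print(name, "->", tok.decode(ids[:64]))
```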