萧停云

Results 5 issues of 萧停云

### Please ask your question ![image](https://user-images.githubusercontent.com/51204375/172295697-5e5b86a0-25b5-4b30-8fc8-0a7a2bbd456a.png)

status/new-issue
type/question

    if not data_args.streaming:
        lm_datasets = tokenized_datasets.map(
            group_texts,
            batched=True,
            batch_size=group_batch_size,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
            desc=f"Grouping texts in chunks of {block_size}",
        )

The `group_texts` method in funetuner.py hangs while processing the last batch; the progress bar stays stuck at a little over 90%. ![image](https://user-images.githubusercontent.com/51204375/230752796-3d2993a0-fed9-47fc-a1d4-3ee3ff2622bd.png)
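For context, a `group_texts` function in scripts of this kind usually concatenates the tokenized sequences of a batch and cuts them into fixed-size blocks. The sketch below follows the pattern used in the Hugging Face `run_clm.py` example; it is an assumption about what funetuner.py does, not its actual code, and the `block_size` value and the toy batch are made up for illustration.

```python
from itertools import chain

block_size = 4  # illustrative value; the real one comes from the training config


def group_texts(examples):
    """Concatenate all sequences in a batch and split them into block_size chunks."""
    # Flatten each column (e.g. input_ids) across the whole batch.
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples}
    total_length = len(concatenated[next(iter(examples))])
    # Drop the remainder so every chunk has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs.
    result["labels"] = [chunk[:] for chunk in result["input_ids"]]
    return result


# Toy batch (assumed data): 10 tokens in total, so 2 tokens are dropped.
batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]}
grouped = group_texts(batch)
print(grouped["input_ids"])
```

Because the function works on one batch at a time, the amount of concatenation per call grows with `batch_size`, which may be worth checking when a single batch appears to stall.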

Running `python lightseq/examples/inference/python/export/huggingface/hf_bart_export.py` fails with `RuntimeError: Error building extension 'lightseq_layers_new'`. Environment: PyTorch 1.12.1, CUDA 10.2, cuDNN 8.4.3.

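In the fine-tuning and incremental-training stages, which of the following formats should the dataset use? (1) input: 我和你 ("you and me"), label: 我和你; (2) input: bos_token我和你eos_token, label: bos_token我和你eos_token. Judging from the fine-tuning code, the dataset format in GPT2QADataset looks like case 1. Are bos and eos not used? I would appreciate it if you could clear this up.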
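To make the two candidate formats concrete, here is a minimal sketch in plain Python. The token strings `<bos>` and `<eos>` and the helper `build_example` are hypothetical placeholders for illustration; they are not the actual tokens or API of the repository in question.

```python
# Hypothetical special-token strings; the real ones depend on the tokenizer.
BOS_TOKEN = "<bos>"
EOS_TOKEN = "<eos>"


def build_example(text, use_special_tokens):
    """Return an input/label pair in format 1 or format 2 from the question."""
    if use_special_tokens:
        # Format 2: wrap the text with the bos and eos tokens.
        text = f"{BOS_TOKEN}{text}{EOS_TOKEN}"
    # In both formats the label is identical to the input.
    return {"input": text, "label": text}


format_1 = build_example("我和你", use_special_tokens=False)
format_2 = build_example("我和你", use_special_tokens=True)
print(format_1)
print(format_2)
```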

https://www.dropbox.com/s/dytqaqngaupp884/contriever_msmarco_index.tar.gz How was the index at this link built: as a Flat index, or with some other method?