RyanOvO

Results 6 comments of RyanOvO

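> Do you mean online dynamic-batch inference? If so, the server side needs to assemble the batches; if it is offline, simply increase the batch size.

Yes. Something like dynamic batching, similar to what vLLM does.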
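To make "the server side needs to assemble the batches" concrete, here is a minimal sketch of one way a server could group concurrent requests. It is not how vLLM or any particular framework implements batching; `DynamicBatcher`, `run_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration, and the "model" is a stand-in function.

```python
import asyncio


class DynamicBatcher:
    """Hypothetical server-side batcher: groups concurrent requests into one model call."""

    def __init__(self, run_batch, max_batch_size=8, max_wait_s=0.01):
        self.run_batch = run_batch            # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size  # upper bound on requests per model call
        self.max_wait_s = max_wait_s          # how long to wait for a batch to fill up
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Called once per request; resolves when the request's batch has been processed."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def worker(self):
        """Background loop: collect requests until the batch is full or the wait expires."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until at least one request arrives
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.run_batch([item for item, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():
                    fut.set_result(out)


async def demo():
    batch_sizes = []

    def fake_model(inputs):
        # Stand-in for a real batched forward pass.
        batch_sizes.append(len(inputs))
        return [text.upper() for text in inputs]

    # The batcher is created inside the running event loop on purpose.
    batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.worker())
    results = await asyncio.gather(*(batcher.submit(f"req{i}") for i in range(6)))
    worker.cancel()
    return results, batch_sizes
```

Calling `asyncio.run(demo())` returns the per-request results in submission order together with the sizes of the batches the stand-in model actually received. The offline case from the quoted reply needs none of this: there the caller already holds all inputs and can pass a larger batch directly.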

> > Same question: I would like to know the format of the pre-training dataset. Also, I saw in the repository that the command line of the pre-training script sets dataset to wiki_demo. Does that mean the dataset is data/wiki_demo.txt? Here is what I saw:
> > ![image](https://private-user-images.githubusercontent.com/18082104/312683089-db02f787-cce0-441a-8265-83422936aaf2.png) ![image](https://private-user-images.githubusercontent.com/18082104/312683171-7463c59f-c7af-4501-98fe-b61b0bf52678.png)
>
> Yes, the details are in dataset_info.json.

May I ask whether the configuration process for the incremental pre-training data format is as follows:

1. The dataset_info configuration: "wiki_demo": { "file_name": "wiki_demo.txt", "file_sha1": "e70375e28eda542a90c68213640cc371898ce181", "columns": { "prompt": "text" } }
2. The text data is in txt format; no extra key markers are needed, only newline characters (\n).
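The two steps in this question can be sketched as a short script that writes a plain-text corpus and adds a matching entry to dataset_info.json. This is an illustration under stated assumptions, not a confirmed LLaMA-Factory workflow: the entry layout simply mirrors the `wiki_demo` entry quoted above, `file_sha1` is assumed to be the SHA-1 of the file contents, and `register_text_dataset` and `my_corpus` are hypothetical names. Whether each line is treated as one training sample is the question being asked here and is not confirmed in this thread.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def register_text_dataset(data_dir, name, lines):
    """Write <name>.txt and add an entry shaped like the quoted wiki_demo one."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Step 2 of the question: plain text, no keys, entries separated by "\n".
    text_path = data_dir / f"{name}.txt"
    text_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Assumption: file_sha1 is the SHA-1 hex digest of the file's bytes.
    sha1 = hashlib.sha1(text_path.read_bytes()).hexdigest()

    # Step 1 of the question: register the file in dataset_info.json.
    info_path = data_dir / "dataset_info.json"
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {
        "file_name": text_path.name,
        "file_sha1": sha1,
        "columns": {"prompt": "text"},
    }
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info[name]


# Usage in a throwaway directory; in a real checkout this would be the data directory.
with tempfile.TemporaryDirectory() as tmp:
    entry = register_text_dataset(tmp, "my_corpus", ["first document", "second document"])
    print(json.dumps(entry, indent=2))
```

The dataset name passed on the training command line would then be the key added here (`my_corpus` in this sketch), in the same way that `wiki_demo` refers to data/wiki_demo.txt.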

> > > Same question: I would like to know the format of the pre-training dataset. Also, I saw in the repository that the command line of the pre-training script sets dataset to wiki_demo. Does that mean the dataset is data/wiki_demo.txt? Here is what I saw:
> > > ![image](https://private-user-images.githubusercontent.com/18082104/312683089-db02f787-cce0-441a-8265-83422936aaf2.png) ![image](https://private-user-images.githubusercontent.com/18082104/312683171-7463c59f-c7af-4501-98fe-b61b0bf52678.png)
> >
> > Yes, the details are in dataset_info.json.
>
> May I ask whether the configuration process for the incremental pre-training data format is as follows:
>
> 1....

> Our dataset has been released. We recommend using https://github.com/hiyouga/LLaMA-Factory for fine-tuning. You can download the dataset locally, add it to the data directory, and follow https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README_zh.md to configure it; after that you can start fine-tuning. We expect to release fine-tuned versions adapted to various open-source base models, along with the fine-tuning code, in July or August 2024.

None of the posted links can be opened.

> Our dataset has been released. We recommend using https://github.com/hiyouga/LLaMA-Factory for fine-tuning. You can download the dataset locally, add it to the data directory, and follow https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README_zh.md to configure it; after that you can start fine-tuning. We expect to release fine-tuned versions adapted to various open-source base models, along with the fine-tuning code, in July or August 2024.

Have the fine-tuned versions been released yet?