Jiarui Fang（方佳瑞） comments

220 comments by Jiarui Fang（方佳瑞）

install issue

Hello, thanks for your feedback. Could you please post the error message?

install issue

We have also provided a Docker image for a clean installation environment. Would you like to try installing inside the container? `docker pull nvcr.io/nvidia/pytorch:21.06-py3`
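If you want to script this, here is a minimal Python sketch. It builds the `docker pull` command from the comment above, plus a `docker run` command that opens an interactive GPU shell in the image. The `run` flags, the `/workspace/src` mount point and the `docker_commands`/`run_all` helper names are assumptions for illustration. They are not part of the original instructions. `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host.

```python
import shlex
import subprocess
from typing import List

# Image tag taken from the comment above.
IMAGE = "nvcr.io/nvidia/pytorch:21.06-py3"


def docker_commands(image: str, host_dir: str) -> List[List[str]]:
    """Return the pull command and an interactive `docker run` command.

    host_dir is a hypothetical host directory (e.g. your source checkout);
    it is mounted at /workspace/src inside the container.
    """
    pull = ["docker", "pull", image]
    run = [
        "docker", "run",
        "--gpus", "all",  # expose all GPUs; needs the NVIDIA Container Toolkit
        "-it", "--rm",    # interactive shell, container removed on exit
        "-v", f"{host_dir}:/workspace/src",
        image,
    ]
    return [pull, run]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all(...) to execute.
    for cmd in docker_commands(IMAGE, "/path/to/your/checkout"):
        print(shlex.join(cmd))
```

Inside the container you would then run the usual installation steps against the mounted checkout.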

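> I agree with your view on model parallelism. A scheme like Mesh-TensorFlow would be hard to sell to users. In the short term, Megatron-LM-style partitioning tailored to Transformers is probably enough.
>
> However, I didn't quite follow the 1/n part about chunks at the end. Do you mean that every parameter is partitioned and the pieces are placed in the chunks held by each process, with an allgather performed before the parameter is used?

Yes. I think that with ZeRO-DP there is no need for MP. Of course, many people may not agree, and quite a few people make a living from N-D parallelism.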
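The chunk scheme confirmed in this exchange can be simulated in a single process. Each parameter is partitioned into per-process chunks, and the chunk is all-gathered before use. This is a minimal sketch under a few assumptions. It uses equal-size shards with zero padding and plain Python lists in place of tensors. `all_gather` here is a hypothetical local stand-in for a real collective such as `torch.distributed.all_gather`. The function names are illustrative only.

```python
from typing import List


def flatten_into_chunk(params: List[List[float]]) -> List[float]:
    """Concatenate several parameters (plain lists here) into one flat chunk."""
    chunk: List[float] = []
    for p in params:
        chunk.extend(p)
    return chunk


def shard(chunk: List[float], world_size: int) -> List[List[float]]:
    """Split the chunk into world_size equal shards; shard r lives on rank r.

    The tail is zero-padded so every shard has ceil(len(chunk) / world_size)
    elements, i.e. each rank stores roughly 1/n of the chunk.
    """
    shard_len = -(-len(chunk) // world_size)  # ceiling division
    padded = chunk + [0.0] * (shard_len * world_size - len(chunk))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Local stand-in for a collective all-gather: concatenate every rank's
    shard and drop the padding, giving each rank the full chunk."""
    full: List[float] = []
    for s in shards:
        full.extend(s)
    return full[:numel]


def unflatten(chunk: List[float], sizes: List[int]) -> List[List[float]]:
    """Cut the gathered chunk back into the original parameters."""
    out: List[List[float]] = []
    offset = 0
    for n in sizes:
        out.append(chunk[offset:offset + n])
        offset += n
    return out


if __name__ == "__main__":
    params = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    chunk = flatten_into_chunk(params)
    shards = shard(chunk, world_size=4)  # 9 elements -> 4 shards of 3
    full = all_gather(shards, numel=len(chunk))
    print(unflatten(full, [len(p) for p in params]))
```

In a real implementation, the gathered copy would be freed once the computation that needed it finishes. Only the local shard would be kept, which is where the 1/n memory saving comes from.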

Do we really need model parallelism (MP)?

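From an implementation standpoint, adding MP is certainly feasible, but we need to weigh cost against benefit. Given enough people and time, we would of course build 4D and even 5D parallelism too. Our task now is to set priorities and invest our effort first in the techniques that most benefit **usability** and **performance**. In my view MP is of marginal value. I just ran some experiments with DeepSpeed: look at the 6B mp2 numbers. Compared with DP, performance drops severely. [Tencent Docs] PatrickStar_V100_profiling https://docs.qq.com/sheet/DYWhORUNtREd1aGt0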
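Here is a back-of-the-envelope sketch of the memory side of the MP-versus-DP trade-off. It assumes the ZeRO paper's model-state accounting for mixed-precision Adam, which comes to 16 bytes per parameter. It also assumes perfectly even partitioning. In practice, Megatron-style MP still replicates some small tensors such as LayerNorm weights. The sketch ignores activations and says nothing about throughput, which is what the profiling sheet measures. The function name and the example sizes are illustrative.

```python
# Bytes of model state per parameter for mixed-precision Adam, following the
# ZeRO paper's accounting: fp16 weights (2) + fp16 grads (2) + fp32 master
# weights, momentum and variance (4 + 4 + 4).
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: float, mp_degree: int = 1, zero_degree: int = 1) -> float:
    """Idealized per-GPU model-state memory in GB (1e9 bytes).

    Assumes all model states are split evenly across mp_degree tensor-parallel
    ranks and, ZeRO-3 style, across zero_degree data-parallel ranks.
    Activations, temporary buffers and fragmentation are ignored.
    """
    return num_params * BYTES_PER_PARAM / (mp_degree * zero_degree) / 1e9


if __name__ == "__main__":
    n = 6e9  # a 6B-parameter model
    for label, mp, zero in [
        ("plain DP (no partitioning)", 1, 1),
        ("MP degree 2", 2, 1),
        ("ZeRO-DP over 2 GPUs", 1, 2),
        ("ZeRO-DP over 8 GPUs", 1, 8),
    ]:
        print(f"{label:28s} {model_state_gb(n, mp, zero):6.1f} GB per GPU")
```

Under these assumptions, ZeRO-style partitioning over two data-parallel GPUs gives the same per-GPU model-state footprint as mp2: 48 GB for a 6B model. This is the memory argument for relying on ZeRO-DP instead of MP.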

Running the GPT2 example raises RuntimeError: Could not find 'SLURM_PROCID'. Is a SLURM environment required?

Add --from_torch to the arguments of the launch command. Without it, the example launches through Slurm by default.
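The sketch below shows why the missing flag produces this error. It is a minimal, hypothetical illustration, not the example's actual launch code. The parser, the `read_rank_info` helper and the error text are assumptions. The sketch relies only on the standard variables that torchrun (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`) and Slurm (`SLURM_PROCID`, `SLURM_LOCALID`, `SLURM_NTASKS`) export to each worker process.

```python
import argparse
import os
from typing import Dict, Mapping, Optional, Sequence

# Environment variables each launcher exports for every worker process.
TORCHRUN_VARS = {"rank": "RANK", "local_rank": "LOCAL_RANK", "world_size": "WORLD_SIZE"}
SLURM_VARS = {"rank": "SLURM_PROCID", "local_rank": "SLURM_LOCALID", "world_size": "SLURM_NTASKS"}


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag modeled on the --from_torch switch described above.
    parser.add_argument("--from_torch", action="store_true",
                        help="read rank info set by torchrun instead of Slurm")
    return parser.parse_args(argv)


def read_rank_info(from_torch: bool, env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Pick the launcher's variables and fail loudly if one is missing."""
    env = os.environ if env is None else env
    names = TORCHRUN_VARS if from_torch else SLURM_VARS  # Slurm is the default
    info: Dict[str, int] = {}
    for key, var in names.items():
        if var not in env:
            raise RuntimeError(f"Could not find '{var}'")
        info[key] = int(env[var])
    return info
```

In this sketch, `torchrun --nproc_per_node=4 train.py --from_torch` (where `train.py` is a placeholder) takes the torchrun branch. Omitting the flag falls back to the Slurm variables. Outside a Slurm job, that raises the same kind of error as in the question.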

[Compatibility] Running OPT using PyTorch 1.12 and Gemini placement_policy = 'cuda' failed

I also tried placement_policy = 'cpu', and it crashed as well. The error stack is as follows: 0%| | 0/444 [00:00

basic query example

After I switched to another computer, I got the following error when running `python example_client.py`:

20-10-10:11:23:13 INFO [docker_container_manager.py:192] [default-cluster] Starting managed Redis instance in Docker
20-10-10:11:23:13 WARNING [decorators.py:34]...


Why **2

GPT-2 inference results are incorrect