Scope of tensor parallelism?

Open bobo0810 opened this issue 2 years ago • 2 comments

Discussed in https://github.com/hpcaitech/ColossalAI/discussions/3156

Originally posted by bobo0810 March 17, 2023 For basic operators such as conv and linear, could an official list clearly state where tensor parallelism takes effect?

bobo0810 avatar Mar 17 '23 08:03 bobo0810

Hi @bobo0810, thanks for the suggestion. We will add it in the near future when we update the documentation.

binmakeswell avatar Mar 17 '23 09:03 binmakeswell

Discussed in #3156

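Originally posted by bobo0810 March 17, 2023 For basic operators such as conv and linear, could an official list clearly state where tensor parallelism takes effect?

Hi, I have a related question: can tensor parallelism only be used when the model is defined in the official style, i.e. with "from colossalai import nn as col_nn"? Plain torch.nn layers do not seem to fall within the scope of tensor parallelism.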

Vvvvvvsysy avatar Mar 20 '23 07:03 Vvvvvvsysy

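I would like to ask the same thing. Suppose the model is timm_resnet50 and the configuration file enables tensor parallelism. During training, does tensor parallelism actually take effect for the model, and if so, for which operators?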
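One way this could be checked empirically, sketched here under assumptions: compare each parameter's shape on a single rank with its shape in the unsharded reference model. A parameter whose local shape differs has been partitioned; one whose shape is unchanged is only replicated. The helper `sharded_parameters` and the example shapes are hypothetical illustrations, not part of ColossalAI. With PyTorch, the two dictionaries could be built from `model.named_parameters()`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def sharded_parameters(full: Dict[str, Shape], local: Dict[str, Shape]) -> List[str]:
    """Return names of parameters whose local shape differs from the full shape."""
    sharded = []
    for name, full_shape in full.items():
        local_shape = local.get(name)
        if local_shape is None:
            continue  # parameter not present on this rank; nothing to compare
        if local_shape != full_shape:
            sharded.append(name)
    return sharded


# Shapes of two ResNet-50 parameters in the unsharded model.
full_shapes = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (1000, 2048)}

# Made-up local shapes for illustration only: pretend that just the final
# linear layer was split across 2 ranks and the convolution was left as is.
local_shapes = {"conv1.weight": (64, 3, 7, 7), "fc.weight": (500, 2048)}

print(sharded_parameters(full_shapes, local_shapes))
```

If the returned list is empty for a real run, none of the inspected parameters were partitioned on that rank.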

bobo0810 avatar Mar 20 '23 07:03 bobo0810

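I would like to ask the same thing. Suppose the model is timm_resnet50 and the configuration file enables tensor parallelism. During training, does tensor parallelism actually take effect for the model, and if so, for which operators?

Judging from my experiments, configuring tensor parallelism for a model built with torch.nn did not reduce GPU memory consumption. The model structure probably has to be replaced with the official layer definitions.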
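A rough arithmetic sketch of why unchanged memory suggests the layers were not partitioned: if a linear layer's weight is split along its output dimension across the tensor-parallel ranks, each rank holds about 1/N of it, whereas a layer left as a plain replicated module keeps the full weight on every rank. The function name and the 4-bytes-per-element (fp32) default are assumptions for illustration; bias, activations, gradients and optimizer state are ignored.

```python
def linear_weight_bytes_per_rank(in_features: int, out_features: int,
                                 tp_size: int, sharded: bool,
                                 bytes_per_elem: int = 4) -> int:
    """Bytes of weight memory one rank holds for a single linear layer."""
    if sharded:
        if out_features % tp_size != 0:
            raise ValueError("out_features must be divisible by tp_size")
        rows = out_features // tp_size  # each rank keeps a slice of the output rows
    else:
        rows = out_features  # replicated: every rank keeps the full weight
    return rows * in_features * bytes_per_elem


replicated = linear_weight_bytes_per_rank(4096, 4096, tp_size=2, sharded=False)
split = linear_weight_bytes_per_rank(4096, 4096, tp_size=2, sharded=True)
print(replicated, split)  # 67108864 33554432
```

Under these assumptions, a model whose layers are all replicated would show the same per-GPU weight memory regardless of the configured tensor-parallel size, which matches the observation above.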

Vvvvvvsysy avatar Mar 23 '23 01:03 Vvvvvvsysy
