Can OneFlow do a split-convolution operation?

Summary

I am wondering whether OneFlow supports this kind of operation. For example, I have an input tensor of shape [1, 3, 200, 200] ([batch_size, channel, width, height]), a conv2d layer, and 2 devices. I want device0 to compute the convolution on one [1, 3, 100, 200] slice and device1 to compute it on the other [1, 3, 100, 200] slice.

Code to reproduce bug

My demo is as follows. I think it doesn't work, perhaps because the convolution op does not support flow.sbp.split(2). Can OneFlow support such an operation? I would appreciate it if someone could answer.

import oneflow as flow
from oneflow import nn

class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1, 1, bias=False)

    def forward(self, x):
        out = self.conv1(x)
        return out


if __name__ == '__main__':
    placement = flow.placement("cuda", [0, 1])
    model = SimpleConv()
    model_global = model.to_global(placement=placement, sbp=flow.sbp.broadcast)
    
    x1 = flow.ones(1, 3, 100, 200)
    x1_global = x1.to_global(placement=placement, sbp=flow.sbp.split(2))

    out_global = model_global(x1_global)
    if flow.cuda.current_device() == 0:
        print(out_global)
        local_out = out_global.to_local()
        print(f'{local_out.device}, {local_out.shape}, \n{local_out}')

System Information

What is your OneFlow installation (pip, source, dockerhub): dockerhub (oneflowinc/oneflow:nightly-cuda11.2)
OS: ubuntu 18.04
OneFlow version (run python3 -m oneflow --doctor): 0.8.1.dev20220906+cu112
Python version: 3.8.13
CUDA driver version: 11.2
GPU models:
Other info:

Sep 21 '22 13:09 DarrenYing

Currently, the conv op only supports splitting data along the first dimension, which is the batch dimension. Using flow.sbp.split(2) for the convolution input is invalid.

Sep 21 '22 14:09 shangguanshiyuan

Thank you. I am still wondering whether this kind of operation can be achieved using oneflow.comm or other existing features.

Sep 21 '22 14:09 DarrenYing

Splitting the data along the height or width dimension in a convolution causes complex effects at the boundary, with no obvious benefit. It gets more complicated with different stride, padding, and dilation settings. This method sometimes slows down computation or wastes more memory. It is not supported now. Can you tell us what the benefit of this method is, or where it is needed?
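To make the boundary effect concrete, here is a minimal NumPy sketch, not OneFlow code. It assumes a 3x3 kernel, stride 1, zero padding of 1, and two equal slices along the height; the helper conv2d_valid and all variable names are hypothetical. Convolving each half independently gives wrong rows at the seam, while giving each half one extra "halo" row from its neighbor reproduces the full result.

```python
import numpy as np


def conv2d_valid(x, w):
    """Naive stride-1 convolution (cross-correlation) without padding.

    x: (C_in, H, W), w: (C_out, C_in, kH, kW) -> (C_out, H-kH+1, W-kW+1)
    """
    c_out, c_in, kh, kw = w.shape
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    out = np.zeros((c_out, out_h, out_w), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + out_h, j:j + out_w]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 6))       # (C_in, H, W)
w = rng.standard_normal((4, 3, 3, 3))    # (C_out, C_in, kH, kW)
pad = 1                                  # zero padding on every side
halo = w.shape[2] // 2                   # extra rows needed from the neighbor
half = x.shape[1] // 2                   # rows per device

# Reference: convolution over the whole zero-padded image.
x_pad = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
full = conv2d_valid(x_pad, w)

# Naive split: each half is zero-padded on its own, so the seam rows see
# zeros where the neighboring half's rows should be.
naive = np.concatenate(
    [
        conv2d_valid(np.pad(x[:, :half], ((0, 0), (pad, pad), (pad, pad))), w),
        conv2d_valid(np.pad(x[:, half:], ((0, 0), (pad, pad), (pad, pad))), w),
    ],
    axis=1,
)

# Split with halo: each half also receives `halo` rows from its neighbor.
top_in = x_pad[:, :half + 2 * halo]
bottom_in = x_pad[:, half:]
with_halo = np.concatenate(
    [conv2d_valid(top_in, w), conv2d_valid(bottom_in, w)], axis=1
)

naive_matches = np.allclose(full, naive)
halo_matches = np.allclose(full, with_halo)
print("naive split matches full conv:", naive_matches)
print("halo split matches full conv: ", halo_matches)
```

With other stride, padding, or dilation values, the number of halo rows and the slice offsets change, which is the extra complexity mentioned above.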

Sep 21 '22 15:09 shangguanshiyuan

You're right. The boundary is complex, and I plan to handle it manually. I want to use this method to save memory when the input image is so large that one device cannot handle training even with a batch size of 1. I want to verify how much memory can be saved when the input image is very large and the model has many conv2d layers.
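As a rough back-of-the-envelope check of the possible saving, the sketch below counts only the input and output activations of one stride-1, same-padded conv layer in float32. It ignores weights, gradients, convolution workspace, and framework overhead, so real measurements will differ. The function names are hypothetical, and the numbers use the layer from the demo, Conv2d(3, 16, 3, 1, 1) on a [1, 3, 200, 200] input split across 2 devices.

```python
def tensor_bytes(shape, dtype_bytes=4):
    """Size in bytes of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes


def conv_activation_bytes(batch, c_in, c_out, height, width, dtype_bytes=4):
    """Input plus output activation bytes for the whole image on one device."""
    return (tensor_bytes((batch, c_in, height, width), dtype_bytes)
            + tensor_bytes((batch, c_out, height, width), dtype_bytes))


def split_conv_activation_bytes(batch, c_in, c_out, height, width,
                                num_devices, halo_rows, dtype_bytes=4):
    """Worst-case per-device bytes when the height is split evenly.

    A slice with neighbors on both sides holds halo_rows extra input rows
    per side; with 2 devices each slice has only one neighbor.
    """
    rows = -(-height // num_devices)          # ceiling division
    sides = 0 if num_devices == 1 else (1 if num_devices == 2 else 2)
    in_rows = rows + sides * halo_rows
    return (tensor_bytes((batch, c_in, in_rows, width), dtype_bytes)
            + tensor_bytes((batch, c_out, rows, width), dtype_bytes))


full = conv_activation_bytes(1, 3, 16, 200, 200)
split = split_conv_activation_bytes(1, 3, 16, 200, 200,
                                    num_devices=2, halo_rows=1)
print(f"single device: {full} bytes")
print(f"per device when split in 2: {split} bytes ({split / full:.1%})")
```

Under these assumptions the halo overhead is small for a single 3x3 layer, but it adds up with many layers if the halo is not exchanged between layers.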

Sep 22 '22 01:09 DarrenYing

Thanks for your advice. I will try it.

------------------ Original message ------------------
From: "Oneflow-Inc/oneflow"
Sent: Thursday, September 22, 2022, 10:15 AM
Subject: Re: [Oneflow-Inc/oneflow] Can oneflow do split-convolution operation ? (Issue #9124)

Maybe you can try splitting the input image and the weight along the channel dimension (split=1), so that each output is a partial sum and the partial sums need to be added?

import numpy as np
import oneflow as flow
import oneflow.nn as nn

# Reference input and weight, split in half along the input-channel axis
np_x = np.random.randn(4, 16, 8, 8).astype(np.float32)
np_split_x = np.split(np_x, 2, axis=1)
np_conv_weight = np.random.randn(32, 16, 3, 3).astype(np.float32)
np_split_conv_weight = np.split(np_conv_weight, 2, axis=1)

# Full convolution over all 16 input channels
x_tensor = flow.tensor(np_x)
conv = nn.Conv2d(16, 32, kernel_size=3, stride=1, bias=False)
conv.weight = nn.Parameter(flow.tensor(np_conv_weight))
conv_out = conv(x_tensor)

# Two convolutions, each over 8 of the 16 input channels
split_x_tensor1 = flow.tensor(np_split_x[0])
split_x_tensor2 = flow.tensor(np_split_x[1])
split_conv1 = nn.Conv2d(8, 32, kernel_size=3, stride=1, bias=False)
split_conv2 = nn.Conv2d(8, 32, kernel_size=3, stride=1, bias=False)
split_conv1.weight = nn.Parameter(flow.tensor(np_split_conv_weight[0]))
split_conv2.weight = nn.Parameter(flow.tensor(np_split_conv_weight[1]))
split_conv1_out = split_conv1(split_x_tensor1)
split_conv2_out = split_conv2(split_x_tensor2)

# The partial outputs sum to the full convolution output
split_conv_sum = split_conv1_out + split_conv2_out
print(np.allclose(conv_out.numpy(), split_conv_sum.numpy(), atol=1e-4, rtol=1e-4))

Sep 22 '22 06:09 DarrenYing
