oneflow
Add stft op
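1. FFT computation
- CPU kernel: the underlying FFT uses the pocketfft_hdronly.h library (an externally imported library file)
- CUDA kernel: the underlying FFT uses the cuFFT library that ships with CUDA (only CUDA 11.0 and above is supported)
- The FFT operations are wrapped in dedicated classes, which can be extended later as needed
2. Work on the stft operator
- Unsupported parameter: return_complex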
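For readers unfamiliar with what an stft operator does before it calls the FFT, here is a minimal sketch of the framing step: cutting a 1-D signal into overlapping frames of `n_fft` samples, spaced `hop_length` samples apart. It is an illustration only, not the code in this PR. The build log in this thread suggests the PR builds frames as a strided view of shape `{batch, n_frames, n_fft}`, whereas this sketch copies the samples. The names `NumFrames` and `FrameSignal` are hypothetical, and padding, centering, and windowing are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of complete frames in a signal of `len` samples (no padding).
// Hypothetical helper for illustration; not part of OneFlow.
int64_t NumFrames(int64_t len, int64_t n_fft, int64_t hop_length) {
  if (n_fft <= 0 || hop_length <= 0 || len < n_fft) { return 0; }
  return 1 + (len - n_fft) / hop_length;
}

// Copies each frame out of the signal. Frame f starts at sample
// f * hop_length and holds n_fft consecutive samples.
std::vector<std::vector<float>> FrameSignal(const std::vector<float>& signal,
                                            int64_t n_fft, int64_t hop_length) {
  const int64_t n_frames =
      NumFrames(static_cast<int64_t>(signal.size()), n_fft, hop_length);
  std::vector<std::vector<float>> frames;
  frames.reserve(static_cast<std::size_t>(n_frames));
  for (int64_t f = 0; f < n_frames; ++f) {
    const auto begin = signal.begin() + f * hop_length;
    frames.emplace_back(begin, begin + n_fft);
  }
  return frames;
}
```

With 8 samples, `n_fft = 4`, and `hop_length = 2`, this gives 3 frames starting at samples 0, 2, and 4. Each frame would then be passed to the FFT backend (pocketfft on CPU, cuFFT on CUDA).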
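I want to use this branch for testing, but the build fails because of errors reported by clang-tidy checks. Most of them are warnings about implicit data type conversions.
Zhimin, could you take a look and fix them? Or, if this is a known issue, is there a way to work around it and build successfully without changing the code? If so, please let me know. Thanks.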
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp: In member function ‘oneflow::Maybe<oneflow::one::Tensor> oneflow::one::functional::impl::StftFunctor::operator()(const std::shared_ptr<oneflow::one::Tensor>&, int64_t, oneflow::Optional<long int>, oneflow::Optional<long int>, const oneflow::Optional<oneflow::one::Tensor>&, bool, const string&, bool, bool, bool) const’:
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:62: error: narrowing conversion of ‘(int64_t)n_fft’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Werror=narrowing]
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
| ^~~~~
/home/yaochi/oneflow/oneflow/core/common/just.h:105:40: note: in definition of macro ‘__JustStackCheckWrapper__’
105 | #define __JustStackCheckWrapper__(...) __VA_ARGS__
| ^~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:9: note: in expansion of macro ‘JUST’
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
| ^~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:62: error: narrowing conversion of ‘n_fft’ from ‘const int64_t’ {aka ‘const long int’} to ‘int’ [-Werror=narrowing]
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
| ^~~~~
/home/yaochi/oneflow/oneflow/core/common/just.h:105:40: note: in definition of macro ‘__JustStackCheckWrapper__’
105 | #define __JustStackCheckWrapper__(...) __VA_ARGS__
| ^~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:9: note: in expansion of macro ‘JUST’
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
| ^~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3182:61: error: narrowing conversion of ‘(new_hop_length * ((int64_t)strides.std::vector<int>::at(1)))’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Werror=narrowing]
3182 | {strides.at(0), new_hop_length * strides.at(1), strides.at(1)}, 0));
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/common/just.h:105:40: note: in definition of macro ‘__JustStackCheckWrapper__’
105 | #define __JustStackCheckWrapper__(...) __VA_ARGS__
| ^~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:9: note: in expansion of macro ‘JUST’
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
| ^~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3182:61: error: narrowing conversion of ‘(new_hop_length * ((int64_t)strides.std::vector<int>::at(1)))’ from ‘int64_t’ {aka ‘long int’} to ‘int’ [-Werror=narrowing]
3182 | {strides.at(0), new_hop_length * strides.at(1), strides.at(1)}, 0));
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/common/just.h:105:40: note: in definition of macro ‘__JustStackCheckWrapper__’
105 | #define __JustStackCheckWrapper__(...) __VA_ARGS__
| ^~~~~~~~~~~
/home/yaochi/oneflow/oneflow/core/functional/impl/math_functor.cpp:3181:9: note: in expansion of macro ‘JUST’
3181 | JUST(view::AsStrided(input_tensor, {batch, n_frames, n_fft},
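I've updated the code; please pull the latest branch and try again. Locally I build with gcc, and some of these warnings are not treated as errors there, so the build passed on my side.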
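The errors in the log above are narrowing conversions: `int64_t` values placed in a braced initializer list whose element type is `int`. A common way to resolve this class of error is an explicit `static_cast`, sketched below. This is an assumption about the kind of change involved, not the actual patch. `MakeFrameShape` and `MakeFrameStrides` are hypothetical stand-ins, and the `int32_t` element type is inferred from the "to 'int'" wording in the log.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for building the shape list passed to a strided view.
// Writing `return {batch, n_frames, n_fft};` here would narrow int64_t to
// int32_t inside a braced initializer list, which GCC reports under
// -Wnarrowing and rejects when -Werror=narrowing is in effect.
std::vector<int32_t> MakeFrameShape(int64_t batch, int64_t n_frames, int64_t n_fft) {
  return {static_cast<int32_t>(batch), static_cast<int32_t>(n_frames),
          static_cast<int32_t>(n_fft)};
}

// Same idea for the strides: the product is computed in int64_t and then
// cast explicitly, so the conversion is visible at the call site.
std::vector<int32_t> MakeFrameStrides(const std::vector<int32_t>& strides,
                                      int64_t hop_length) {
  return {strides.at(0), static_cast<int32_t>(hop_length * strides.at(1)),
          strides.at(1)};
}
```

The casts silence the diagnostic but do not check the range. If the values could exceed the `int32_t` limits, a range check before the cast, or a 64-bit element type, would be the safer choice.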
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.6ms (= 13956.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.0ms (= 16202.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.0ms / 139.6ms)
OneFlow resnet50 time: 84.7ms (= 8466.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.8ms (= 10083.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 100.8ms / 84.7ms)
OneFlow resnet50 time: 57.4ms (= 11474.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.9ms (= 17789.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 88.9ms / 57.4ms)
OneFlow resnet50 time: 44.8ms (= 8965.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.0ms (= 13992.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.0ms / 44.8ms)
OneFlow resnet50 time: 39.6ms (= 7916.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.0ms (= 13402.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.69 (= 67.0ms / 39.6ms)
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.4ms (= 13943.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.5ms (= 15947.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 159.5ms / 139.4ms)
OneFlow resnet50 time: 85.4ms (= 8537.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.4ms (= 10135.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.4ms / 85.4ms)
OneFlow resnet50 time: 57.9ms (= 11586.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.9ms (= 17573.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.9ms / 57.9ms)
OneFlow resnet50 time: 45.7ms (= 9145.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.7ms (= 15744.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 78.7ms / 45.7ms)
OneFlow resnet50 time: 41.0ms (= 8195.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13607.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 68.0ms / 41.0ms)
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9273/
CI failed when running job: cuda-module. PR label automerge has been removed
Speed stats:
GPU Name: GeForce GTX 1080
❌ OneFlow resnet50 time: 139.4ms (= 13939.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.2ms (= 16017.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.2ms / 139.4ms)
OneFlow resnet50 time: 85.1ms (= 8508.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.0ms (= 11098.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 111.0ms / 85.1ms)
OneFlow resnet50 time: 57.8ms (= 11554.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.2ms (= 17635.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.2ms / 57.8ms)
OneFlow resnet50 time: 44.4ms (= 8874.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.9ms (= 14384.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.62 (= 71.9ms / 44.4ms)
OneFlow resnet50 time: 40.5ms (= 8090.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.4ms (= 13282.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 66.4ms / 40.5ms)