Li Xinqi
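Thanks for reporting the issue. We'll look into it. @clackhan
Could you attach the optimization results?
This PR needs to be trimmed down. It does not need to handle the feature where the main thread enters the VM directly; it only needs to support cuda_tensor.numpy.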
> This is the nsys result for the PyTorch framework: > > Using report1.sqlite export for stats reports. Exporting [/opt/nvidia/nsight-systems/2020.4.3/target-linux-x64/reports/cudaapisum.py report1.sqlite] to console... > > Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name > > ```...
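The summary is mainly useful for comparing a specific kernel before and after optimization. To compare against PyTorch, it is better to look at the nsys qdrep file first.
From https://oneflow-test.oss-cn-beijing.aliyuncs.com/NeuS/nsys/report1.qdrep you can see that there appear to be many CPU ops between the CUDA kernels. Perhaps some code hard-codes the CPU device type.
@yoonlee888 When running performance tests, you can run fewer iterations. Otherwise the file gets too large.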
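One way to keep a profiling run short is to cap the number of iterations in the training loop itself. The following is only a minimal sketch: `run_capped`, `step_fn`, `batches`, and the default of 20 iterations are hypothetical names and values chosen for illustration, not part of the model scripts discussed here.

```python
from itertools import islice


def run_capped(step_fn, batches, max_iters=20):
    """Run step_fn on at most max_iters batches and return how many ran.

    step_fn and batches are placeholders (assumptions) for the real
    training step and data loader.
    """
    count = 0
    # islice stops after max_iters items without consuming the rest.
    for batch in islice(batches, max_iters):
        step_fn(batch)
        count += 1
    return count


if __name__ == "__main__":
    seen = []
    ran = run_capped(seen.append, range(1000), max_iters=5)
    print(ran, seen)
```

Capping the loop this way leaves the data pipeline untouched, so the same script can be used for both full runs and short profiling runs.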
> You can run this locally on a four-GPU machine: > > `bash dev/model_test.sh` > > to check whether the model tests pass. @ouyangyu
Luyang Zhao [21 hours ago](https://of-world.slack.com/archives/G48PTMA15/p1667823974459879?thread_ts=1667557398.571019&cid=G48PTMA15)

1. Built with profiler_off:
[https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/s[…]ofile_off_flow_221107_master%40252ccea.nsys-rep](https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/skip_augsFalse_vision_resizeFalse_2022110706_profile_off_flow_221107_master%40252ccea.nsys-rep)
[https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/s[…]f_flow_221107_profiling_item%4021763eb.nsys-rep](https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/skip_augsFalse_vision_resizeFalse_2022110706_profile_off_flow_221107_profiling_item%4021763eb.nsys-rep)

2. Built with profiler_on:
[https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/s[…]rofile_on_flow_221107_master%40252ccea.nsys-rep](https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/skip_augsFalse_vision_resizeFalse_2022110708_profile_on_flow_221107_master%40252ccea.nsys-rep)
[https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/s[…]n_flow_221107_profiling_item%4021763eb.nsys-rep](https://oneflow-static.oss-cn-beijing.aliyuncs.com/disco_diffusion/nvidia-profile/resize_profile/skip_augsFalse_vision_resizeFalse_2022110707_profile_on_flow_221107_profiling_item%4021763eb.nsys-rep)
(edited)

Luyang Zhao [20 hours ago](https://of-world.slack.com/archives/G48PTMA15/p1667830025971889?thread_ts=1667557398.571019&cid=G48PTMA15)

In the evening, when the machine was more stable, I ran several more complete tests based on disco:
profiling_item: 3min28s, 3min33s, 3min32s
master: 3min52s, 3min51s, 3min51s
There appears to be a clear speedup [@lixinqi0703106](https://of-world.slack.com/team/U5DSW18TZ), roughly 20 s faster :weisuo:

Xinqi Li [18 hours ago](https://of-world.slack.com/archives/G48PTMA15/p1667835087355969?thread_ts=1667557398.571019&cid=G48PTMA15)...
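As a quick check of the "roughly 20 s" figure, the six run times quoted in the thread can be averaged per branch. This is just a sketch of the arithmetic. The helper `to_seconds` and the variable names are illustrative, and the only data used are the timings reported above.

```python
import re
from statistics import mean


def to_seconds(text):
    """Convert a duration written like '3min28s' into seconds."""
    match = re.fullmatch(r"(\d+)min(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


# Run times as reported in the thread.
profiling_item = ["3min28s", "3min33s", "3min32s"]
master = ["3min52s", "3min51s", "3min51s"]

mean_profiling_item = mean(to_seconds(t) for t in profiling_item)
mean_master = mean(to_seconds(t) for t in master)

saved = mean_master - mean_profiling_item  # seconds saved per run
relative = saved / mean_master             # fraction of master's run time

print(f"profiling_item mean: {mean_profiling_item:.1f} s")
print(f"master mean:         {mean_master:.1f} s")
print(f"saved:               {saved:.1f} s ({relative:.1%})")
```

With three runs per branch this is only a rough estimate, but it is consistent with the speedup described in the message.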
> Testing disco on this PR at commit [b4b43eb](https://github.com/Oneflow-Inc/oneflow/commit/b4b43ebfe5fc98b82ad658a332ce9cabf291c22f) produces: > ```shell > Steps: 42%|███████████████████████████████████████ | 102/240 [02:22 Seed used: 2954719760 > Traceback (most recent call last): > File "disco.py", line 2617, in > do_run() > File...