Xiaoyu Xu
> when I set debug(0), there is no special log printed on the screen

Try to set debug(2); logs are expected to show at the first call of nn.Graph.

> when...
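For reference, a minimal sketch of turning on verbose logging for an nn.Graph, assuming a simple Linear module (the LinearGraph class name and tensor shapes here are illustrative, not from the original report):

```python
import oneflow as flow
import oneflow.nn as nn

class LinearGraph(nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

model = nn.Linear(3, 4)
graph = LinearGraph(model)
graph.debug(2)  # verbosity level 2; debug logs print at the first graph call
y = graph(flow.randn(2, 3))
```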
Is your code on GitHub? We need to see the code to figure out what is happening.
> When this object is created, all the memory needed for the computation is allocated at once, and it is released all together on destruction, since the computation is pure C++ and no memory allocation happens during the whole process.

You mentioned earlier that inference has a dynamic-shape issue; does it allocate memory based on the max shape?
About this set/get_global_default_device:
- Does torch have a counterpart?
- What does "global" mean here? Since we already have global tensor, reusing the word "global" needs to be thought through.
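For comparison, recent PyTorch versions do provide a default-device setter, torch.set_default_device; a minimal sketch of its behavior (the device names are illustrative):

```python
import torch

# torch.set_default_device makes newly created tensors land on the given
# device unless a device is passed explicitly.
torch.set_default_device("cuda:0")

x = torch.randn(2, 2)              # allocated on cuda:0
y = torch.zeros(2, device="cpu")   # an explicit device still wins
```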
Also, related to global tensor: we previously provided a global mode, which looks like the same class of feature (global mode is for global tensor), so it is worth considering them together: https://oneflow.readthedocs.io/en/master/generated/oneflow.utils.global_view.global_mode.html#oneflow.utils.global_view.global_mode
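A minimal sketch of using the global_mode linked above, assuming a two-rank CUDA placement with broadcast sbp (the placement/sbp values are illustrative; see the linked docs for the exact signature):

```python
import oneflow as flow
from oneflow.utils.global_view import global_mode

placement = flow.placement("cuda", ranks=[0, 1])

# Inside the scope, source ops produce global tensors with the given
# placement/sbp instead of local tensors.
with global_mode(True, placement=placement, sbp=flow.sbp.broadcast):
    x = flow.ones(2, 2)
    print(x.is_global)  # True
```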
This idea is nice; I have heard of work like this: https://medium.com/rapids-ai/pytorch-rapids-rmm-maximize-the-memory-efficiency-of-your-workflows-f475107ba4d4 But this is not a bottleneck at the moment; we haven't seen tasks limited by it. So this...
> it takes 5 minutes to compile a deep LSTM net

Is this a single-device task or a multi-device task? The time cost of each compilation stage can be shown...
```
I20230425 00:48:47.266355 20443 cost_util.h:98] [count log]{"loc":"[GraphCompile]Graph_0 OptimizationLogicalGraph","mem_rss":"11621.000000 MB","time_cost":"433 seconds"}
I20230425 00:49:50.931952 20443 cost_util.h:98] [count log]{"loc":"[GraphCompile]Graph_0 AlignStates","mem_rss":"11623.000000 MB","time_cost":"47 seconds"}
I20230425 00:54:51.325140 20443 cost_util.h:98] [count log]{"loc":"[GraphCompile]Graph_0 CompleteJob","mem_rss":"11606.000000 MB","time_cost":"300 seconds"}
```
It...
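To make logs like these easier to compare, a small hypothetical helper can pull out the per-stage time and memory; the parse_count_logs name and the log file path are assumptions, while the "[count log]" JSON payload format matches the sample above:

```python
import json
import re

# Matches the JSON payload that follows "[count log]" in each line.
COUNT_LOG = re.compile(r"\[count log\](\{.*\})")

def parse_count_logs(path):
    """Yield (loc, time_cost, mem_rss) for every count-log line in the file."""
    with open(path) as f:
        for line in f:
            m = COUNT_LOG.search(line)
            if m:
                entry = json.loads(m.group(1))
                yield entry["loc"], entry["time_cost"], entry["mem_rss"]

for loc, time_cost, mem_rss in parse_count_logs("oneflow.INFO"):
    print(f"{loc}: {time_cost} (rss {mem_rss})")
```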
Op graph log with nccl logical ops and sbp: https://oneflow-test.oss-cn-beijing.aliyuncs.com/mt5_test/2n4g_log/output.log Search for `Operator` to find the starting point of the op graph.
What is the relationship between this and the previous branch? Does it need to be benchmarked? And does it improve performance?