oneflow issues

OFCCL: semi-dynamic and asynchronous collective communication scheduling.

1

WIP, do not merge. If possible, do not waste resources compiling it; I make sure there are no compile errors before each commit and push.

Panlichen

Add cache for eager amp parameter

marigoold

support flash attention

4

https://github.com/Oneflow-Inc/OneTeam/issues/1733

Usage:

```
cu_seqlens_q = flow.arange(
    0, (batch_size + 1) * seqlen_q, step=seqlen_q, dtype=flow.int32, device="cuda"
)
cu_seqlens_v = flow.arange(
    0, (batch_size + 1) * seqlen_k, step=seqlen_k, dtype=flow.int32, device="cuda"
)
out,...
```
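To make the `flow.arange` calls in the usage snippet easier to read, here is a minimal plain-Python sketch of the values they produce. It assumes every sequence in the batch has the same length and that the offsets follow the usual cumulative-sequence-length convention, where sequence `i` occupies the half-open range `[offsets[i], offsets[i + 1])`. The helper name `fixed_length_cu_seqlens` is hypothetical and not part of OneFlow.

```python
def fixed_length_cu_seqlens(batch_size: int, seqlen: int) -> list[int]:
    """Cumulative sequence offsets for a batch of equal-length sequences.

    Mirrors arange(0, (batch_size + 1) * seqlen, step=seqlen) from the
    usage snippet, using the built-in range instead of a tensor library.
    """
    return list(range(0, (batch_size + 1) * seqlen, seqlen))


offsets = fixed_length_cu_seqlens(batch_size=3, seqlen=4)
print(offsets)  # [0, 4, 8, 12]
```

The result always has `batch_size + 1` entries, so the last entry equals the total number of tokens in the packed batch.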

guo-ran

enhancement

op

OpGraph init cost too much

### op graph init is the main cost of a pass

See: https://github.com/Oneflow-Inc/libai/issues/407#issuecomment-1286776427

The cost of each part inside it is as follows:

```
Maybe OpGraph::Init(const Job& job) {
  auto cost_ct = std::make_unique(true, true);
  InitNodes(job);
  cost_ct->Count("OpGraph0", 1);
  op_name2op_node_.reserve(job.net().op_size());
  ForEachNode([&](OpNode* node) {
    CHECK(op_name2op_node_.emplace(node->op().op_name(), node).second) Count("OpGraph1",...
```

strint

bug

community

flow.Tensor introduces errors when wrapping data

2

## Summary

flow.Tensor introduces errors when wrapping data

## Code to reproduce bug

```python
import oneflow as flow

flow.manual_seed(987342)
for i in range(5):
    n, in_c, out_c = flow.randint(1, 500, (3,)).tolist()
    ops = n *...
```

hhhfccz

bug

community

[Just for test]Rank task test

2

strint

Add nn.MultiHeadAttention

7

TODO:
- [x] Implement helper functions
- [x] The most naive case: align and implement native_multi_head_attention at the functor layer
- [x] Implement mha at the Python functional layer
- [x] Implement mha at the nn.Module layer
- [x]...

marigoold

enhancement

documentation

api

python

Auto Parallel consider memory

Let auto parallel find the fastest strategy that fits within the memory limit.
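The goal stated here can be read as a constrained selection: discard strategies that exceed the memory budget, then take the fastest of the rest. The sketch below illustrates only that idea. The `Strategy` class, its fields, and `fastest_within_memory` are hypothetical names for illustration and do not reflect how OneFlow's auto parallel is actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Strategy:
    """Hypothetical candidate parallel strategy with estimated costs."""
    name: str
    est_time_ms: float
    est_memory_mb: float


def fastest_within_memory(
    candidates: Iterable[Strategy], memory_limit_mb: float
) -> Optional[Strategy]:
    """Return the lowest-time strategy whose memory fits the limit, or None."""
    feasible = [s for s in candidates if s.est_memory_mb <= memory_limit_mb]
    return min(feasible, key=lambda s: s.est_time_ms, default=None)


candidates = [
    Strategy("data_parallel", est_time_ms=10.0, est_memory_mb=900.0),
    Strategy("tensor_parallel", est_time_ms=14.0, est_memory_mb=500.0),
    Strategy("pipeline", est_time_ms=18.0, est_memory_mb=300.0),
]
best = fastest_within_memory(candidates, memory_limit_mb=600.0)
print(best.name if best else "no feasible strategy")
```

Returning `None` when nothing fits keeps the "no feasible strategy" case explicit instead of silently falling back to a strategy that would run out of memory.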

Yipeng1994

feature

graph

AutoParallel

`nn.graph` doesn't have helpful error message for incorrect arguments

## Summary

- It raises:
  ```
  NotImplementedError: nn.Graph.build()'s input/output item only support types: Tensor/None.
  ```
- Not helpful. It would be much more useful if the unsupported type or index...
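As an illustration of the kind of message this issue asks for, the sketch below validates a list of arguments and reports both the offending type and its position. It is an assumption about what a more helpful message could look like, not OneFlow's actual check. The `Tensor` class here is a stand-in so the example runs without OneFlow, and `check_build_args` is a hypothetical helper.

```python
class Tensor:
    """Stand-in for a tensor type, used only to keep the sketch self-contained."""


def check_build_args(args) -> None:
    """Raise an error naming the index and type of the first unsupported item."""
    for index, item in enumerate(args):
        if item is not None and not isinstance(item, Tensor):
            raise NotImplementedError(
                "nn.Graph.build()'s input/output item only support types: "
                f"Tensor/None, but got {type(item).__name__} at index {index}."
            )


try:
    check_build_args([Tensor(), 3, None])
except NotImplementedError as err:
    print(err)
```

Including the index and type name lets the caller find the bad argument without inspecting every input by hand.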

jackalcooper

bug

community

`nn.graph` compilation takes too long when it is a large module

13

## Summary

- After some modifications to the model implementation, the denoising UNet of Stable Diffusion can be built as an `nn.graph`.
- It has around 76000 ops.
- Just to build...

jackalcooper

bug

community
