oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
WIP, do not merge. If possible, do not waste resources compiling it; I make sure there are no compile errors before each commit and push.
https://github.com/Oneflow-Inc/OneTeam/issues/1733 Usage: ``` cu_seqlens_q = flow.arange( 0, (batch_size + 1) * seqlen_q, step=seqlen_q, dtype=flow.int32, device="cuda" ) cu_seqlens_v = flow.arange( 0, (batch_size + 1) * seqlen_k, step=seqlen_k, dtype=flow.int32, device="cuda" ) out,...
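The `cu_seqlens_*` tensors in the excerpt above appear to be cumulative sequence offsets for a batch of equal-length sequences. Below is a minimal plain-Python sketch of the same arithmetic, assuming that interpretation. The helper name `cu_seqlens` is hypothetical and is not part of OneFlow.

```python
from typing import List


def cu_seqlens(batch_size: int, seqlen: int) -> List[int]:
    """Return the start offset of every sequence, plus the total length."""
    # Same arithmetic as flow.arange(0, (batch_size + 1) * seqlen, step=seqlen):
    # batch_size + 1 evenly spaced offsets, starting at 0.
    return list(range(0, (batch_size + 1) * seqlen, seqlen))


offsets = cu_seqlens(batch_size=3, seqlen=4)
print(offsets)  # [0, 4, 8, 12]
```

Sequence `i` then occupies positions `offsets[i]` up to, but not including, `offsets[i + 1]` in the packed layout.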
### op graph init is the main cost of the pass See: https://github.com/Oneflow-Inc/libai/issues/407#issuecomment-1286776427 The cost of each of its internal parts is as follows ``` Maybe OpGraph::Init(const Job& job) { auto cost_ct = std::make_unique(true, true); InitNodes(job); cost_ct->Count("OpGraph0", 1); op_name2op_node_.reserve(job.net().op_size()); ForEachNode([&](OpNode* node) { CHECK(op_name2op_node_.emplace(node->op().op_name(), node).second) Count("OpGraph1",...
## Summary Data wrapped in flow.Tensor shows numerical error ## Code to reproduce bug ```python import oneflow as flow flow.manual_seed(987342) for i in range(5): n, in_c, out_c = flow.randint(1, 500, (3,)).tolist() ops = n *...
TODO: - [x] Implement helper functions - [x] The most naive case: align and implement native_multi_head_attention at the functor layer - [x] Implement mha at the Python functional layer - [x] Implement mha at the nn.Module layer - [x]...
Let auto parallel choose the fastest strategy that fits within the memory limit.
## Summary - It raises: ``` NotImplementedError: nn.Graph.build()'s input/output item only support types: Tensor/None. ``` - This is not helpful. It would be much more useful if the unsupported type or index...
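To illustrate the kind of message the excerpt above asks for, here is a minimal sketch of a check that reports both the index and the type of the offending item. It is not OneFlow's implementation: `Tensor` is a stand-in class so the sketch runs without OneFlow, and `check_build_items` is a hypothetical helper name.

```python
class Tensor:
    """Stand-in for flow.Tensor so the sketch runs without OneFlow."""


def check_build_items(items, kind="input"):
    """Raise a descriptive error for the first item that is not Tensor/None."""
    for index, item in enumerate(items):
        if item is not None and not isinstance(item, Tensor):
            raise NotImplementedError(
                f"nn.Graph.build()'s {kind} item at index {index} has "
                f"unsupported type {type(item).__name__}; "
                "only Tensor/None are supported."
            )


try:
    check_build_items([Tensor(), None, 3.14])
except NotImplementedError as err:
    print(err)
```

With this shape of message, the caller can see which argument to fix without stepping through `build()` in a debugger.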
## Summary - After some modifications to the model implementation, the denoising UNet of Stable Diffusion can be built as an `nn.Graph`. - It has around 76000 ops. - Just to build...