oneflow issues

Memory leak when slicing the result of flow.cat

7

This is a very strange problem that troubled me for two days. It appeared because the dataset preparation this time differs from before: since NeRF rays have to be concatenated, the dataset must be assembled with flow.cat. First, the oneflow code:

```python
import oneflow as flow
import oneflow.nn as nn

A = []
for i in range(160000):
    A.append(flow.ones(100, 11) * 2)
    print(i)
A = flow.cat(A, 0)
print(A.shape)
A = flow.split(A, 1, 0)
"""
oneflow.Size([16000000, 11])
"""
```
...

shaoshitong

bug

community

With the cu112 package installed, importing torch first, then oneflow, and then using CUDA raises an error

4

```
import torch, oneflow
x = oneflow.tensor(1).cuda()
```

Error message:

```
F20220819 12:38:12.392885 860212 cuda_stream.cpp:103] Check failed: cublasSetMathMode(cublas_handle_, CUBLAS_TF32_TENSOR_OP_MATH) : CUBLAS_STATUS_INVALID_VALUE (7)
*** Check failure stack trace: ***
@ 0x7f27f35d2dfa google::LogMessage::Fail()...
```

shangguanshiyuan

【NOT MERGE】Code example for user op implementation

EsdeathYZH

enhancement

eager

op

Can oneflow eager support amp eager mode?

2

Hi, I want to train a neural network with oneflow eager amp mode, but I found oneflow can't support it. When will eager amp mode be supported? I don't want...

tiancaidagongrenwangdoudou

feature

TODO

good for pr

community

Large precision discrepancy between CUDA and CPU; tan_grad is not registered

2

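## Summary

The goal is a high-precision simulation task: using a neural network to fit the solution of a complex equation. The code work includes: 1. rewriting the training code in oneflow; 2. implementing the LBFGS optimization algorithm in oneflow, matching the lbfgs optimizer in torch.optim. The training task is to train a Linear network of about 10 layers on the same data.

The current problem is that, with the same checkpoint, the CPU version of oneflow has been tuned to 99% accuracy, while the CUDA version goes as low as about 50% (with large fluctuation). Values computed on CUDA stop matching the CPU values after the fourth decimal place. In the LBFGS optimizer this difference is amplified by taking different if-else branches, and it snowballs over multiple rounds of training.

I ran the same test with the official getting-started MNIST code: with the same checkpoint, after 5 rounds the precision difference is around the fifth decimal place, and the error already appears after the first round of grad. The official code uses a Linear network; after switching to a convolutional network there is a similar problem, but the error is more pronounced with linear.

In addition, torch has a similar problem: on MNIST, the same operators show only a slight precision loss between CUDA and CPU. In the simulation task, accuracy is 99% on CPU and about 70% on CUDA. Different torch and CUDA versions show different degrees of precision and accuracy loss.

## Code to reproduce bug

Please post a minimal example to repro the bug. GitHub Gist or repo is highly recommended.

##...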
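The mechanism described in the summary (a tiny numeric difference that flips an if-else branch) can be illustrated without any framework. The following is a minimal sketch in plain Python, not the reporter's code and not a diagnosis of oneflow: it only shows that floating-point addition is not associative, so two backends that accumulate in a different order can disagree in the last digits. The function names and the `threshold` value are hypothetical.

```python
# Illustration only: the same three numbers summed in two different orders.
def sum_left_to_right(values):
    # Sequential accumulation, as a simple loop would do it.
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    # Tree-style accumulation, similar in spirit to a parallel reduction.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def choose_branch(x, threshold):
    # Hypothetical stand-in for a condition inside an optimizer.
    return "accept" if x <= threshold else "backtrack"


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)  # (0.1 + 0.2) + 0.3
pairwise = sum_pairwise(values)         # 0.1 + (0.2 + 0.3)

threshold = 0.6  # hypothetical value chosen to sit between the two results
print(sequential, choose_branch(sequential, threshold))
print(pairwise, choose_branch(pairwise, threshold))
```

The two sums differ only in the last bit, yet they select different branches. Once that happens in an iterative optimizer, the later iterations no longer follow the same path, which is consistent with the snowballing described above.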

GG-yuki

bug

community

NeuS network trains slowly

10

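## Summary

While implementing the NeuS 3D reconstruction network, the overall training time turned out to be too long. I used time.time() to measure the forward and backward time of each iteration and compared it with pytorch: the average per-step forward time () is much larger than pytorch's (), while the backward time differs little. Running only the forward pass in oneflow shows that roughly every 10 steps there is one especially slow step, about 1 s; apart from that, the per-step time is about the same as pytorch.

These are the per-iteration times measured with time.time() in the Python script when running the network with oneflow. Most iterations are normal, around 30 ms, but occasionally one takes about 1 s.

![image](https://user-images.githubusercontent.com/96570234/184785723-61e74904-f534-4c5d-aef8-f692ed8b26ca.png)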
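The per-iteration measurement described above can be sketched as follows. This is a minimal, framework-free sketch and not the reporter's script: `step_fn` and `sync_fn` are hypothetical callables, and the `factor` used to flag slow steps is an arbitrary assumption. Because CUDA work is launched asynchronously, a timing loop on GPU normally needs a device synchronization before each clock reading; `sync_fn` is the hook where such a call would go.

```python
import statistics
import time


def time_steps(step_fn, num_steps, sync_fn=None):
    """Return the wall-clock duration of each call to step_fn, in seconds."""
    durations = []
    for _ in range(num_steps):
        if sync_fn is not None:
            sync_fn()  # finish pending device work before starting the clock
        start = time.perf_counter()
        step_fn()
        if sync_fn is not None:
            sync_fn()  # wait for this step's device work to complete
        durations.append(time.perf_counter() - start)
    return durations


def find_slow_steps(durations, factor=5.0):
    """Return the indices of steps slower than factor times the median."""
    median = statistics.median(durations)
    return [i for i, d in enumerate(durations) if d > factor * median]


# Timing a trivial placeholder step; a real run would pass the forward pass.
measured = time_steps(lambda: sum(range(1000)), num_steps=20)

# Synthetic durations shaped like the report: about 30 ms, with one 1 s step.
sample = [0.03] * 9 + [1.0] + [0.03] * 9
print(find_slow_steps(sample))
```

Comparing each step against the median rather than the mean keeps the occasional 1 s step from masking itself, and the returned indices show whether the slow steps recur at a regular interval.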

yoonlee888

bug

community

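Does oneflow tensor have an API that exposes the underlying storage address?

I see that pytorch has a tensor.data_ptr interface that gives direct access to the underlying storage. Does oneflow have similar support?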

jeremyyx

Support broadcast ops

5

- [x] broadcast_shapes
- [x] broadcast_tensors
- [x] broadcast_to
- [x] Tensor.broadcast_to
- [x] Add documentation
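For readers unfamiliar with what these ops compute, the shape rule behind them can be sketched in plain Python. This is an illustration of the NumPy-style broadcasting rule, assuming the ops in the checklist follow it; it is not the oneflow implementation, and `broadcast_shapes_sketch` is a hypothetical name.

```python
def broadcast_shapes_sketch(*shapes):
    """Compute the broadcast result shape under NumPy-style rules."""
    ndim = max((len(s) for s in shapes), default=0)
    result = [1] * ndim
    for shape in shapes:
        # Align shapes on the trailing axes by padding with leading 1s.
        padded = (1,) * (ndim - len(shape)) + tuple(shape)
        for axis, dim in enumerate(padded):
            if dim == 1:
                continue  # a size-1 axis stretches to match any size
            if result[axis] == 1:
                result[axis] = dim
            elif result[axis] != dim:
                raise ValueError(
                    f"shapes {shapes} are not broadcastable at axis {axis}"
                )
    return tuple(result)


print(broadcast_shapes_sketch((3, 1), (1, 4)))
print(broadcast_shapes_sketch((5, 1, 4), (3, 1)))
```

Two axes are compatible when they are equal or when one of them is 1, so `(2, 3)` and `(4, 3)` raise a `ValueError` in this sketch.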

mosout

enhancement

op

Speed up the training

Mentioned in https://github.com/Oneflow-Inc/OneTeam/issues/1735, some operators might need to be run as late as possible since they have a large activation time in cpu. In this feature, we move those operators...

Yipeng1994

fix batchnorm infer dtype failed in half inference

4

#close https://github.com/Oneflow-Inc/oneflow/issues/9381

BBuf

enhancement

automerge

bug

eager

api
