libai
Support the DETR project in libai
TODO LIST:
- [x] COCO dataset preprocessing
- [x] modeling
- [x] trainer
- [x] torch weight loading test (aligned)
- [x] eager global tensor parallel evaluation results aligned
- [x] switch to libai's transformer implementation; the current version borrows heavily from torch.nn.MultiheadAttention
- [ ] get training running
Record of oneflow bugs and unsupported ops:
- oneflow min/max ops cannot run across different data types
- flow.cumsum ~~tensor.cumsum~~
- ~~nn.MultiHeadAttention~~
- ~~flow.cdist~~
- flow.as_tensor cannot take an explicit data type when converting from a numpy array
- ~~flow.full_like~~
- for m in tensor: m[0]=False does not change the tensor's values
- tensor.copy_() has no effect
- F.interpolate behaves inconsistently with torch
- tensor.split has a bug when split_size_or_sections=[x,0]
- ~~flow.ByteStorage~~
- tensor.unbind on a global tensor raises NotImplementedError
from ~~flowvision~~torchvision.models._utils import IntermediateLayerGetter is not supported
Is this really unsupported in flowvision? If so, I'll go update flowvision and cut a tagged release.
Yep, it's not supported~
Haha OK~ I was just about to work around it for now.
Looks like it is actually supported here: https://github.com/Oneflow-Inc/vision/blob/main/flowvision/models/layer_getter.py. It's probably just that the file name doesn't match, lol.
Got it~
The oneflow min/max ops cannot run across different data types
>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'
Minimal reproducer, using float64 vs. float32 as an example; other dtype combinations behave the same way:
torch
>>> import torch
>>> x = torch.randn(5, dtype=torch.float32)
>>> y = torch.randn(5, dtype=torch.float64)
>>> torch.max(x,y)
tensor([ 1.1421, 1.2252, 0.3676, 1.0047, -0.0242], dtype=torch.float64)
>>> torch.min(x,y)
tensor([-0.4623, -0.1920, -0.8689, -0.4471, -0.2798], dtype=torch.float64)
oneflow
>>> x = flow.randn(5, dtype=flow.float32)
>>> y = flow.randn(5, dtype=flow.float64)
>>> flow.max(x,y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception:
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
Dispatch<TensorTuple>(op_expr, inputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
Dispatch(op_expr, inputs, outputs.get(), ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
internal_->Apply(op_expr, inputs, outputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
dtype_infer_fn_(&infer_ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)
>>> flow.min(x,y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception:
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
Dispatch<TensorTuple>(op_expr, inputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
Dispatch(op_expr, inputs, outputs.get(), ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
internal_->Apply(op_expr, inputs, outputs, ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
dtype_infer_fn_(&infer_ctx)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)
Synced to https://github.com/Oneflow-Inc/OneTeam/issues/1207
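In the meantime, a hedged workaround sketch: promote the operands to a common dtype by hand, mimicking the implicit type promotion torch performs.

```python
import oneflow as flow

x = flow.randn(5, dtype=flow.float32)
y = flow.randn(5, dtype=flow.float64)
# cast the narrower operand up so both dtypes match before calling max/min
out_max = flow.max(x.to(flow.float64), y)
out_min = flow.min(x.to(flow.float64), y)
```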
flow.cumsum is supported, but tensor.cumsum is not
>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'
>>> x = flow.randn(10,10,10)
>>> y = flow.cumsum(x,1)
>>> y = x.cumsum(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'oneflow._oneflow_internal.Tensor' object has no attribute 'cumsum'
>>> x = torch.randn(10,10,10)
>>> y = torch.cumsum(x,1)
>>> y = x.cumsum(1)
The dtype argument cannot be specified either:
>>> x = flow.randn(5,5)
>>> flow.cumsum(x,dim=0)
tensor([[ 0.0508, 1.0346, -0.7175, -0.2991, 0.7678],
[ 0.4012, 2.2157, -1.1069, 0.7856, 2.3732],
[-0.6691, 1.7376, -0.2673, 0.8270, 2.3241],
[ 0.6488, 2.2601, -1.5217, 1.0009, 2.4177],
[ 1.0917, 1.9483, -1.0218, -0.4837, 3.5062]], dtype=oneflow.float32)
>>> flow.cumsum(x,dim=0,dtype=flow.float32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception:
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/api/python/functional/py_function.cpp", line 40, in ReportKwargsError
TypeError: cumsum(): got multiple values for argument 'dim'
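As a stopgap, a small sketch of the obvious workarounds: use the free function in place of the missing tensor method, and cast after the fact instead of passing dtype=.

```python
import oneflow as flow

x = flow.randn(10, 10, 10)
y = flow.cumsum(x, 1)                     # instead of x.cumsum(1)
y64 = flow.cumsum(x, 1).to(flow.float64)  # instead of dtype=flow.float64
```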
flow.as_tensor cannot take an explicit data type when converting from a numpy array
Synced to: https://github.com/Oneflow-Inc/OneTeam/issues/1207#issuecomment-1073432125
>>> flow.__version__
'0.8.0.dev20220417+cu112'
>>> torch.__version__
'1.11.0+cu113'
Minimal reproducer:
flow
>>> x=np.random.randn(10)
>>> flow.as_tensor(x)
tensor([-0.3546, -0.6711, -1.3503, 0.7537, 0.4851, 0.4599, 1.4330, 0.2376, 0.3307, -0.1530], dtype=oneflow.float64)
>>> flow.as_tensor(x, dtype=flow.int64)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chiziqiu/.conda/envs/libai/lib/python3.7/site-packages/oneflow/nn/modules/as_tensor.py", line 51, in as_tensor
raise TypeError("numpy-ndarray holds elements of unsupported datatype")
TypeError: numpy-ndarray holds elements of unsupported datatype
>>> flow.as_tensor(x, dtype=flow.float64)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chiziqiu/.conda/envs/libai/lib/python3.7/site-packages/oneflow/nn/modules/as_tensor.py", line 51, in as_tensor
raise TypeError("numpy-ndarray holds elements of unsupported datatype")
TypeError: numpy-ndarray holds elements of unsupported datatype
torch
>>> torch.as_tensor(x)
tensor([-0.3546, -0.6711, -1.3503, 0.7537, 0.4851, 0.4599, 1.4330, 0.2376,
0.3307, -0.1530], dtype=torch.float64)
>>> torch.as_tensor(x, dtype=torch.int64)
tensor([ 0, 0, -1, 0, 0, 0, 1, 0, 0, 0])
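A hedged workaround sketch: convert first, then cast. Note the extra cast copies, so as_tensor's zero-copy behavior is lost.

```python
import numpy as np
import oneflow as flow

x = np.random.randn(10)
t = flow.as_tensor(x).to(flow.int64)  # convert, then cast explicitly
```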
for m in tensor: m[0]=False does not change the tensor's values
>>> flow.__version__
'0.8.0.dev20220417+cu112'
>>> torch.__version__
'1.11.0+cu113'
Minimal reproducer:
oneflow:
>>> mask = flow.ones(10,10)
>>> for m in mask:
... m[0]=False
...
>>> m
tensor([0., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=oneflow.float32)
>>> mask
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=oneflow.float32)
torch:
mask = torch.ones(10,10)
>>> for m in mask:
... m[0]=False
...
>>> m
tensor([0., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> mask
tensor([[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
Synced to https://github.com/Oneflow-Inc/OneTeam/issues/1207#issuecomment-1073432125
> for m in tensor: m[0]=False does not change the tensor's values
This is probably because our tensor[i] returns a new tensor rather than a view; I think Luyang is pushing on this issue? @Flowingsun007
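Until tensor[i] returns a view, a hedged workaround sketch: write through an index on the base tensor instead of mutating the per-row tensors produced by iteration.

```python
import oneflow as flow

mask = flow.ones(10, 10)
mask[:, 0] = False  # writes into mask itself, unlike mutating the loop variable
```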
tensor.copy_() has no effect
>>> flow.__version__
'0.8.0.dev20220417+cu112'
>>> torch.__version__
'1.11.0+cu113'
Minimal reproducer:
flow:
>>> x = flow.ones(5,5)
>>> y = flow.zeros(3,3)
>>> x[:3,:3].copy_(y)
>>> x
tensor([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]], dtype=oneflow.float32)
torch:
>>> x = torch.ones(5,5)
>>> y = torch.zeros(3,3)
>>> x[:3,:3].copy_(y)
tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
>>> x
tensor([[0., 0., 0., 1., 1.],
[0., 0., 0., 1., 1.],
[0., 0., 0., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
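A hedged workaround sketch along the same lines (untested against this exact build): assign through the slice on the base tensor rather than calling copy_() on the sliced result.

```python
import oneflow as flow

x = flow.ones(5, 5)
y = flow.zeros(3, 3)
x[:3, :3] = y  # slice assignment instead of x[:3, :3].copy_(y)
```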
oneflow.nn.functional.interpolate behaves inconsistently with torch.
The output sizes differ. Some inputs are fine, but some are not; I haven't found the pattern yet, since I don't yet fully understand the interpolate implementation.
>>> flow.__version__
'0.8.0.dev20220417+cu112'
>>> torch.__version__
'1.11.0+cu113'
Minimal reproducer:
flow
>>> x = flow.randn(1,2,1204,937)
>>> s = (38,30)
>>> F.interpolate(x,size=s).shape
oneflow.Size([1, 2, 38, 29])
torch:
>>> x = torch.randn(1, 2, 1204, 937)
>>> s = (38,30)
>>> F.interpolate(x, size=s).shape
torch.Size([1, 2, 38, 30])
Synced to: https://github.com/Oneflow-Inc/OneTeam/issues/1207
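A hedged guess at the mechanism (an assumption, not verified against the OneFlow source): if size is converted internally to a scale factor and the output size is then recovered with floor(), floating-point error can lose one pixel for some size pairs.

```python
import math

# sketch of the suspected size -> scale_factor -> size round trip
in_w, out_w = 937, 30
scale = out_w / in_w                  # scale factor recovered from the size
recovered = math.floor(in_w * scale)  # may come out as out_w - 1, not out_w
print(out_w, recovered)
```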
> tensor.copy_() has no effect
The cause should be similar to the first issue.
> oneflow.nn.functional.interpolate behaves inconsistently with torch
Let's log this one too; I'll go fix it. https://github.com/Oneflow-Inc/OneTeam/issues/1207#issuecomment-1073432125
Logged.
> flow.as_tensor cannot take an explicit data type when converting from a numpy array
Fixed by this PR: https://github.com/Oneflow-Inc/oneflow/pull/8097
Working through this network has surfaced quite a few incompatibilities with PyTorch.
Yes, but most of them are already being worked on.
tensor.split has a bug when split_size_or_sections=[x,0].
Summary: the bug occurs when the 0 is the last section; a 0 in any other position works fine.
Versions:
>>> torch.__version__
'1.11.0+cu113'
>>> flow.__version__
'0.8.0.dev20220511+cu112'
Minimal reproducer:
>>> x = torch.randn(2,100,7)
>>> x.split([7,0],-1)
(tensor([[[ 0.4736, -0.0404, -1.5499, ..., 1.0757, 0.4028, 0.9903],
[ 1.8894, -0.4257, 0.2570, ..., -0.4669, -1.8332, -0.9168],
[-0.2074, 0.6727, -0.9165, ..., -1.3757, 1.0796, -1.4637],
...,
[ 1.4639, -0.3440, 0.4957, ..., -0.4425, 0.9832, -0.1773],
[ 0.5572, -0.7418, 0.5709, ..., -0.8357, 0.5164, -1.5137],
[-0.1484, 0.5784, 0.3132, ..., -1.7116, -2.4209, -0.6352]],
[[-0.0512, 0.8071, -0.1806, ..., -0.6507, -1.7163, 1.2081],
[-2.1803, -0.2958, 1.4241, ..., 0.7722, -0.2404, -2.6822],
[ 0.5165, -0.9405, -0.0473, ..., -1.7761, -2.6822, -0.2629],
...,
[ 0.1908, -0.8162, 1.2067, ..., 0.0719, 0.8505, -1.1541],
[ 1.5042, 0.3226, 1.4068, ..., 0.2107, -0.4780, 0.6526],
[-0.4096, 0.9706, 0.6222, ..., -1.5738, 0.3576, -0.3889]]]), tensor([], size=(2, 100, 0)))
>>> x = flow.randn(2,100,7)
>>> x.split([7,0],-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dataset/chiziqiu/anaconda3/lib/python3.9/site-packages/oneflow/framework/tensor.py", line 709, in _split
return flow._C.split(self, split_size_or_sections, dim)
RuntimeError:
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/functional/impl/array_functor.cpp", line 2376, in operator()
Narrow(x, axis, start_idx, length)
File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/functional/impl/array_functor.cpp", line 1251, in operator()
Check failed: (-dim_length <= start) && (start <= dim_length - 1) (Dimension out of range, expected to be in range of [-3, 2], but got:7)
But [0,7] works fine:
>>> x = flow.randn(2,100,7)
>>> y=x.split([0,7],-1)
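Until the fix lands, a small workaround sketch grounded in the observation above: since [0,7] works, split with the zero-length section first and reorder the results to match the [7,0] order.

```python
import oneflow as flow

x = flow.randn(2, 100, 7)
empty, full = x.split([0, 7], -1)  # [0, x] works in this build, per above
result = (full, empty)             # same order split([7, 0], -1) would give
```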
> Summary: the bug occurs when the 0 is the last section; a 0 in any other position works fine.
> Narrow(x, axis, start_idx, length)

You can try a nightly build; this issue should be gone there.
tensor.unbind does not support global tensors
>>> flow.__version__
'0.8.0.dev20220511+cu112'
>>> x = flow.randn(100,4).to_global(sbp=flow.sbp.broadcast, placement=flow.placement("cuda", ranks=[0]))
>>> x.unbind(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dataset/chiziqiu/anaconda3/lib/python3.9/site-packages/oneflow/framework/tensor.py", line 713, in _unbind
return flow._C.unbind(self, dim)
NotImplementedError
>>> x = x.to_local()
>>> x.unbind(-1)
(tensor([ 0.6732, -0.3023, 1.4604, 1.3799, -1.1469, -1.4389, 1.6037, 0.9446, -0.5044, 0.3336, 0.6152, 0.3299, 1.2392, 0.5234, 0.0195, -1.8286, -1.0080, 1.9139, -0.7478, 0.1140, 0.6781, -0.0913, -1.3242, -0.3646, -0.2825, -1.4854, -0.9145, 1.0963, 0.7683, -1.1118, 0.4805, -0.1116, -0.4031,
-1.1590, -1.8794, -0.2065, -0.3353, -0.2933, 0.0486, 0.7751, 1.0546, -0.9393, -1.3231, -0.9217, 1.1032, 0.3692, 0.2003, 0.5020, -0.1534, 0.0768, -0.4615, -1.1294, 0.5750, 0.5136, 0.9569, -0.4370, -1.2251, 0.8619, 0.7274, 1.2086, 1.2627, -1.4440, -0.0259, 1.0530, 0.3159, 1.6366,
-0.7254, 1.9394, 1.3874, -1.4916, 0.9147, 1.7787, 0.4056, -0.4525, 1.2214, -1.3924, 0.9251, -0.7092, 0.2885, -1.3293, -1.6186, 0.6288, 1.2619, 1.8240, -1.0310, 0.9776, -0.7870, -0.7614, 0.5374, -0.9490, -0.9730, 0.8723, 2.1137, 0.7878, 1.2088, 2.2680, 2.1485, -0.1677, 0.6356,
-0.8306], device='cuda:0', dtype=oneflow.float32), tensor([ 0.1140, 0.9664, -0.4128, -0.1795, -0.0924, -0.3642, 0.9534, -1.3775, -0.3605, 1.7204, 0.7058, 0.2884, -1.4371, 0.5512, 0.7113, 2.4053, -1.0669, 0.6128, -0.0496, -0.0412, 0.1771, -0.4215, -1.6474, -1.2480, 1.2520, 1.7337, -0.0214, 1.0931, -0.4014, -0.1295, 1.5144, 0.0573, -0.3434,
0.5316, 1.2685, 0.0731, -0.6885, -0.4406, 1.1376, 1.1505, 2.0006, 0.8170, -0.6909, -1.9807, -0.4704, 1.4258, -1.0187, -0.1252, -0.8503, -1.2286, -0.1469, 0.6578, -1.6982, 0.9313, -0.0991, 0.7988, 1.0618, 0.3656, -0.5173, 0.1062, -0.3526, 0.1705, 0.6896, 0.1062, 1.6790, 0.8675,
-0.9219, 1.0535, 2.5108, 0.4058, 1.5565, -0.1119, -0.5495, 0.8565, 1.7205, 0.7336, 1.0147, 0.7349, -0.4325, -0.2354, 0.3967, -1.1067, -0.0503, 0.5430, -0.5466, 1.5222, -0.1019, -0.8322, 0.1298, -0.5529, 0.0965, 0.3214, -0.0042, 1.3415, -1.0824, 0.8408, -1.6040, -0.1292, 0.4468,
1.2888], device='cuda:0', dtype=oneflow.float32), tensor([ 8.2494e-01, -2.4225e+00, 4.4831e-01, 1.2205e+00, -5.5395e-01, -1.9908e+00, -3.6989e-01, 3.1025e-01, -3.3615e-01, 1.2510e+00, -7.6869e-01, 2.2640e-01, 8.9426e-01, -1.0443e-02, 8.0189e-01, -7.1788e-01, -2.0481e-01, -7.3685e-01, -1.0895e+00, 8.1574e-01, -6.5085e-01, -1.5621e+00, 2.4240e+00,
9.5201e-01, 6.5284e-02, -2.7226e-01, 5.1478e-01, 9.1148e-01, -8.0543e-01, 3.2088e-01, -5.8496e-01, -7.3560e-01, -7.8625e-01, 3.5526e+00, -9.0569e-02, -6.4349e-01, -1.9497e+00, 9.3549e-02, 7.7196e-01, 1.2225e+00, 9.9349e-01, -1.4940e-01, -4.1041e-01, 3.3358e-01, 4.9947e-01, -8.1111e-01,
-5.5953e-01, -5.3114e-01, 1.2270e-01, 1.1031e+00, -4.3998e-01, -6.5134e-01, 5.9907e-01, -1.5741e+00, 1.1121e+00, 1.7249e+00, -8.3078e-01, 1.8889e+00, 3.7167e-01, 1.3959e+00, 4.0451e-01, -8.9412e-01, -7.2207e-01, -3.1799e-01, 2.9070e-01, -1.2411e+00, 5.2207e-01, 2.1749e+00, 1.6886e+00,
3.1185e-01, 7.1245e-04, -1.3975e+00, -5.5818e-02, 2.4448e+00, -7.0328e-01, 8.5506e-01, 4.1534e-01, -1.0993e+00, -1.2930e-01, 1.6159e+00, 5.8933e-01, 2.4079e-01, 2.4609e+00, 4.8775e-01, 3.1148e-01, 5.5383e-01, -4.3484e-01, 1.1865e+00, 5.4809e-01, 1.8185e+00, -1.0388e+00, -5.0242e-01,
-1.1045e+00, 9.4867e-01, -1.3901e+00, -7.4365e-01, 3.7658e-01, 7.6355e-01, -6.8516e-02, 1.1892e+00], device='cuda:0', dtype=oneflow.float32), tensor([ 1.1310, -0.9968, -0.8175, -1.0691, 1.1561, -0.6521, -0.3950, -1.1697, -0.3019, -0.7170, -1.5917, -0.6279, -0.7104, 0.6003, -0.4562, -0.7400, -0.5367, 0.9139, 0.0510, 0.6054, 0.6953, -1.1960, 1.8443, 0.0790, -1.7794, -0.2629, 0.0626, -0.2257, -0.2238, -1.8894, 1.3829, 2.2447, -0.3194,
0.1188, -1.1480, 0.1640, -2.7212, -0.0848, -0.1022, 2.7401, 0.3600, -0.6510, 1.4652, 1.4443, -1.0385, -0.8625, -0.3573, -1.0436, -0.5471, -0.3780, 0.2603, -2.6162, 0.0034, -2.3554, 0.9569, -1.1303, -2.0769, -1.4830, 0.2238, -0.3018, 0.6321, 1.0973, -0.7001, 0.0135, -1.1057, -0.1395,
-0.1630, -1.0537, 0.6513, 0.6935, 1.3550, -0.5250, -1.4301, -1.2223, 0.2209, 0.1352, -0.8554, -0.3600, 2.4356, -0.2436, 0.6964, -0.7971, -0.3240, 1.0740, -0.1335, -0.1686, -0.2754, 1.5222, 0.6987, 0.2988, -0.4435, -0.5215, -0.2787, -1.3216, -1.4181, -1.1776, 0.4957, 1.4997, -0.0745,
-0.0787], device='cuda:0', dtype=oneflow.float32))
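A hedged workaround sketch: drop to a local tensor, unbind there, and move each piece back to global with the original sbp/placement (broadcast on a single rank, as in the reproducer above).

```python
import oneflow as flow

placement = flow.placement("cuda", ranks=[0])
x = flow.randn(100, 4).to_global(sbp=flow.sbp.broadcast, placement=placement)
# unbind on the local tensor, then restore the global view piece by piece
pieces = [
    p.to_global(sbp=flow.sbp.broadcast, placement=placement)
    for p in x.to_local().unbind(-1)
]
```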
def get_default_optimizer_params(
model,
base_lr=None,
weight_decay=None,
weight_decay_norm=None,
weight_decay_bias=None,
clip_grad_max_norm=None,
clip_grad_norm_type=None,
overrides=None,
):
"""
Get default param list for optimizer, with support for a few types of overrides.
If no overrides are needed, it is equivalent to `model.parameters()`.
Arguments:
base_lr: lr for every group by default. Can be omitted to use the one in optimizer.
weight_decay: weight decay for every group by default. Can be omitted to use the one
in optimizer.
weight_decay_norm: override weight decay for params in normalization layers
weight_decay_bias: override weight decay for bias parameters
overrides: if not `None`, provides values for optimizer hyperparameters
(LR, weight decay) for module parameters with a given name; e.g.
``{"embedding": {"lr": 0.01, "weight_decay": 0.1}}`` will set the LR and
weight decay values for all module parameters named `embedding`.
For common transformer models, ``weight_decay_norm`` and ``weight_decay_bias``
are usually set to 0.
Example:
::
flow.optim.AdamW(
get_default_optimizer_params(model, weight_decay_norm=0, weight_decay_bias=0),
lr=0.01,
weight_decay=1e-4
)
"""
if overrides is None:
overrides = {}
defaults = {}
if base_lr is not None:
defaults["lr"] = base_lr
if weight_decay is not None:
defaults["weight_decay"] = weight_decay
if clip_grad_max_norm is not None and clip_grad_norm_type is not None:
defaults["clip_grad_max_norm"] = clip_grad_max_norm
defaults["clip_grad_norm_type"] = clip_grad_norm_type
bias_overrides = {}
if weight_decay_bias is not None:
bias_overrides["weight_decay"] = weight_decay_bias
if len(bias_overrides):
if "bias" in overrides:
raise ValueError("Conflicting overrides for 'bias'")
overrides["bias"] = bias_overrides
norm_module_types = (
LayerNorm,
flow.nn.BatchNorm1d,
flow.nn.BatchNorm2d,
flow.nn.BatchNorm3d,
flow.nn.GroupNorm,
flow.nn.InstanceNorm1d,
flow.nn.InstanceNorm2d,
flow.nn.InstanceNorm3d,
flow.nn.FusedBatchNorm1d,
flow.nn.FusedBatchNorm2d,
flow.nn.FusedBatchNorm3d,
)
params = []
memo = set()
for module in model.modules():
for model_param_name, value in module.named_parameters(recurse=False):
if not value.requires_grad:
continue
# Avoid duplicating parameters
if value in memo:
continue
memo.add(value)
hyperparams = copy.copy(defaults)
if isinstance(module, norm_module_types) and weight_decay_norm is not None:
hyperparams["weight_decay"] = weight_decay_norm
hyperparams.update(overrides.get(model_param_name, {}))
params.append({"params": [value], **hyperparams})
return reduce_param_groups(params)
In this libai optimizer code, why does module.named_parameters(recurse=False) use recurse=False? With recurse=True, model_param_name carries fuller information, e.g. you could match parameters with "transformer" in model_param_name.
Is there a particular reason for setting it to False? @CPFLAME
for module in model.modules() is already a recursive traversal, so the parameters inside each module are not recursed again; if you want the full parameter name, you can recover it from the module. @HiHippie
Got it~
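For reference, a minimal sketch of the pattern described above (`model` stands for any flow.nn.Module): iterate named_modules() so the module path is available, and combine it with the non-recursive parameter name to get the full dotted path.

```python
# sketch: recover full parameter names while keeping recurse=False
for module_name, module in model.named_modules():
    for param_name, value in module.named_parameters(recurse=False):
        full_name = f"{module_name}.{param_name}" if module_name else param_name
        if "transformer" in full_name:
            ...  # match parameters by their full dotted path
```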
Ziqiu, please keep track of whether the issues you reported in DETR have been fixed.
OK, Mr. Yuan.