Ross Wightman comments

Results: 497 comments by Ross Wightman

Feature Request : Segmentation model

@bluesky314 I would like to try this, but I need to get object detection training running first. I'm a bit busy for a while, so not sure when I'll get to it...

Feature Request : Segmentation model

@bluesky314 yeah, it should be fairly straightforward, but I'm still making big improvements in the core model/post processing. One concern I have with the segmentation with the TensorFlow SAME equivalent...

Initial Mosaic augmentation implementation

@dmatos2012 thanks for the impl, unfortunately I can't merge. YOLOv5 is GPL-3 licensed, so I can't include any code from that project here as it would be in conflict with...

Export PyTorch to ONNX

This isn't a bug, it's just functionality not implemented since it's non-trivial. See #89 and #32 ... I'll leave this one open so another issue isn't created. I have no...

[BUG] OSError: [Errno 38] Function not implemented

@irinushirka Colab isn't a normal filesystem, it's a FUSE filesystem on top of cloud storage and doesn't support hardlinks, which the saver relies on for robust checkpoint saving (crash recovery)....
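
As an illustration of the failure mode described above, here is a minimal sketch of a link-with-fallback helper. It is not the library's saver: `link_or_copy` and the file names are hypothetical, and it only assumes that an unsupported hardlink surfaces as an `OSError` from `os.link`.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to a copy where links are unsupported.

    Hypothetical helper for illustration, not part of any library.
    """
    if os.path.exists(dst):
        os.unlink(dst)
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        # ENOSYS is errno 38 on Linux ("Function not implemented").
        if exc.errno not in (errno.ENOSYS, errno.EPERM, errno.EXDEV):
            raise
        shutil.copy2(src, dst)
        return "copy"


# Usage: point a "last" checkpoint name at the newest checkpoint file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "checkpoint-3.pth.tar")  # hypothetical file name
    dst = os.path.join(tmp, "last.pth.tar")          # hypothetical file name
    with open(src, "wb") as f:
        f.write(b"weights")
    mode = link_or_copy(src, dst)
    with open(dst, "rb") as f:
        restored = f.read()
    print(mode, restored)
```

The copy fallback gives up the cheap, atomic behaviour of a hardlink, so it trades some crash robustness for working on filesystems like the one described.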

LayerNorm2d != GroupNorm w/ groups=1

To be more specific, GroupNorm w/ groups=1 normalizes over C, H, W. LayerNorm as used in transformers normalizes over the channel dimension only. Since PyTorch LN doesn't natively support 2d...

LayerNorm2d != GroupNorm w/ groups=1

@sacmehta the equivalence for GN and LN as per the paper is for NCHW tensors when LN is performed over all of C, H, W (minus the affine part as...
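
The distinction can be checked numerically. The sketch below uses plain Python lists for a single sample of shape C x H x W (rather than PyTorch tensors, so it runs without dependencies) and leaves out the affine part. `norm_over_chw` is the computation that GroupNorm w/ groups=1 and an LN over all of C, H, W have in common; `norm_over_c` is the channels-only variant. The function names and the toy input are illustrative assumptions.

```python
from statistics import fmean, pvariance

EPS = 1e-5


def _normalize(values):
    """Zero-mean, unit-variance scaling using the biased (population) variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


def norm_over_chw(x):
    """One set of statistics per sample, computed over all of C, H, W."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    flat = _normalize([x[c][h][w] for c in range(C) for h in range(H) for w in range(W)])
    it = iter(flat)
    return [[[next(it) for _ in range(W)] for _ in range(H)] for _ in range(C)]


def norm_over_c(x):
    """Separate statistics at every spatial position, computed over channels only."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for h in range(H):
        for w in range(W):
            column = _normalize([x[c][h][w] for c in range(C)])
            for c in range(C):
                out[c][h][w] = column[c]
    return out


# Toy sample with C=2, H=1, W=2.
x = [[[1.0, 2.0]], [[3.0, 8.0]]]
full = norm_over_chw(x)
chan = norm_over_c(x)
max_diff = max(
    abs(full[c][h][w] - chan[c][h][w])
    for c in range(2) for h in range(1) for w in range(2)
)
print(full)
print(chan)
print(max_diff)
```

On this input the channels-only version maps every position to roughly -1 and +1, while the C, H, W version keeps the relative magnitudes across positions, so the two outputs clearly differ.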

LayerNorm2d != GroupNorm w/ groups=1

@sacmehta thanks for the update, looks like the channels-only LN is definitely not stable in this architecture.

multiple seeding / shuffle problems

FYI, item 2 essentially means that all training ends up as ResampledShards() as the distributed workers all get seeded differently (I confirmed this with a test).

Process hang with DDP training

@poor1017 you don't have enough shard files to distribute amongst the dataloader workers across all train processes. If a train process ends up with no shards, it will hang training...
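
A quick way to see the shortfall is to count which consumers end up empty. The sketch below assumes a simple strided split, first by process rank and then by dataloader worker; that scheme, the helper names, and the shard file names are assumptions for illustration, not necessarily what the loader in question does.

```python
def shards_for(shards, rank, world_size, worker, num_workers):
    """Shards seen by one dataloader worker under an assumed strided split."""
    per_process = shards[rank::world_size]
    return per_process[worker::num_workers]


def starved_consumers(shards, world_size, num_workers):
    """List the (rank, worker) pairs that receive no shards at all."""
    return [
        (rank, worker)
        for rank in range(world_size)
        for worker in range(num_workers)
        if not shards_for(shards, rank, world_size, worker, num_workers)
    ]


# Hypothetical shard names: 6 shards for 2 processes x 4 workers.
shards = [f"train-{i:04d}.tar" for i in range(6)]
print(starved_consumers(shards, world_size=2, num_workers=4))  # [(0, 3), (1, 3)]

# With at least world_size * num_workers shards, nobody is left empty.
more_shards = [f"train-{i:04d}.tar" for i in range(8)]
print(starved_consumers(more_shards, world_size=2, num_workers=4))  # []
```

Under this assumed split, having at least `world_size * num_workers` shards is enough to keep every worker supplied; fewer shards, or fewer workers per process, changes who runs dry.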