Joon
In my opinion, the authors define L_{vlb} = L_0 + ... + L_T, not L_t, so they may scale the vlb loss by a factor of T (self.num_timestep).
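To illustrate why a factor of T would show up: if only one timestep t is sampled uniformly per example, multiplying that single term by the number of terms gives an unbiased estimate of the full sum. This is only a rough sketch; `vlb_estimate` and `per_step_losses` are hypothetical names, and I treat T as the number of terms in the sum, which may differ by one from the authors' indexing.

```python
import random


def vlb_estimate(per_step_losses, num_samples, rng):
    """Monte Carlo estimate of sum_t L_t from uniformly sampled timesteps."""
    T = len(per_step_losses)  # assumed: one loss term per timestep
    total = 0.0
    for _ in range(num_samples):
        t = rng.randrange(T)            # sample one timestep uniformly
        total += per_step_losses[t] * T  # rescale the single term by T
    return total / num_samples


# Hypothetical per-step losses, just for illustration.
losses = [1.0, 2.0, 3.0, 4.0]
estimate = vlb_estimate(losses, 20000, random.Random(0))
print(f"exact sum: {sum(losses)}, estimate: {estimate:.3f}")
```

Without the factor T, the same average would estimate the mean of the L_t terms rather than their sum.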
In my opinion, when you predict x_start at t \approx T with a cosine noise schedule, the bar alphas (cumprod of alphas) are very small compared with those of a linear noise schedule....
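A quick way to check this is to compute the cumulative product for both schedules and look at the last step. The values below are assumptions on my side, not taken from this thread: T = 1000, a linear schedule from 1e-4 to 0.02, and a cosine schedule with offset s = 0.008 and betas clipped at 0.999.

```python
import math


def linear_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_alpha_bars(T, s=0.008, max_beta=0.999):
    """Cumulative product of (1 - beta_t) for a clipped cosine schedule."""
    def f(u):
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    out, prod = [], 1.0
    for i in range(T):
        # beta_t follows from the ratio of consecutive cosine values,
        # clipped so that it never reaches 1.
        beta = min(1.0 - f((i + 1) / T) / f(i / T), max_beta)
        prod *= 1.0 - beta
        out.append(prod)
    return out


T = 1000
linear = linear_alpha_bars(T)
cosine = cosine_alpha_bars(T)
print(f"linear alpha_bar at last step: {linear[-1]:.3e}")
print(f"cosine alpha_bar at last step: {cosine[-1]:.3e}")
```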
When I removed the DistributedSampler from the dataloader, the problem became less severe, but it did not go away.
I followed this post (https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/). In sam3/train/data/coco_json_loaders.py, we can add "TorchSerializedList" and modify the load_coco_and_group_by_image function.

```python
class TorchSerializedList:
    """
    Alternative implementation using torch.Tensor for spawn/forkserver mode.
    torch.Tensor can be...
```
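For anyone who only wants the general idea from that post: the list of Python objects is pickled into one flat buffer plus an offset table, so workers index into a single buffer instead of touching many small objects. The sketch below is a stdlib-only stand-in, not the TorchSerializedList above; it uses `bytes` and `array` instead of torch.Tensor, and the name `SerializedList` is hypothetical.

```python
import pickle
from array import array


class SerializedList:
    """Stdlib stand-in: keeps all items in one pickled byte buffer."""

    def __init__(self, items):
        chunks = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL)
                  for item in items]
        # End offset of each item inside the flat buffer.
        self._ends = array("q")
        total = 0
        for chunk in chunks:
            total += len(chunk)
            self._ends.append(total)
        self._buf = b"".join(chunks)

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = self._ends[idx - 1] if idx > 0 else 0
        end = self._ends[idx]
        # Unpickle only the slice that belongs to this item.
        return pickle.loads(memoryview(self._buf)[start:end])


# Hypothetical records standing in for per-image annotation groups.
records = [{"image_id": i, "annotations": [i, i + 1]} for i in range(3)]
serialized = SerializedList(records)
print(len(serialized), serialized[1])
```

The torch.Tensor variant matters for spawn/forkserver workers; this version only shows the single-buffer layout.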
I guess the problem might be "_target_: sam3.train.transforms.segmentation.DecodeRle". According to your json example, the segmentation is not in RLE format. You can convert it to RLE format.
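As a rough illustration of what the RLE layout looks like, here is a minimal sketch that turns a binary mask into COCO-style uncompressed RLE (column-major run lengths, starting with the count of zeros). `mask_to_uncompressed_rle` is a hypothetical helper, and I am assuming the mask is already rasterized. For polygon annotations, `pycocotools.mask.frPyObjects` is the usual route, and I have not checked which RLE variant DecodeRle expects.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows) as COCO-style uncompressed RLE."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    counts = []
    prev = 0  # by convention the first run counts zeros
    run = 0
    for x in range(w):          # column-major order: walk columns first
        for y in range(h):
            v = 1 if mask[y][x] else 0
            if v != prev:
                counts.append(run)
                run = 0
                prev = v
            run += 1
    counts.append(run)
    return {"size": [h, w], "counts": counts}


# Tiny 2x2 example mask, for illustration only.
example = [[0, 1],
           [1, 1]]
print(mask_to_uncompressed_rle(example))
```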