
New API for datasets/PyTorch dataloaders

Open jacobaustin123 opened this issue 5 years ago • 3 comments

Several issues (#172, #224, #225) mention a new API for interacting with datasets, but I can't find anything about it online. After adapting several examples to the current API, I have:

train_set = kal.datasets.ModelNet(root=args.modelnet_root, categories=args.categories)
dataloader_train = DataLoader(train_set, batch_size=args.batchsize, shuffle=True, num_workers=1)

but iterating over the dataloader raises a TypeError, because ModelNet's __getitem__ returns data that is not a Tensor, and PyTorch's default collate function cannot batch it. You can work around this with a custom collate function, like

dataloader = DataLoader(train_set, batch_size=10, shuffle=True, num_workers=1, collate_fn=lambda batch: batch)

but this also doesn't work with the examples. A few recent examples use torch.randperm to shuffle the dataset, but this doesn't support batch sampling. What is the current standard method for feeding data to DataLoaders?

jacobaustin123, Jun 02 '20 22:06
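
A minimal sketch of a more structured collate function than the identity lambda, assuming samples are dicts that mix tensors with non-tensor objects such as meshes. ToyMesh, ToyMeshDataset and mixed_collate are hypothetical stand-ins, not Kaolin API. Same-shaped tensor fields are stacked, and everything else is returned as a plain list. Note that DataLoader with shuffle=True and a batch_size already draws shuffled mini-batches; the last lines show the manual torch.randperm equivalent for comparison.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyMesh:
    """Hypothetical stand-in for a non-tensor sample such as a mesh object."""

    def __init__(self, num_vertices):
        self.vertices = torch.rand(num_vertices, 3)


class ToyMeshDataset(Dataset):
    """Hypothetical dataset returning a mesh-like object plus a label."""

    def __init__(self, size=8):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        # Vertex counts differ per sample, so the meshes cannot be stacked.
        return {"mesh": ToyMesh(10 + index), "label": torch.tensor(index % 2)}


def mixed_collate(batch):
    """Stack same-shaped tensor fields; keep every other field as a list."""
    collated = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        same_shape_tensors = (all(isinstance(v, torch.Tensor) for v in values)
                              and len({v.shape for v in values}) == 1)
        collated[key] = torch.stack(values) if same_shape_tensors else values
    return collated


dataset = ToyMeshDataset()
# shuffle=True together with batch_size already yields shuffled mini-batches.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0,
                    collate_fn=mixed_collate)
batch = next(iter(loader))
print(batch["label"].shape, len(batch["mesh"]))  # torch.Size([4]) 4

# Manual equivalent of shuffled batching: split a random permutation into chunks.
index_batches = torch.randperm(len(dataset)).split(4)
manual = [mixed_collate([dataset[i] for i in chunk.tolist()]) for chunk in index_batches]
print(len(manual))  # 2
```

Using a named module-level function instead of a lambda also matters once num_workers > 0: when worker processes are started with the spawn method (the default on Windows and macOS), the collate function must be picklable, and lambdas are not.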

I ended up using

transform = tfs.Compose([
    tfs.TriangleMeshToVoxelGrid(resolution=30),
])

# Setup Dataloader
train_set = kal.datasets.ModelNet(root=args.modelnet_root, categories=args.categories, transform=transform)
dataloader_train = DataLoader(train_set, batch_size=args.batchsize, shuffle=True, num_workers=8)

which works fine after changing some attributes around. Is this the intended API at the moment? It still fails, though, because some ModelNet entries are corrupt. I was able to modify the ModelNet dataset to catch invalid entries and replace them with randomly chosen ones, but this isn't available in the main library. Is this a feature we want to add, or is there a workaround?

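    def _get_data(self, index):
        # Corrupt ModelNet entries raise ValueError when parsed from OFF
        # (the module needs `import numpy as np` and kaolin's TriangleMesh in scope)
        try:
            data = TriangleMesh.from_off(self.filepaths[index])
        except ValueError:
            print(f"[WARNING] sample {self.filepaths[index]} does not work. Picking a random entry instead.")
            # Retry with a random index until a readable entry is found
            idx = np.random.randint(0, len(self))
            data = self._get_data(idx)
        return data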

jacobaustin123, Jun 02 '20 23:06
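
A possible alternative workaround, sketched under assumptions: instead of recursively retrying with a random index (which risks deep recursion when many files are corrupt and makes batches non-reproducible), __getitem__ returns None for unreadable samples and a collate function drops them before calling PyTorch's default_collate. SkipCorruptDataset, load_sample, CORRUPT and collate_skip_none are hypothetical names; load_sample stands in for a loader like TriangleMesh.from_off and raises ValueError on bad input, as in the snippet above. The trade-off is that affected batches come out smaller than batch_size.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate

CORRUPT = {2, 5}  # hypothetical indices whose files fail to parse


def load_sample(index):
    """Hypothetical loader; raises ValueError for corrupt entries."""
    if index in CORRUPT:
        raise ValueError(f"could not parse sample {index}")
    return torch.full((4,), float(index))


class SkipCorruptDataset(Dataset):
    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        try:
            return load_sample(index)
        except ValueError as err:
            print(f"[WARNING] skipping sample {index}: {err}")
            return None  # signal the collate function to drop this sample


def collate_skip_none(batch):
    """Drop failed (None) samples, then batch the rest as usual."""
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # every sample in this batch was corrupt
    return default_collate(batch)


loader = DataLoader(SkipCorruptDataset(8), batch_size=4, shuffle=False,
                    collate_fn=collate_skip_none)
sizes = [b.shape[0] for b in loader if b is not None]
print(sizes)  # [3, 3]
```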

Hi. Which module should tfs be imported from? I'm not able to find TriangleMeshToVoxelGrid in any module.

adityavadalkar, Jun 25 '20 14:06
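
As a library-independent illustration of the transform pattern in the earlier snippet, here is a minimal sketch of a hypothetical PointsToOccupancyGrid transform and a small Compose helper in plain PyTorch. It voxelizes a sampled point set rather than a triangle mesh surface, so it is not equivalent to Kaolin's TriangleMeshToVoxelGrid; the point is that a transform which always returns a fixed-size tensor lets the default collate function stack samples without a custom collate_fn.

```python
import torch


class PointsToOccupancyGrid:
    """Hypothetical transform: (N, 3) points -> (R, R, R) binary occupancy grid.

    Points are normalized into the unit cube using a single scale factor
    (the largest bounding-box side), which preserves the aspect ratio.
    """

    def __init__(self, resolution=30):
        self.resolution = resolution

    def __call__(self, points):
        mins = points.min(dim=0).values
        extent = (points.max(dim=0).values - mins).max().clamp(min=1e-8)
        normalized = (points - mins) / extent  # every coordinate now in [0, 1]
        # Map [0, 1] to cell indices 0..R-1; the value 1.0 lands in cell R-1.
        idx = (normalized * self.resolution).long().clamp(max=self.resolution - 1)
        grid = torch.zeros(self.resolution, self.resolution, self.resolution)
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
        return grid


class Compose:
    """Apply transforms in order (same idea as torchvision's Compose)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


transform = Compose([PointsToOccupancyGrid(resolution=30)])
grid = transform(torch.rand(500, 3))
print(grid.shape)  # torch.Size([30, 30, 30])
```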

It would be extremely helpful to see a basic demo or example of how to add or integrate one's own custom 3D point cloud datasets for use with Kaolin. Hopefully such an example could be made for the new Dataset API.

MBrandt-NASA, Sep 04 '20 00:09
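
For reference, a minimal sketch of wrapping a custom point cloud collection as a plain PyTorch Dataset, without relying on any Kaolin-specific API. It assumes a hypothetical layout of one whitespace-separated .xyz text file per cloud (one "x y z" point per line), with the label taken from the parent folder name; PointCloudFolder and num_points are illustrative names. Because clouds usually contain different numbers of points, each one is resampled to a fixed num_points so the default collate function can stack them.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class PointCloudFolder(Dataset):
    """Hypothetical layout: root/<class_name>/<sample>.xyz, one 'x y z' per line."""

    def __init__(self, root, num_points=1024):
        self.num_points = num_points
        self.classes = sorted(d for d in os.listdir(root)
                              if os.path.isdir(os.path.join(root, d)))
        self.samples = []
        for label, name in enumerate(self.classes):
            folder = os.path.join(root, name)
            for fname in sorted(os.listdir(folder)):
                if fname.endswith(".xyz"):
                    self.samples.append((os.path.join(folder, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        points = np.loadtxt(path, dtype=np.float32, ndmin=2)  # shape (N, 3)
        # Resample to a fixed size; sample with replacement if N < num_points.
        replace = points.shape[0] < self.num_points
        idx = np.random.choice(points.shape[0], self.num_points, replace=replace)
        return torch.from_numpy(points[idx]), label


# Usage with two tiny synthetic clouds of different sizes.
with tempfile.TemporaryDirectory() as root:
    for name, n in [("chair", 50), ("table", 300)]:
        os.makedirs(os.path.join(root, name))
        np.savetxt(os.path.join(root, name, "0.xyz"), np.random.rand(n, 3))
    dataset = PointCloudFolder(root, num_points=128)
    loader = DataLoader(dataset, batch_size=2, shuffle=True)
    points, labels = next(iter(loader))
    print(points.shape, sorted(labels.tolist()))  # torch.Size([2, 128, 3]) [0, 1]
```

Resampling with replacement duplicates points in small clouds; if that distorts a downstream model, padding plus a mask (with a custom collate_fn) is another common option.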