New API for datasets/PyTorch dataloaders
Several issues (#172, #224, #225) reference a new API for interacting with datasets, but I can't find it documented anywhere online. After adapting several of the examples to the current API, I have
import kaolin as kal
from torch.utils.data import DataLoader
train_set = kal.datasets.ModelNet(root=args.modelnet_root, categories=args.categories)
dataloader_train = DataLoader(train_set, batch_size=args.batchsize, shuffle=True, num_workers=1)
but iterating over the dataloader raises a TypeError because ModelNet's __getitem__ does not return a Tensor. You can work around this with a custom collate function, like
dataloader = DataLoader(voxels, batch_size=10, shuffle=True, num_workers=1, collate_fn=lambda x: x)
but this also doesn't work with the examples. A few recent examples shuffle the dataset manually with torch.randperm (roughly the pattern sketched below), but that approach doesn't support batch sampling. What is the currently recommended way to feed these datasets to a DataLoader?
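For reference, the manual pattern I mean looks roughly like this (a reconstruction, not the exact example code); it yields one sample at a time rather than batches:
# Manual shuffling with torch.randperm: one sample per step, no batching.
for idx in torch.randperm(len(train_set)):
    sample = train_set[int(idx)]
    # ... run the model on a single sample ...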
I ended up using
transform = tfs.Compose([
    tfs.TriangleMeshToVoxelGrid(resolution=30),
])
# Setup Dataloader
train_set = kal.datasets.ModelNet(root=args.modelnet_root, categories=args.categories, transform=transform)
dataloader_train = DataLoader(train_set, batch_size=args.batchsize, shuffle=True, num_workers=8)
which works after changing a few attributes around. Is this the intended API at the moment? Even then, it breaks because there are some corrupt entries in ModelNet. I was able to modify the ModelNet dataset to catch invalid entries and replace them with random valid ones, but this isn't available in the main library. Is this a feature we want to add, or is there a workaround? Here is the override I used:
def _get_data(self, index):
    # Load the mesh; some .off files in ModelNet are corrupt and raise a
    # ValueError, in which case a random other sample is loaded instead
    # (assumes numpy is imported as np).
    try:
        data = TriangleMesh.from_off(self.filepaths[index])
    except ValueError:
        print(f"[WARNING] sample {self.filepaths[index]} does not work. Picking a random entry instead.")
        idx = np.random.randint(0, len(self))
        data = self._get_data(idx)
    return data
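For what it's worth, the same override could also be applied without editing the installed package by subclassing; RobustModelNet below is just an illustrative name, and it assumes kal.datasets.ModelNet actually routes sample loading through _get_data and exposes filepaths, as in the snippet above:
# Hypothetical wrapper: same fallback logic as above, applied via subclassing
# instead of patching the library source.
class RobustModelNet(kal.datasets.ModelNet):
    def _get_data(self, index):
        try:
            return TriangleMesh.from_off(self.filepaths[index])
        except ValueError:
            print(f"[WARNING] sample {self.filepaths[index]} does not work. Picking a random entry instead.")
            return self._get_data(np.random.randint(0, len(self)))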
Hi. Can you tell me the right import for your tfs? I'm not able to find TriangleMeshToVoxelGrid in any module.
It would be extremely helpful to see a basic demo/example of how to integrate one's own custom 3D point cloud datasets with Kaolin. Hopefully such a demo/example could be made for the new Dataset API.
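Even something as minimal as the sketch below would help as a starting point; it uses only the standard torch.utils.data.Dataset interface, and the PointCloudFolder name, the directory layout, and the .npy point cloud files of shape (N, 3) are placeholder assumptions, not part of Kaolin's API:
import glob
import os

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class PointCloudFolder(Dataset):
    # Hypothetical dataset: loads point clouds stored as .npy arrays of shape (N, 3).
    def __init__(self, root, num_points=1024, transform=None):
        self.filepaths = sorted(glob.glob(os.path.join(root, "*.npy")))
        self.num_points = num_points
        self.transform = transform

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, index):
        points = np.load(self.filepaths[index]).astype(np.float32)
        # Randomly subsample to a fixed number of points so samples can be
        # batched with the default collate_fn.
        choice = np.random.choice(points.shape[0], self.num_points, replace=True)
        points = torch.from_numpy(points[choice])
        if self.transform is not None:
            points = self.transform(points)
        return points
Because every item is then a fixed-size tensor, the default collate_fn works directly, e.g. DataLoader(PointCloudFolder("/path/to/pointclouds"), batch_size=16, shuffle=True, num_workers=4).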