Added fn.decoders.numpy

Open 5had3z opened this issue 7 months ago • 0 comments

Category:

New feature (non-breaking change which adds functionality)

Description:

I got around to implementing a numpy file decoder op to address the problem first raised in #5337. I've been using the Python implementation without issues since then, and I'm not sure why I was getting those segfaults initially. But it's been at the back of my mind ever since to properly implement fn.decoders.numpy.

Currently there is only one argument, dtype=DALIDataType, which normalizes the data types of the different samples in a batch. If this argument isn't given, the data type is inferred from the first sample in the batch, and an error is raised if the data types of the samples differ. A runtime error is also raised if the number of dimensions differs between samples in the batch.
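The batch rules described above can be sketched in plain Python. This is only an illustration of the checks, not the operator's C++ code: `SampleMeta` and `resolve_batch_dtype` are hypothetical names, dtypes are represented as plain strings rather than DALIDataType, and raising RuntimeError for the dtype mismatch is an assumption (the description only says "an error").

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a decoded sample's metadata: (dtype name, shape).
SampleMeta = Tuple[str, Tuple[int, ...]]


def resolve_batch_dtype(batch: Sequence[SampleMeta],
                        dtype: Optional[str] = None) -> str:
    """Return the dtype the whole batch would be output as, or raise."""
    if not batch:
        raise ValueError("empty batch")
    first_dtype, first_shape = batch[0]
    for idx, (sample_dtype, shape) in enumerate(batch):
        # The number of dimensions must match whether or not dtype is given.
        if len(shape) != len(first_shape):
            raise RuntimeError(
                f"sample {idx} has {len(shape)} dims, expected {len(first_shape)}")
        # Without an explicit dtype, every sample must match the first one.
        if dtype is None and sample_dtype != first_dtype:
            raise RuntimeError(
                f"sample {idx} is {sample_dtype}, expected {first_dtype}")
    # An explicit dtype wins; otherwise the first sample decides.
    return dtype if dtype is not None else first_dtype
```

For example, a batch of `("float32", (2, 3))` and `("uint8", (4, 5))` fails in this sketch unless `dtype` is passed, in which case both samples would be cast to that type.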

Not only does this operation enable decoding numpy data saved in a webdataset, it also lets users load and decode numpy data outside of an external_source if they wish. Personally, I've found performance benefits from using fn.io.file.read rather than loading the data in the external source. On the topic of fn.io.file.read, it isn't exported correctly in the type hint files, or at least VS Code/Pylance doesn't pick it up.

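import numpy as np
from nvidia.dali import fn

class ExternalSource:
  def __call__(self, sample_info):
    image: str
    label: str
    image, label = self.get_sample_paths(sample_info)
    # Encode each path as a uint8 array so it can be fed to fn.io.file.read
    image_npy = np.frombuffer(image.encode(), dtype=np.uint8)
    label_npy = np.frombuffer(label.encode(), dtype=np.uint8)
    return image_npy, label_npy

# Per-sample callable with two outputs, hence batch=False and num_outputs=2
img_pth, lbl_pth = fn.external_source(ExternalSource(), num_outputs=2, batch=False)
img = fn.decoders.image(fn.io.file.read(img_pth))
lbl = fn.decoders.numpy(fn.io.file.read(lbl_pth))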

Additional information:

Affected modules and functionalities:

New numpy.cc and numpy.h files added to dali/operators/decoder and test_numpy.py added to dali/test/python/decoder. Changed const std::string& to const std::string_view in ParseHeaderContents and removed some whitespace.
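For reviewers unfamiliar with what header parsing involves, here is a rough standard-library Python sketch of reading the header of an in-memory .npy buffer. It follows the public NPY file format and is not a port of ParseHeaderContents; `parse_npy_header` and `make_npy_v1` are hypothetical helpers written for this illustration.

```python
import ast
import struct
from typing import Any, Dict, Tuple

NPY_MAGIC = b"\x93NUMPY"


def parse_npy_header(buf: bytes) -> Tuple[Dict[str, Any], int]:
    """Return (header dict, offset of the raw array data) for a .npy buffer."""
    if len(buf) < 10 or buf[:6] != NPY_MAGIC:
        raise ValueError("not a .npy buffer")
    major = buf[6]
    if major == 1:
        # Version 1.x stores the header length as a little-endian uint16.
        (header_len,) = struct.unpack_from("<H", buf, 8)
        start = 10
    elif major in (2, 3):
        # Versions 2.x and 3.x use a little-endian uint32 instead.
        (header_len,) = struct.unpack_from("<I", buf, 8)
        start = 12
    else:
        raise ValueError(f"unsupported .npy major version {major}")
    end = start + header_len
    if end > len(buf):
        raise ValueError("truncated .npy header")
    encoding = "utf-8" if major == 3 else "latin1"
    # The header is a Python dict literal with 'descr', 'fortran_order', 'shape'.
    header = ast.literal_eval(buf[start:end].decode(encoding))
    return header, end


def make_npy_v1(descr: str, shape: Tuple[int, ...], payload: bytes) -> bytes:
    """Build a version 1.0 .npy buffer by hand (illustration only)."""
    text = repr({"descr": descr, "fortran_order": False, "shape": shape})
    # Pad with spaces so the array data starts on a 64-byte boundary.
    pad = -(10 + len(text) + 1) % 64
    header = (text + " " * pad + "\n").encode("latin1")
    return NPY_MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header + payload


buf = make_npy_v1("<u1", (2, 3), bytes(range(6)))
header, data_offset = parse_npy_header(buf)
```

The `descr` and `shape` fields recovered here are the per-sample values a decoder would need in order to compare data types and dimensionality across a batch.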

Key points relevant for the review:

I haven't looked into a "GPU" implementation. I suppose the casting and transposing could be done on the device for better throughput. The simplest way to add a device='gpu' option would be a memcpy after doing all the work on the CPU, but at that point the user can just use fn.decoders.numpy(...).gpu().

Tests:

[ ] Existing tests apply
[x] New tests added
- [x] Python tests
- [ ] GTests
- [ ] Benchmark
- [ ] Other
[ ] N/A

Checklist

Documentation

[ ] Existing documentation applies
[x] Documentation updated
- [x] Docstring
- [ ] Doxygen
- [ ] RST
- [ ] Jupyter
- [ ] Other
[ ] N/A

DALI team only

Requirements

[ ] Implements new requirements
[ ] Affects existing requirements
[ ] N/A

REQ IDs: N/A

JIRA TASK: N/A

Jun 14 '25 12:06 5had3z