Changing decoding method in StreamReader
🐛 Describe the bug
Hi,
When decoding from a file stream in StreamReader, torchdata automatically assumes the incoming bytes are UTF-8. However, with other encodings this raises an error (in my case, UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 3: invalid continuation byte). How do we change the decoding method to fit the particular data stream?
Versions
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.0
[pip3] pytorch-lightning==1.6.4
[pip3] torch==1.11.0
[pip3] torchdata==0.3.0
[pip3] torchmetrics==0.9.1
[pip3] torchvision==0.12.0
[conda] numpy 1.23.0 pypi_0 pypi
[conda] pytorch-lightning 1.6.4 pypi_0 pypi
[conda] torch 1.11.0 pypi_0 pypi
[conda] torchdata 0.3.0 pypi_0 pypi
[conda] torchmetrics 0.9.1 pypi_0 pypi
[conda] torchvision 0.12.0 pypi_0 pypi
To be more specific, is there no way to read from StreamReader as bytes?
It depends on how you open your file, rather than on StreamReader. If you use FileOpener (functional API: open_files), you can set mode to b to open the file in bytes.
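A minimal sketch of the suggestion above, under a few assumptions: the sample file, its Latin-1 content, and all variable names are made up for illustration. The first part uses only the built-in open() to show why the UTF-8 default fails and how binary mode or an explicit encoding avoids it. The second part runs only if torchdata.datapipes can be imported (it is not available in every torchdata release) and assumes FileOpener accepts mode and encoding arguments that it passes on to open().

```python
import os
import tempfile

# Hypothetical sample: text encoded as Latin-1. The byte 0xE9 is not valid UTF-8 here.
raw = "café\n".encode("latin-1")

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

# Text mode with the wrong encoding reproduces the kind of failure in the report.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Binary mode hands back the bytes untouched, so you can decode them yourself.
with open(path, "rb") as f:
    data = f.read()
text_from_bytes = data.decode("latin-1")

# Text mode with the matching encoding decodes while reading.
with open(path, "r", encoding="latin-1") as f:
    text_direct = f.read()

# The same two options with torchdata, if its datapipes are importable.
try:
    from torchdata.datapipes.iter import FileOpener, IterableWrapper, StreamReader
except ImportError:
    FileOpener = None

if FileOpener is not None:
    # mode="b": StreamReader yields (path, bytes) pairs.
    byte_pipe = StreamReader(FileOpener(IterableWrapper([path]), mode="b"))
    byte_chunks = [chunk for _, chunk in byte_pipe]

    # mode="r" with an explicit encoding: StreamReader yields (path, str) pairs.
    text_pipe = StreamReader(
        FileOpener(IterableWrapper([path]), mode="r", encoding="latin-1")
    )
    text_chunks = [chunk for _, chunk in text_pipe]

os.remove(path)
```

Reading as bytes is the safer choice when the encoding varies per file or is unknown up front, since decoding can then be done (or skipped) downstream. If every file shares one known encoding, passing it when opening keeps the rest of the pipeline working with str.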