luigi icon indicating copy to clipboard operation
luigi copied to clipboard

Byte writing with localTarget

Open ifiddes opened this issue 4 years ago • 8 comments

I raised this issue on stackoverflow recently:

https://stackoverflow.com/questions/59312056/using-luigi-localtarget-with-pdfpages-in-python3/60309832#60309832

Is it possible to allow for byte write mode for localTargets?

ifiddes avatar Feb 23 '20 15:02 ifiddes

I am no expert in the working of LocalTarget, so i do not know if there is a reason for the removal of the b-flag, or if this is a bug.

A way to work around this is to wrap the code using the temporary_path function:

import luigi

class BinaryFileExample(luigi.Task):
    def output(self):
        return luigi.LocalTarget("simple_binary_file.extension")

    def run(self):
        with self.output().temporary_path() as my_binary_file_path:
            with open(my_binary_file_path, 'wb') as inner_file:
                newFileBytes = [123, 3, 255, 0, 100]
                for byte in newFileBytes:
                    inner_file.write(byte.to_bytes(1, byteorder='big'))

hirolau avatar Feb 24 '20 08:02 hirolau

We're getting the band back together!

For those wondering, that don't want extra clicks, the problem is here:

https://github.com/spotify/luigi/blob/master/luigi/local_target.py#L159

def open(self, mode='r'):
    rwmode = mode.replace('b', '').replace('t', '')
    ...

Opening just zeroes out byte and text flags, which causes everything to be treated as text.

ihowell avatar Feb 24 '20 16:02 ihowell

Is there a reason for that line to be there? What would be the implications of just removing it?

ifiddes avatar Feb 24 '20 18:02 ifiddes

I do not have the time to run the tests on a linux setup, I can create a pull request in a week or two, need to investigate that everything works as it should when reading/writing from binary files.

But the code could look something like:

VALID_MODES = ('r', 'rb', 'w', 'wb', 'rt', 'wt')

class LocalTarget(FileSystemTarget):
   
    # .. class body...

    def open(self, mode='r'):
        
        if mode not in VALID_MODES:
            raise Exception("mode must be in %, (got: %s)" % (VALID_MODES, mode))

        if mode[0] == 'w':
            self.makedirs()
            return self.format.pipe_writer(atomic_file(self.path))

        if mode[0] == 'r':
            fileobj = FileWrapper(io.BufferedReader(io.FileIO(self.path, mode)))
            return self.format.pipe_reader(fileobj)

hirolau avatar Feb 25 '20 08:02 hirolau

I just opened #2931 which adds stuff to the docs about this. I believe it's related to how atomic writes/reads work.

guidopetri avatar Apr 14 '20 22:04 guidopetri

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

stale[bot] avatar Dec 25 '20 12:12 stale[bot]

I just hit this. I was trying to open an output file in binary mode and got an error message saying

TypeError: write() argument must be str, not bytes

which was perplexing. I don't really see any reason why binary mode should be disallowed.

Edit: At the very least the b shouldn't be getting silently removed. If there is a good reason not to allow binary mode (I doubt it) then Luigi should at least notice this and emit an error or warning with some further explanation.

mivade avatar Jul 16 '21 14:07 mivade

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

stale[bot] avatar Jan 09 '22 01:01 stale[bot]