File name length unchecked

Open adlerosn opened this issue 5 years ago • 5 comments

The downloaded file name should never be longer than 255 bytes under Linux.

Take the terminal output below as example:

$ gallery-dl https://www.reddit.com/r/anthro/comments/f646i7/𝙾𝙿𝙴𝙽𝙸𝙽𝙶_𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲_𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃_𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂_get_a_v_a/
./gallery-dl/reddit/anthro/f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃… [email protected]! ONLY 10 SLOTS AVAILABLE !.jpg
[download][warning] OSError: [Errno 36] File name too long: "./gallery-dl/reddit/anthro/f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲 𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃 𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂! ⚡️ get a V A P O R W A V E version of your character for only $25! if you'd like to grab a spot, PM me here, on telegram at @dhazeartt or email me at [email protected]! ONLY 10 SLOTS AVAILABLE !.jpg.part"
[download][error] Failed to download f646i7 𝙾𝙿𝙴𝙽𝙸𝙽𝙶 𝙰𝙴𝚂𝚃𝙷𝙴𝚃𝙷𝙸𝙲 𝙷𝙴𝙰𝙳𝚂𝙷𝙾𝚃 𝙲𝙾𝙼𝙼𝙸𝚂𝚂𝙸𝙾𝙽𝚂! ⚡️ get a V A P O R W A V E version of your character for only $25! if you'd like to grab a spot, PM me here, on telegram at @dhazeartt or email me at [email protected]! ONLY 10 SLOTS AVAILABLE !.jpg

Its file name is 247 characters long, which seems acceptable, but is 359 bytes long, which is deemed too long by the kernel (on BTRFS).
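
The gap between the two numbers comes from UTF-8 encoding: the stylized letters in the title lie outside the Basic Multilingual Plane and take 4 bytes each, while the kernel limit counts bytes, not characters. A minimal sketch of the difference (the helper name `name_lengths` is made up for illustration):

```python
def name_lengths(name: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a file name."""
    return len(name), len(name.encode("utf-8"))


plain = "OPEN"
# The same word in mathematical monospace letters (U+1D67E, U+1D67F, U+1D674, U+1D67D).
fancy = "\U0001D67E\U0001D67F\U0001D674\U0001D67D"

print(name_lengths(plain))  # (4, 4)
print(name_lengths(fancy))  # (4, 16)
```

So a name that looks comfortably short when counted in characters can still exceed the 255-byte limit.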

A quick workaround fix I did was through a PostProcessor:

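from pathlib import Path  # used by the wrapper and helper below

# Module layout as of the gallery-dl version I used; newer releases may differ.
import gallery_dl.postprocessor.common
import gallery_dl.util


class FixFileNamePostProcessor(gallery_dl.postprocessor.common.PostProcessor):
    def prepare(self, pathfmt: gallery_dl.util.PathFormat):
        """Updates file path"""
        # Wrap the path cleaner so every path segment gets length-checked,
        # then rebuild the path with the wrapped cleaner in place.
        pathfmt.clean_path = FixFileNameFormatterWrapper(pathfmt.clean_path)
        pathfmt.build_path()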

That uses this wrapper:

class FixFileNameFormatterWrapper:
    """Wraps file name formatter for ensuring a valid file name length"""

    def __init__(self, formatter: gallery_dl.util.Formatter):
        self.formatter = formatter

    def __call__(self, *args, **kwargs) -> str:
        path = self.formatter(*args, **kwargs)
        parts = list(map(fix_filename_length, Path(path).parts))
        return str(Path(*parts))

That uses this function:

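def fix_filename_length(filename: str) -> str:
    """Ensures a segment has a valid file name length"""
    # 240 rather than 255 leaves some headroom, e.g. for the ".part"
    # suffix that shows up in the error message above.
    if len(filename.encode()) > 240:
        extension = Path(filename).suffix
        extension_bytes_length = len(extension.encode())
        stem_bytes = Path(filename).stem.encode()
        # Cut the stem on a byte boundary, keeping room for the extension.
        fixed_stem_bytes = stem_bytes[:240-extension_bytes_length]
        # errors="ignore" drops a multi-byte character that was cut in half.
        fixed_stem = fixed_stem_bytes.decode(errors="ignore")
        return fixed_stem + extension
    return filename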
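
A quick check of how this behaves on a multi-byte name. This is only a sketch: the function is restated compactly so the snippet runs on its own, and the sample file name is made up.

```python
from pathlib import Path


def fix_filename_length(filename: str) -> str:
    """Same logic as above: truncate the stem so the name fits in 240 bytes."""
    if len(filename.encode()) > 240:
        extension = Path(filename).suffix
        stem_bytes = Path(filename).stem.encode()
        fixed_stem_bytes = stem_bytes[:240 - len(extension.encode())]
        return fixed_stem_bytes.decode(errors="ignore") + extension
    return filename


# 100 three-byte characters plus ".jpg": 104 characters, 304 bytes.
long_name = "\u3042" * 100 + ".jpg"
fixed = fix_filename_length(long_name)

print(len(long_name), len(long_name.encode()))  # 104 304
print(len(fixed), len(fixed.encode()))          # 82 238
```

The stem is cut at 236 bytes, which lands in the middle of a character; the two stray bytes are dropped, so the result is 238 bytes rather than exactly 240. One caveat: if the text after the last dot is itself longer than the limit, the slice bound goes negative and the returned name can still be too long.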

It would be nice if the limit on the total path length was also observed (PATH_MAX is 4096 bytes on Linux; MAX_PATH is 260 characters on Windows, a limit that can be lifted since Windows 10's 2016 update, but only if you change an entry in the registry), but that's not an issue for me right now.
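
On POSIX systems these limits do not have to be hard-coded, since Python exposes them through `os.pathconf`. A rough sketch, assuming a Unix-like system (the helper names and the 255/4096 fallback values are assumptions for illustration, not part of gallery-dl):

```python
import os


def filesystem_limits(directory: str = ".") -> tuple[int, int]:
    """Return (max file name bytes, max path bytes) for `directory`."""
    try:
        name_max = os.pathconf(directory, "PC_NAME_MAX")
        path_max = os.pathconf(directory, "PC_PATH_MAX")
    except (AttributeError, OSError, ValueError):
        # os.pathconf is missing (e.g. on Windows) or the query failed:
        # fall back to the common Linux values (an assumption).
        return 255, 4096
    # A non-positive value means no definite limit was reported.
    return (name_max if name_max > 0 else 255,
            path_max if path_max > 0 else 4096)


def fits(path: str, directory: str = ".") -> bool:
    """Check each component against NAME_MAX and the whole path against PATH_MAX."""
    name_max, path_max = filesystem_limits(directory)
    encoded = os.fsencode(os.path.abspath(path))
    components = encoded.split(os.fsencode(os.sep))
    # PATH_MAX counts the terminating NUL byte, hence the strict comparison.
    return len(encoded) < path_max and all(len(c) <= name_max for c in components)


print(fits("short.jpg"))
print(fits("x" * 5000 + ".jpg"))
```

Both checks work on the encoded bytes, which is what the kernel actually counts.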

adlerosn, Jun 06 '20 03:06

I have the same issue. Would be great to see this implemented.

Randalix, Feb 12 '24 10:02

I could make it work by adding this in the config (trimming the name):

    "filename": {
        "": "{title[:40]}_{subreddit}_{id}.{extension}"
    },

Randalix, Feb 12 '24 10:02

This should be a feature. The following works for youtube-dl for example (though couldn't find it in docs):

  • %(title).150s: Truncates to 150 symbols.
  • %(title).150B: Truncates to 150 bytes.

{filename[:150]} works in gallery-dl so I can't imagine why something like {filename[:150B]} should not work.

There is this explanation in https://github.com/mikf/gallery-dl/issues/873#issuecomment-656366953

"There is no good general solution for the "filename length problem", which is why I haven't really tried to implement something."

But we have the [:150] character limiter regardless, even though that is not a general solution either. It looks like Linux is far from getting support for longer file names, so for now the software itself should take care of it.

reyaz006, Aug 11 '24 23:08

                   Example              Result
Slicing (Bytes)    {title_ja[b3:18]}    ロー・ワー
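
What slicing by bytes amounts to can be sketched in plain Python. This is not gallery-dl's implementation; the helper name, the sample title, and the choice to silently drop a character that gets cut in half are all assumptions made for illustration.

```python
def slice_bytes(text: str, start: int, stop: int) -> str:
    """Slice `text` by UTF-8 byte offsets, dropping any partial characters."""
    return text.encode("utf-8")[start:stop].decode("utf-8", errors="ignore")


# Eight hiragana characters (a, i, u, e, o, ka, ki, ku), 3 bytes each in UTF-8.
title = "\u3042\u3044\u3046\u3048\u304a\u304b\u304d\u304f"

# Bytes 3..17 line up with character boundaries: 5 whole characters.
print(slice_bytes(title, 3, 18))
# Starting at byte 4 cuts into the second character, which is then dropped.
print(slice_bytes(title, 4, 18))
```

With 3-byte characters, a 15-byte window such as `b3:18` holds exactly five characters, which matches the length of the result in the table above.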

Hrxn, Aug 12 '24 00:08

Oh, thanks. Also found it here https://github.com/mikf/gallery-dl/discussions/4087#discussioncomment-5977221

It's probably safe to close this issue.

reyaz006, Aug 12 '24 04:08