Allow maximum file name length to be increased

Open Haravikk opened this issue 3 years ago • 28 comments

Describe the feature you would like to see added to OpenZFS

Unless I've badly misunderstood, it would appear that ZFS currently has a hardcoded limit on file name length of 256 bytes (255 + NUL?), anything longer is rejected.

While this isn't unusual for a file system, it does mean that there are file systems that handle names that ZFS cannot, which makes transitioning to ZFS more complicated.

For example, ExFAT on any platform, as well as HFS+ on macOS both support file names up to 255 UTF-16 characters, which means that file-names transferred to ZFS could be limited to as little as 127 or fewer characters depending upon the number of multi-byte characters.
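To make the mismatch concrete, here is a minimal Python sketch (not ZFS code; the constant and function names are made up for illustration). It assumes names are stored as UTF-8 on the ZFS side and checks whether a name that fits a 255-unit UTF-16 limit also fits a 255-byte limit:

```python
# Illustrative only: the names below are hypothetical, not part of any ZFS API.
ZFS_NAME_MAX_BYTES = 255    # byte limit discussed in this issue
UTF16_NAME_MAX_UNITS = 255  # ExFAT / HFS+ limit, in UTF-16 code units


def utf16_units(name: str) -> int:
    """Number of UTF-16 code units needed to store the name."""
    return len(name.encode("utf-16-le")) // 2


def utf8_bytes(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def survives_migration(name: str) -> bool:
    """True if a name valid under the UTF-16 limit also fits the byte limit."""
    return (utf16_units(name) <= UTF16_NAME_MAX_UNITS
            and utf8_bytes(name) <= ZFS_NAME_MAX_BYTES)


# 200 copies of U+00E9 (é): 200 UTF-16 units, but 400 bytes in UTF-8.
accented = "\u00e9" * 200
print(utf16_units(accented), utf8_bytes(accented), survives_migration(accented))

# Characters that fit if every character needs 2 or 3 bytes in UTF-8.
print(ZFS_NAME_MAX_BYTES // 2, ZFS_NAME_MAX_BYTES // 3)
```

With 2-byte characters the budget works out to 127 characters; with 3-byte characters, which covers much CJK text, it drops to 85.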

I would like to see the ability to increase the maximum file name length, either as standard (if this has no legacy issues), or through the use of a property to set an increased limit.

How will this feature improve OpenZFS?

It will remove one of the few limits of ZFS that is relatively low compared to other file systems, especially since it already allows all unicode (except NUL) and has unlimited (at a filesystem level) path names. Having the filename comparatively limited seems overly restrictive for an otherwise very permissive filesystem.

Crucially this will improve compatibility when migrating from other filesystems to ZFS, as it will eliminate the need to either truncate such long file names or use ZVOLs instead to retain them (i.e.- continuing to use the other file system anyway).

Additional context

I'm unclear on what the actual technical limitation is; I've seen some threads here and there discussing possible reasons for why the limit was set where it is in the first place (compatibility with other filesystems or kernel limitations at the time) but as far as I can tell there is no real reason for it to be limited as it is today, though I recognise it could be hardcoded in more places than just file naming (I've tried to look but didn't find any).

The main questions would then be: how much to raise the limit by (or whether to have one at all), and whether the new limit should be applied retroactively as a default?

If there are no technical limitations, my preferred solution would be to move the filename limit into a ZFS property; for the sake of consistency the default for all new and existing datasets could still be the current limit of 255/256 bytes. However it should be configurable to both smaller and larger values, probably up to some unsigned integer limit large enough to be effectively unlimited (with the caveat that very large names could have an impact on performance).

Ideally it should be possible to change this limit at any time (not just at dataset creation), but this will likely depend upon where the current limit is checked. If it's only checked when a file name is set then this will make implementation a lot easier, but if there are any structures that rely on the current size, for example to store directory contents, then this may not be possible. Hopefully someone more familiar with the code than myself can comment.

Setting a size larger than the default may require a feature flag, but this will depend on whether other ZFS versions will actually fail when they encounter a longer than normal name. It's possible older versions of ZFS may just require the filename to be shortened when renaming the file, which arguably isn't significant enough to justify a feature flag. A feature flag may however be warranted if the difference in filename limits would cause moving a file to fail, as this would breach the general rule that once a file has been named it should be possible to move it freely (to any valid location), though this may still be minor enough to not warrant it (it's just a caveat of setting the larger limit in the first place, i.e- check your names before downgrading).

Apologies if there have been any similar issues, the only one I found was #2344 which was referring specifically to the fact that the filename length limit was (at the time) undocumented.

Haravikk avatar Jan 30 '22 12:01 Haravikk

I think you could add a dataset property, which can be enabled at create time, to have longer names. A fair amount of code would need to change though, as there are a number of places that use MAX_PATH. So it is non-trivial in that sense. But once done, it would handle compatibility between pools and between platforms. So it has a fair chance of being accepted by everyone, although each platform will have to check whether it can even handle longer/more characters. Some possibly can't.

Privately over in macOS world, we have wondered if we could do something that would work for us, without breaking things for other platforms. I think the least-hacky, but "might work", approach would be to overflow long names into an xattr (fixed internal name), plus some indicator to make detecting it easier (z_pflags maybe). Pools would still work on other platforms (except for any long-name collision), similar to the DOS 8.3 thing. Alternatively, long names can just be unique hashes for other platforms, with the entire name in an xattr for us.
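A rough Python sketch of how the "unique hash plus full name in an xattr" idea could look. This is only an illustration of the proposal as described here, not how ZFS or any port implements it; the function name, the `~` separator, the use of SHA-256 and the 16-digit hash length are all assumptions:

```python
import hashlib

NAME_MAX = 255       # current per-name byte limit
HASH_HEX_LEN = 16    # hypothetical: hex digits of the hash kept in the short name


def split_long_name(name: str, limit: int = NAME_MAX):
    """Return (on_disk_name, overflow).

    overflow is None when the name fits; otherwise it holds the full name,
    which the proposal would store in an xattr with a fixed internal name.
    """
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name, None
    suffix = "~" + hashlib.sha256(raw).hexdigest()[:HASH_HEX_LEN]
    keep = limit - len(suffix)
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return prefix + suffix, name


on_disk, overflow = split_long_name("x" * 300 + ".flac")
print(len(on_disk.encode("utf-8")), overflow is not None)
```

A platform that understands the overflow would show the full name; any other platform would see the shortened, hash-suffixed name, much like an 8.3 alias.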

XNU can handle 256 characters, so the OS can certainly go a little longer.

lundman avatar Feb 01 '22 10:02 lundman

I've just increased MAXNAMELEN etc from e.g. 255 to 1023. It works for me, but don't do it if you have any sensible data.

caadar avatar Feb 01 '22 19:02 caadar

Don't get me wrong, but how is such an ancient relic like a 255 byte limit still a thing? I understand that every filesystem has its limits but I see no reason not to increase this one - even by default(!). The linked bug report #2344 has a typical example: filenames in a foreign language, where the limits are even shorter because a character takes up more than a byte. Another use case that comes to mind: nowadays, people tend to put all sorts of smiley faces, emojis and other weird symbols in filenames. Pointless as it may be, you can't tell anyone "I can't copy your file because I use ZFS". When downloading media files like background music using a common downloader, it often fails because the filename is too long. This has to be worked around by changing into /tmp or mounting another tmpfs to download it there, manually cleaning the filename and moving it. What a hassle. All of this for downloading a bit of concentration music with Unicode squares in the title - because a 255 byte limit is exceeded.
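For anyone stuck with the manual clean-up in the meantime, a small Python sketch of the truncation step. It assumes the limit is 255 bytes of UTF-8 and that the extension should be preserved; `truncate_name` is a made-up helper, not part of any downloader or ZFS tool:

```python
import os

NAME_MAX = 255  # bytes, the limit discussed in this issue


def truncate_name(name: str, limit: int = NAME_MAX) -> str:
    """Shorten a file name to at most `limit` UTF-8 bytes, keeping the extension."""
    stem, ext = os.path.splitext(name)
    budget = limit - len(ext.encode("utf-8"))
    if budget <= 0:
        raise ValueError("extension alone exceeds the limit")
    cut = stem.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character split by the byte slice,
    # so the result is always valid UTF-8.
    return cut.decode("utf-8", errors="ignore") + ext


title = "\u25a0" * 100 + ".mp3"   # 100 three-byte squares plus extension
short = truncate_name(title)
print(len(title.encode("utf-8")), len(short.encode("utf-8")))
```

Names that already fit are returned unchanged. Truncating on a byte boundary without the decode step could leave half a character at the end of the name.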

Comments appreciated.

c0xc avatar Feb 04 '22 15:02 c0xc

I understand that every filesystem has its limits

Yes, but see Reiser4 and Bcachefs (https://github.com/koverstreet/bcachefs/issues/3).

caadar avatar Feb 04 '22 17:02 caadar

After some testing:

  • ASCII filenames can currently be 255 characters long.
  • An emoji like 😭 can be repeated 63 times in a filename.
  • Even for natural language, a character like 囻 can only be repeated 85 times.

which is a significant limit.
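These numbers follow directly from the UTF-8 encoded size of each character. A short Python check, assuming the limit is 255 bytes and names are UTF-8 (`max_repeats` is just an illustrative helper):

```python
NAME_MAX = 255  # bytes


def max_repeats(ch: str, limit: int = NAME_MAX) -> int:
    """How many times a single character fits into `limit` UTF-8 bytes."""
    return limit // len(ch.encode("utf-8"))


for ch in ("a", "囻", "😭"):
    # Prints the character, its UTF-8 size in bytes, and the repeat count.
    print(ch, len(ch.encode("utf-8")), max_repeats(ch))
```

An ASCII letter takes 1 byte (255 repeats), 囻 takes 3 bytes (85 repeats) and 😭 takes 4 bytes (63 repeats), matching the test results above.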

asdfugil avatar Feb 05 '22 11:02 asdfugil

Not to mention if you are a fan of classical music you can't store your media properly on ZFS, because with song titles like "Trio élégiaque No. 2 in D Minor, Op. 9: II. Quasi variazione: Andante - Allegro - Lento - Allegro scherzando - Moderato - L'istesso tempo - Allegro vivace - Andante - Moderato" or "Stravinsky: Le sacre du printemps, Pt. 1: L'adoration de la Terre - 1. Introduction, Lento - Più mosso - 2. Les augures printaniers - Danses des adolescentes, Tempo giusto - 3. Jeu du rapt, Presto" you have no choice but to truncate them, as once you include the artist and album in the filename you are way over 256 characters, sometimes even over 512 characters.

It bugs me having so many truncated filenames on ZFS, and when you're grepping for a particular song the bit you're searching for is invariably part of the truncated section so you can't find what you're looking for.

I was assuming this was an OS limit, but if it's true that it's a ZFS-specific limit then it would be great to get it extended. Looking at the Wikipedia filesystem comparison page, btrfs, ext4 and XFS all have the same 255 byte filename limit as ZFS. It seems only ReiserFS supports longer filenames (~4000 bytes) but sadly it looks like it will be removed from the Linux kernel soon.

So having this feature request implemented could soon place ZFS as the only way to get filenames longer than 255 characters on a filesystem under Linux.

Malvineous avatar Mar 20 '22 10:03 Malvineous

This should start to become more of a priority as OpenZFS is starting to look promising not just on Macs but WINDOWS too: https://github.com/openzfsonwindows/ZFSin

Windows 10 extended long file name support from 260 characters up to a max of 32,767 characters!

I sometimes end up with some very long file names, either from saved websites or from programs that use a portion of a file's long header text as its filename when doing an export. The more widespread/cross-platform ZFS becomes, the more important this is.

On a related note, are there any characters allowed in a file name on HFS or APFS or NTFS/ReFS that aren't allowed on zfs? If so, what happens when copying such files with illegal characters?

jittygitty avatar Apr 28 '22 06:04 jittygitty

I haven't encountered any issues with specific characters; APFS and HFS are supposed to support all characters except NUL.

macOS does allow colons in file names which might cause possible issues on Windows; it displays these as slashes in the Finder etc. as an alternative to handling actual slashes within file names, but command line tools see them as colons.

I'm not sure what would happen if you could create a file with an actual forward slash in the name, as I'm not aware of any tools that will let you create one, if anyone knows of a way I could give it a try? I expect most likely some programs just won't resolve the file path properly; the standard macOS APIs for handling file paths should cope but might silently swap slashes in file/directory names for colons when saved.

Haravikk avatar Apr 28 '22 08:04 Haravikk

@Haravikk I've seen some pretty weird filenames when saving web pages whose title was some crazy long thing with all kinds of characters. My concern was whether copying them over to ZFS, or even to another Linux filesystem, would work fine. And yeah, you're right, it seems Windows didn't let me put a "/" in the filename. It says these are not allowed: \ / : * ? " < > | What's not allowed on Linux? (and is it file-system specific as well?) Hmm, for Linux (ext[2-4]): any byte except NUL or /. https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations
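As a quick way to check names before copying, here is a small Python sketch built only from the two character lists quoted above (the Windows list and "any byte except NUL or /" for Linux). The table and function names are made up, and real platforms have further rules, such as reserved names, that this ignores:

```python
# Character sets taken from the lists quoted above; everything else is illustrative.
FORBIDDEN = {
    "windows": set('\\/:*?"<>|'),
    "linux": {"\0", "/"},
}


def forbidden_chars(name: str, platform: str) -> list:
    """Return the characters in `name` that `platform` does not allow, sorted."""
    return sorted(set(name) & FORBIDDEN[platform])


name = 'Trio: "Andante"?.flac'
print(forbidden_chars(name, "windows"))
print(forbidden_chars(name, "linux"))
```

A name that yields an empty list for every target platform should be safe as far as these two lists are concerned.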

jittygitty avatar May 05 '22 10:05 jittygitty

When accessing a Linux filesystem from Windows over the network with Samba, it maps characters legal on Linux but illegal under Windows to alternate characters, so you could always borrow their mapping. That would correctly map characters back and forth between SMB shares and ZFS volumes regardless of OS which would be nice (e.g. you could copy a file from a Linux filesystem that has an illegal character, over SMB, onto a ZFS volume under Windows, then mount that ZFS volume on a Linux machine and the original character would still be there despite it having traversed a Windows machine where that character is not permitted).

However it would only be a problem if you're moving ZFS volumes between platforms. If a volume stays on the same platform then the OS would dictate which characters can and can't be written to it, so as long as ZFS says only NUL and / are disallowed (if it even needs to enforce that) then it would be fine in all other cases.

Malvineous avatar May 05 '22 11:05 Malvineous

What's not allowed on Linux? (and is it file-system specific as well?)

Ultimately the restrictions are file-system specific, and in theory it shouldn't matter if you transfer a file system with files that have characters that are disallowed.

The problem is that many tools may not handle unusual characters properly. Even on Linux a forward slash in a filename can't be easily set, and even if you do then while some tools may still handle it correctly, others won't as most just expect a path to be one big string and can't distinguish between a slash in a file name, or a slash as a path separator. When dealing with such a file you may not be able to overwrite it or delete it, which would be problematic.

In the same vein however, it's very difficult to create such files in the first place, so you'd need to be doing so intentionally for some reason. I'm not actually familiar with any programs that can create a file with a slash in the name off-hand (excluding the macOS hack that isn't actually storing as slashes anyway).

That said, if you really want a forward slash in a file name, there are now several unicode characters that look like a slash but aren't the same slash used as a path separator, though that's not a very user-friendly option (not even sure if you can type any of them directly). If you do use one, any program that handles unicode file names properly should be able to cope with it, though it'll still be difficult for the user to type (e.g. if you wanted to enter the path manually on a command line).

A feature that can strip out and/or replace problematic characters wouldn't be a bad idea, especially for datasets intended to be portable between OSes, though I'm not sure it's critical personally; other file-systems have compatibility issues when moved between operating systems, so while that doesn't mean ZFS shouldn't try to cope with it better, it doesn't necessarily need to either.

As far as I'm aware though forward slash is the only character that will cause trouble on most operating systems, and colons can on Windows (it shouldn't really as it should only appear at the front, but a lot of programs don't handle it properly). Beyond that modern OSes should all handle unicode reasonably well, and to my knowledge no file-system allows NUL in a file name.

Haravikk avatar May 05 '22 13:05 Haravikk

btrfs, ext4 and XFS all have the same 255 byte filename limit as ZFS

In the case of Btrfs it does seem possible to support names over 255 bytes, but the limit is hardcoded at 255: "we can actually store much bigger names, but lets not confuse the rest of linux". How much bigger I don't know, but changing it to 1023 does appear to work. Other file systems like ext4, JFS and XFS actually have a 'hard' limit which would require on-disk format changes, while the btrfs limit appears to be only nominal.

https://github.com/torvalds/linux/blob/0c7030038e6106711c5d0b237c980905dd3244ec/include/uapi/linux/btrfs_tree.h#L22

layercak3 avatar Dec 16 '22 07:12 layercak3

I've just increased MAXNAMELEN etc from e.g. 255 to 1023. It works for me, but don't do it if you have any sensible data.

How did you do it? Is this a macro that you need to recompile the kernel or is this a envvar?

jeff-zheng-silc avatar Jul 03 '23 02:07 jeff-zheng-silc

How did you do it?

I patched the sources and rebuilt the kernel.

Is this a macro that you need to recompile the kernel or is this a envvar?

A constant scattered throughout the ZFS code. (Too bad it's not defined in a single place.)

caadar avatar Jul 03 '23 10:07 caadar

I'm glad to see so many people concerned about the file name length limit in ZFS. As an Asian, most of the characters I use in my daily life require at least 2 bytes, so 255 bytes is actually not sufficient for me. I hope this issue can be resolved in the near future.

ghost avatar Jul 24 '23 04:07 ghost

Fully agree. We have Minio S3 over ZFS, and customers can't upload a few files from Windows into our S3 because of this ancient limit. We need to use an intermediate database with the filenames and their UUIDs.

AlexZIX avatar Dec 03 '23 16:12 AlexZIX

How about sponsoring a developer to implement this feature?

AllKind avatar Dec 03 '23 17:12 AllKind

Think it's possible. Can you please calculate the price? I'll discuss with my CEO.

AlexZIX avatar Dec 03 '23 17:12 AlexZIX

There are various ZFS vendors that would take it up, one that comes to mind is klara: https://klarasystems.com/zfs-development/zfs-custom-feature-development/

Evernow avatar Dec 03 '23 17:12 Evernow

If I could I would. I guess first thing would be to find one of the active developers willing to do it. And then ask him/her for the price. The only one I saw around here asking to pay him for a certain feature was @robn . (I'm sorry @robn if pointing this towards you is not appropriate)

AllKind avatar Dec 03 '23 17:12 AllKind

There are various ZFS vendors that would take it up, one that comes to mind is klara: https://klarasystems.com/zfs-development/zfs-custom-feature-development/

Thanks, sent request to this team.

AlexZIX avatar Dec 03 '23 18:12 AlexZIX

If I could I would. I guess first thing would be to find one of the active developers willing to do it. And then ask him/her for the price. The only one I saw around here asking to pay him for a certain feature was @robn . (I'm sorry @robn if pointing this towards you is not appropriate)

Thanks, sent the request to Rob too.

AlexZIX avatar Dec 03 '23 18:12 AlexZIX

Very nice @AlexZIX ! I hope, in case your company sponsors this feature, it'll become a public Pull Request and end up in the public code!

AllKind avatar Dec 03 '23 19:12 AllKind

This was brought up by Allan Jude from Klara during the December 2023 leadership meeting: https://youtu.be/OnuaWyt8QZo?si=9sHa_rD-1uSVVkdC&t=1097

  • Could be possible to do a per dataset feature flag so that at creation you can specify the maximum length.
  • Would have to be carefully done to make sure there are no overflows anywhere in the codebase.
  • Because all current platforms have to stay supported (FreeBSD's max is 1024, Linux is 4096), any increase made today would likely have to go for the lowest common denominator.
  • A potential backwards compatible option was brought up by Rob Norris. From the way it was talked about it hasn't been looked into much yet, and frankly it went over my head. He starts talking about it at 20:10.

Evernow avatar Dec 08 '23 07:12 Evernow

Hey I feel seen, thanks guys.

  1. If you do a dataset property, you are essentially dropping compatibility, right? If a user creates a dataset with a 4096 limit on Linux, you cannot mount/read this dataset on any other platform, right? So are we just talking about a slightly larger limit (say 1024) that works on all (presumably all current) platforms?

  2. With the xattr overflow, the rsync concern wouldn't arise, since rsync would be presented with either the full name, if the system can handle it, or a unique hash of the name (yeah, think 8.3 mapping here, ick I know, but something similar where perhaps the last 250-256 bytes would be hashed). It does have the advantage that you can mount/read the dataset anywhere, even if the names look weird (and rsync does copy the xattrs). So a Windows 32768 dataset might look like just hashes, but would still be "readable". I do hear Allan's concern about reading xattrs in the dirlist (although this already happens with an xattr-loving OS like macOS).

I'm not advocating for either, just thoughts.

lundman avatar Dec 08 '23 08:12 lundman

I'm not capable of an implementation myself, but...

I dislike option 2, but from a full continuity of support, it makes the most sense to me. With option 1, you're going to break support (and accessibility) not just at the file system level but also with many applications with the larger limit - NFS, for instance.

With the xattr overflow, the data would conceivably still be accessible based on the OS/application support level. You'd not necessarily need to make it xattr, perhaps store it in another (optional, so as to not incur unnecessary performance issues if unset) extended metadata object. These are (hopefully) going to be limited corner cases and it would be nice not to have to suffer the consequences if undesired: you could read the same dataset with or without those extended file properties, depending on whether you set a property on the volume. If the added length isn't necessary, it isn't used.

Treat it more like recordsize, at least in so far as setting it and the impact of setting it on the dataset. If it is needed, such as serving up non-latin locales in UTF-16 for really long media filenames over SMB, the option could be set and it'd work without loss of resolution. But I could also revert to the 255 byte limit by disabling it, and longer file names would potentially revert to a "8.3 equivalent" truncation for compatibility purposes.

bhodgens avatar Mar 11 '24 20:03 bhodgens

Do we know yet, what the actual impact of importing a dataset with truly longer (no xattr) file-names on a system that doesn't support them would be?

Obviously if it would cause crashes or such like then it would presumably require a feature flag so pools won't be allowed to import on incompatible versions, but would it crash? I guess the issue really is whether the file-name length is just a limit (something that's checked when a file is created/moved) or if it's used for things like buffers anywhere.

Because if it's just a limit then I would expect older versions of ZFS to be oblivious to the longer names, except when you try to rename or move them in which case you might find you have to shorten the name to fit the lower limit. That's 100% a guess though!

While some kind of extended attribute based solution makes sense to preserve compatibility, unless an older ZFS version will actually properly choke on the longer names it seems unnecessary, as moving a newer setup over to an older version seems like a pretty niche case?

Haravikk avatar Mar 12 '24 09:03 Haravikk

Just an FYI for others, after our discussion at the leadership meeting drew attention to the issue, Nutanix has posted their version of this feature as a pull request: https://github.com/openzfs/zfs/pull/15921

allanjude avatar Aug 15 '24 14:08 allanjude

Closing, #15921 has been merged.

behlendorf avatar Oct 04 '24 20:10 behlendorf