Wishlist idea: use extended file attributes for tags
Prerequisites
- [x] I have searched open and closed issues for duplicates.
Extended file attributes for colour tags
This is more of a wacky proposal than anything else (and yes, it's a wishlist sort of idea), so I'll keep this somewhat brief (I sadly lost a previous version of this issue draft...). I'll list the advantages and disadvantages of this approach. I have a feeling the UX team may like this idea, and maybe also the people working on Files' codebase.
(Also, if you wonder why I refer to tags in general sometimes in this proposal it's because I have a "named/text" tags idea in mind for Files, I might create a separate issue for this at some point)
Current approach
So basically, the current approach that Files uses to know what files have what colour tags is with a SQL database. Currently, moving a file somewhere else, even within Files, does not preserve its colour tags. I don't think Files even attempts to track the file currently. This could possibly be fixed/implemented for moving or renaming files within Files (updating the SQL file when you move a file within Files). However, this would not work if you did this outside of Files:
$ mv ~/myfile ~/somewhereelse/myfile
When you then open Files again it wouldn't know that ~/somewhereelse/myfile is just ~/myfile moved somewhere else.
Guessing would be highly error-prone and impractical, as we could've easily moved myfile to ~/somewhereelse/myfile and then made a new different file called myfile where the other one was previously. Even worse, what if we renamed myfile externally? Files couldn't possibly know with absolute certainty that it's still myfile with a different name, because we could've made changes to it since then.
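To make the failure mode concrete, here is a minimal sketch of a path-keyed tag store. The table layout and helper names are hypothetical (this is not Files' actual schema), and an in-memory SQLite database stands in for the real one:

```python
import sqlite3


def make_tag_db():
    """Create a throwaway tag store keyed by absolute path (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tags (path TEXT PRIMARY KEY, color TEXT)")
    return db


def set_color(db, path, color):
    """Record a colour tag for the given path."""
    db.execute(
        "INSERT OR REPLACE INTO tags (path, color) VALUES (?, ?)", (path, color)
    )


def get_color(db, path):
    """Return the colour stored for this exact path, or None."""
    row = db.execute("SELECT color FROM tags WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None


db = make_tag_db()
set_color(db, "/home/me/myfile", "red")

# After an external `mv`, only the new path exists on disk.
print(get_color(db, "/home/me/somewhereelse/myfile"))  # None: the tag is lost
# The old row is still there, so a new file created at the old path would inherit it.
print(get_color(db, "/home/me/myfile"))  # red
```

Because the path is the only key, the database has no way to tell a moved file from a deleted one.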
A heuristic or comparison algorithm would make Files slower due to all the potential file comparisons and directories we would have to check in a hypothetically large directory tree, and it'd be cumbersome to think up an efficient algorithm for it and implement it.
What about keeping track of file changes with an external daemon? If you used inotify you'd probably run out of kernel watches quickly as you have to watch every single directory individually with inotify. Even if you just limited colour tag tracking to the home directory of the current user, used a blacklist for folders like .local, .config, etc. and extended the watch limit to a really large number, it could potentially take ages to set up all these watches, every single login, for every single non-blacklisted folder in the current user's home directory... etc, etc...
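As a rough way to gauge the scale of the problem, the sketch below counts how many directories (and therefore inotify watches) a daemon would need under a given root, skipping a blacklist. The function name and the default blacklist are only illustrative:

```python
import os


def count_watchable_dirs(root, skip_names=(".local", ".config")):
    """Count directories under root (including root) that would each need a watch."""
    count = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        # Prune blacklisted folders in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        count += 1  # one watch for the directory currently being visited
    return count
```

Running `count_watchable_dirs(os.path.expanduser("~"))` and comparing the result with the value in /proc/sys/fs/inotify/max_user_watches gives a feel for how close a home directory comes to the limit.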
It seems pretty hopeless and utterly cumbersome to be able to practically track metadata with files in this manner.
Enter extended file attributes
But what if we could tell the filesystem to keep track of our tag metadata for us instead of trying to manually track it? This is exactly what extended file attributes, or xattrs, are for! Let me demonstrate (this is a real example):
xattr example
$ mkdir somefolder
$ touch mypinkfiletaggedhello
$ attr -l mypinkfiletaggedhello
(no output)
$ attr
A filename to operate on is required
Usage: attr [-LRSq] -s attrname [-V attrvalue] pathname # set value
attr [-LRSq] -g attrname pathname # get value
attr [-LRSq] -r attrname pathname # remove attr
attr [-LRq] -l pathname # list attrs
-s reads a value from stdin and -g writes a value to stdout
$ attr -s tag.color.name -V "pink" mypinkfiletaggedhello
Attribute "tag.color.name" set to a 4 byte value for mypinkfiletaggedhello:
pink
$ attr -s tag.color.raw -V "#fe9ab8" mypinkfiletaggedhello
Attribute "tag.color.raw" set to a 7 byte value for mypinkfiletaggedhello:
#fe9ab8
$ attr -s tag.names -V '["hello", "some other tag", "files I like"]' mypinkfiletaggedhello
Attribute "tag.names" set to a 43 byte value for mypinkfiletaggedhello:
["hello", "some other tag", "files I like"]
$ attr -l mypinkfiletaggedhello
Attribute "tag.color.name" has a 4 byte value for mypinkfiletaggedhello
Attribute "tag.color.raw" has a 7 byte value for mypinkfiletaggedhello
Attribute "tag.names" has a 43 byte value for mypinkfiletaggedhello
$ exa -lh --group-directories-first
Permissions Size User Date Modified Name
drwxrwxr-x - me 1 Feb 2:30 somefolder
.rw-rw-r--@ 0 me 1 Feb 2:30 mypinkfiletaggedhello
$ mv mypinkfiletaggedhello somefolder/
$ exa -lh --group-directories-first somefolder/
Permissions Size User Date Modified Name
.rw-rw-r--@ 0 me 1 Feb 2:30 mypinkfiletaggedhello
$ attr -l somefolder/mypinkfiletaggedhello
Attribute "tag.color.name" has a 4 byte value for somefolder/mypinkfiletaggedhello
Attribute "tag.color.raw" has a 7 byte value for somefolder/mypinkfiletaggedhello
Attribute "tag.names" has a 43 byte value for somefolder/mypinkfiletaggedhello
(By the way, tag.color.name is an entirely arbitrary attribute name, it doesn't need any dots or anything, it can even look like this:)
$ attr -s "my pancakes" -V "abc" somefolder/mypinkfiletaggedhello
Attribute "my pancakes" set to a 3 byte value for somefolder/mypinkfiletaggedhello:
abc
So instead of trying to track a file's tags manually, we can simply use xattrs to store the file's tags instead.
With this, files would now automagically have their colour tags preserved if you moved them, renamed them, or copied them, even if you did it outside of Files or with some other file manager.
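The same thing can be done programmatically. This is a minimal sketch using Python's standard library on Linux, not a proposal for how Files itself should do it; the helper names are made up, and it assumes the attributes live in the "user" namespace, which is where attr puts them by default:

```python
import errno
import os

USER_PREFIX = "user."  # attr(1) stores names in the "user" namespace by default


def xattr_key(name):
    """Map a name as given to `attr -s` onto the full kernel-level attribute name."""
    return USER_PREFIX + name


def write_tag(path, name, value):
    """Store a text value in an extended attribute of the file."""
    os.setxattr(path, xattr_key(name), value.encode("utf-8"))


def read_tag(path, name):
    """Return the attribute's text value, or None if the file does not have it."""
    try:
        return os.getxattr(path, xattr_key(name)).decode("utf-8")
    except OSError as err:
        if err.errno == errno.ENODATA:  # attribute is simply not set
            return None
        raise  # e.g. the filesystem does not support xattrs
```

For example, after `write_tag("myfile", "tag.color.name", "pink")` and an external `mv`, calling `read_tag` on the new path should still return "pink", provided the move stays on a filesystem that supports xattrs.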
Pros & Cons
Advantages:
- Tags for files can now be stored within the filesystem: we don't have to manually track a file's tags anymore, the filesystem can do this better than we can
- We don't have to do anything special for moving and renaming files and their tags will be kept (see disadvantages regarding copying files)
- Simple to get a file's tags or any other stored metadata for it: just read the file's xattrs
- xattr names and values are UTF-8, so e.g. name/text tags would be compatible with non-English languages:
$ attr -s hello.あいうえお -V "こんにちは！元気ですか？" somefolder/mypinkfiletaggedhello
Attribute "hello.あいうえお" set to a 36 byte value for somefolder/mypinkfiletaggedhello:
こんにちは！元気ですか？
- No more need for a SQL database
- Could potentially be standardised, meaning much more interoperability with other programs, file managers, desktop environments and OSes (in comparison, Files' SQL database is highly specific to Files)
Not really issues but should probably take note:
- Underlying filesystems would have to explicitly support extended attributes in order for this to work. However, many filesystems seem to support them: not just common gnu+linux ones like ext4, but even Windows' NTFS and macOS's HFS+, even FAT(!?), so this is probably not an issue
- Apparently you used to have to explicitly enable xattrs on gnu+linux systems with the option user_xattr on mounts in fstab, but that doesn't seem to be the case anymore. My /etc/fstab file doesn't have user_xattr anywhere in the file and yet xattrs work fine here (yes, on elementary 5.1). So this doesn't seem to be an issue either?
Disadvantages:
- Certain archiving formats and syncing programs (e.g. syncthing) currently decide to not preserve xattrs or otherwise ignore them, so tags would be lost if you zipped or synced with those particular programs
- However they wouldn't be preserved with Files' current SQL database method anyway, so Files' current method doesn't have an advantage over xattrs here
- ~~I couldn't find any Vala bindings to GNU's xattr interface, so you'd probably have to write/generate bindings for it. See /usr/include/x86_64-linux-gnu/sys/xattr.h for the header file. It's not very long though and you could probably leave out the file descriptor ones, but if you're not particularly familiar with C (...I'm not) it will likely give you a big headache trying to do this~~ Vala has a way to manipulate xattrs, so this would not be required
- You may have to explicitly set the option to preserve extended attributes when copying files, e.g. for cp you have to use cp --preserve=xattr file1 file2 in order to preserve xattrs
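The copying caveat applies to any code that copies files, not just cp. As an illustration, here is a sketch in Python (Linux only) of a copy that carries user xattrs across explicitly; the function name is hypothetical. In Python's own standard library, shutil.copy2 also tries to preserve extended attributes on Linux, while shutil.copyfile copies contents only:

```python
import os
import shutil


def copy_with_user_xattrs(src, dst):
    """Copy file contents, then copy every attribute in the "user" namespace."""
    shutil.copyfile(src, dst)  # contents only, no metadata
    for key in os.listxattr(src):
        if key.startswith("user."):
            os.setxattr(dst, key, os.getxattr(src, key))
```

A file manager would need the equivalent of this step (or the right copy flags in its I/O library) for tags to survive a copy.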
I think, barring any other issues I'm unaware of, this could have a lot of potential.
This is an idea that has occurred to me before but I had not investigated it in the depth that you have - thank you! From your analysis, it seems there is no blocking disadvantage and it could be more efficient than the existing database. You can manipulate file metadata in Vala using the metadata:: or xattr:: namespaces for the file attributes. Files already uses the metadata namespace for storing sort order info.
It's good to hear that xattrs are supported in Vala. That probably makes this easier to implement overall.
For colour tags, I would store two separate values, the (common) colour name and its raw hexadecimal sRGB value. The idea being for interoperability with other DEs and OSes.
So, e.g. elementary's Strawberry colour would be represented like this:
tag.color.name = "red"
tag.color.raw = "#ff8c82"
The colour tag name acts as a general indicator of what colour is supposed to be represented, so other DEs can use their own palette's colour to represent it if they wish. The raw colour is a fallback if the colour's name is not known, so the DE could still reproduce the tag's colour.
If strawberry was used for the tag's name instead, any other DE looking at the value would likely not know what colour strawberry is and would either have to add support specifically for elementary's palette, and every single other DE's unique palette names, or have to fall back to the raw value every time. So using a general colour name works better for interoperability.
Possibly there could be one or two more extra metadata values:
tag.color.raw.dark = "#7a0000"
The idea being that lighter values may not look so good in a DE's dark mode, so if it has to fall back to the raw value because it doesn't know what red is, it will not look as bad.
Basic algorithm
Here's a basic and very simple algorithm for colour determination. Assuming these xattrs in a file we're looking at:
tag.color.name = "red"
tag.color.raw = "#ff8c82"
tag.color.raw.dark = "#7a0000"
And assuming we have some sort of colour database/hash/dictionary that would loosely look something like this (using a Ruby hash appearance here):
colors = {
  "red" => {
    "light" => "#ff8c82",
    "dark" => "#7a0000"
  },
  "blue" => {
    ...
  }
}
This is the algorithm I envision:
- Look at the tag.color.name xattr first. If we know what colour red is, then use this value and ignore the other values. We can use our colour key/value data to determine the best raw colour to use, e.g. for dark mode.
- If we don't have red in the database, or the file doesn't have a tag.color.name xattr, use the relevant raw colour, either tag.color.raw or tag.color.raw.dark.
- If none of the tag.color xattrs are present, we assume the file doesn't have a colour tag.
It's more of a basic guideline and it may have problems. Also should we use tag.color.raw if we're in dark mode and tag.color.raw.dark is not present, and vice versa? Or should we just ignore the colour tag in this case?
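The steps above could be sketched roughly as follows. The palette contents and the function name are placeholders, and the xattrs are passed in as a plain dictionary so the logic stays independent of how they are read. For the open question about a missing raw value, this sketch assumes we fall back to whichever raw value exists; ignoring the tag instead would be an equally valid choice:

```python
# Hypothetical palette, mirroring the Ruby-style hash above.
PALETTE = {
    "red": {"light": "#ff8c82", "dark": "#7a0000"},
}


def resolve_colour(xattrs, dark_mode=False, palette=PALETTE):
    """Return the hex colour to display for a file, or None if it has no colour tag."""
    mode = "dark" if dark_mode else "light"

    # Step 1: a colour name we know wins over everything else.
    name = xattrs.get("tag.color.name")
    if name in palette:
        return palette[name][mode]

    # Step 2: unknown or missing name, so use the raw value for the current mode.
    if dark_mode:
        preferred, other = "tag.color.raw.dark", "tag.color.raw"
    else:
        preferred, other = "tag.color.raw", "tag.color.raw.dark"
    if preferred in xattrs:
        return xattrs[preferred]
    # Assumption: prefer a mismatched raw value over showing no tag at all.
    if other in xattrs:
        return xattrs[other]

    # Step 3: nothing usable, so treat the file as untagged.
    return None
```

With the example xattrs above, `resolve_colour(attrs)` would pick the palette's own red and never look at the raw values; a file tagged with an unknown name such as "turquoise" would be drawn using its raw value instead.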
Currently the color-tagging system just stores an index, which Files uses to lookup a color which is hard-coded in GOF.Preferences:
/* First element set to null in order that the text renderer background is not set */
public const string?[] TAGS_COLORS = {
    null, "#ff8c82", "#ffc27d", "#ffe16b", "#9bdb4d", "#64baff", "#cd9ef7", "#a3907c", "#95a3ab", null
};
This is not ideal as the context menu uses basic color-names ("red", "orange", "yellow" etc) and css to get the colors of the ColorButtons. It may indeed be a good idea to store basic color names instead so the displayed color tags are guaranteed to match the ColorButtons. I am not sure we need to do anything else tbh, as there does not seem to be an existing standard way to implement color tags (indeed Nautilus and PCManFM do not seem to implement them at all). Dark themes could probably be handled with css. Additional metadata could be added later if required.
The idea was to set out a basic but sane specification that could possibly become standardized in the future. Think of it as an unofficial proposal for a future standard. So if other file managers (or any other program) want to add colour tags later they would have the possibility of doing it in an interoperable way. Meanwhile elementary's Files would then have a way of handling colours we don't know about, say, turquoise, that some other DE set, where they have an equivalent colour in their palette and we (for this example) don't.
It may be a good idea for elementary to eventually have some sort of separate palette database (even if it's just a JSON in some directory like /usr/share/theme-palettes/ called elementary.json) so that the colours could possibly be updated easily and shared between elementary apps (and other apps as well, for example AppCenter-aimed apps).
Is there any evidence of a demand for such a standard? If not, we could be inventing a solution to a problem that does not exist. It may be better to just reproduce our current capabilities as simply as possible but in a way that allows additional sophistication later. That should not be difficult, as further extended attributes can always be defined for raw colors if required.
I guess that's a reasonable thing to do then, just fill tag.color.name with a basic colour name (like red) in the xattrs of colour tagged files and if other DEs show interest in supporting colour tagging in their applications in the future then it can be extended (with e.g. raw colors) for better interoperability. Am I understanding you correctly?
A possible drawback of using extended attributes for color tagging is that, if there is only one attribute, then the same color will be applied for all users. The existing plugin maintains a separate database for each user. Using an attribute saves a lot of code however. If we want to maintain separate colors per user then it would be possible to combine the color with the user id in the attribute name, I guess.
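As a sketch of what combining the color with the user id in the attribute name might look like (the naming scheme here is purely hypothetical, nothing in Files uses it):

```python
import os


def per_user_key(uid=None):
    """Build a per-user attribute name by appending the numeric user id."""
    if uid is None:
        uid = os.getuid()  # default to the user running the file manager
    return f"user.tag.color.name.{uid}"


print(per_user_key(1000))  # user.tag.color.name.1000
```

Each user would then read and write only their own attribute, at the cost of one extra xattr per user per tagged file.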
I'd like to second using basic color names instead of hex values. Something that I think we're going to want to do in the stylesheet is make sure that "red" in the light stylesheet is darker than "red" in the dark stylesheet to maintain good contrast. This also allows us to change those colors in the future
Mentioning @cassidyjames for an opinion on making sure colors are per-user by (for example) pre/appending the uid.
My gut reaction is that color tags are probably personal and if you send someone a file, they probably don't want to inherit your color tag with it
I agree that color tags should not be inherited when a file is sent to another person but pre/appending the UID may not solve that as the other person may have the same UID on another computer. This is another drawback of attaching color tags to the file itself. It is very easy to remove color tags through the UI however and they do not cause any harm (?)
If the tags are just colors (and not i.e. user-editable text labels or more metadata), then I don't see a strong reason not to just attach it to the file itself and have it persist across all users. If it was per-user and the person getting the file wanted to categorize it, they'd need to re-tag it anyway. If they don't care, then it's not a big deal. It's not like it's leaking personal data at all—it's more like part of the file name itself.
I'd like to second using basic color names instead of hex values. Something that I think we're going to want to do in the stylesheet is make sure that "red" in the light stylesheet is darker than "red" in the dark stylesheet to maintain good contrast. This also allows us to change those colors in the future
The hex values thing was only intended in addition to a colour name as a fallback for interoperability between different desktop environments' file managers. The colour name would be the preferred option so the file manager could display whatever colour it wanted for it with any logic wanted, for example using a different colour shade depending on dark or light mode. If a different file manager didn't know what the colour name was, it could fall back to the hex value. For the case you described, that's why I suggested also having separate dark and light hex fallback values.
However interoperability is currently not a concern as elementary seems to be the only distro concerned with colour tags and it is not known if any other DE intends to add a similar feature. So there's no problem using the colour name only, and that keeps implementation complexity to a minimum as well.
FWIW Samba supports using extended attributes for various types of metadata created by Windows and macOS clients, so it would be great if Pantheon Files used an extended attribute metadata schema that is directly compatible with Samba’s implementation, such that the metadata could transfer somewhat seamlessly between platforms.
There’s some (rather incomplete) documentation of the various Samba virtual filesystem modules here:
In general it would be nice to have some sort of user-friendly Samba frontend (i.e. being able to access one’s home folder and being able to share individual directories from the context menu, like you can do with nautilus-share, all with sensible default settings that you wouldn’t need to fuss over), but this wouldn’t be a prerequisite for compatibility with the way Samba stores tags using extended attributes.
Well, I just did a test, applying an orange-colored tag with the title “Foo Bar” to a file in the Finder over Samba, then dumping the extended attributes, and it looks like an encoding nightmare:
$ getfattr --dump ./example.pdf
# file: example.pdf
user.DOSATTRIB=0sAAAEAAQAAAARAAAAAAAAAAAAAAAAAAAAAAAfZ4fO1gE=
user.DosStream.com.apple.metadata_kMDItemUserTags:$DATA=0sYnBsaXN0MDChAVhPcmFuZ2UKNwgKAAAAAAAAAQEAAAAAAAAAAgAAAAAAAAAAAAAAAAAAABMA
user.org.netatalk.Metadata=0sAAUWBwACAAAAAAAAAAAAAAAAAAAAAAAAAAgAAAAEAAAAmgAAAAAAAAAIAAABYgAAABAAAAAJAAAAegAAACAAAAAOAAABcgAAAASAREVWAAABdgAAAAiASU5PAAABfgAAAAiAU1lOAAABhgAAAAiAU1Z+AAABjgAAAAQAAAAAAAAAAAAOAAAAAAAAAAAAAAAAAAAnZCGAJ2QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJ2QhgCdkIYCAAAAAJ2QhgAAAAAAA/gAAAAAAAJ0KVAEAAAAAwux6YAAAAAAqpAYA
It’s possible that there are other Samba configurations that produce more interoperable extended attributes, though.
Between com.apple.metadata and _kMDItemUserTags there’s an invisible Unicode character, U+F022 or , which is just lovely.
It seems like this is due to using fruit:encoding = private in my smb.conf, so I can see what happens if I use fruit:encoding = native instead.
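For anyone wanting to poke at the dump above: getfattr prefixes base64-encoded values with 0s and hex-encoded values with 0x. Below is a minimal decoding sketch; the function name is made up, and the handling of plain quoted values is simplified (it does not process backslash escapes):

```python
import base64


def decode_getfattr_value(text):
    """Decode a value as printed by `getfattr --dump` into raw bytes."""
    if text.startswith("0s"):  # base64-encoded binary value
        return base64.b64decode(text[2:])
    if text.startswith(("0x", "0X")):  # hex-encoded binary value
        return bytes.fromhex(text[2:])
    # Simplification: treat anything else as a quoted text value.
    return text.strip('"').encode("utf-8")


# First 12 base64 characters of the kMDItemUserTags value from the dump above.
print(decode_getfattr_value("0sYnBsaXN0MDCh")[:8])  # b'bplist00'
```

The leading bplist00 bytes are the signature of an Apple binary property list, so the tag is stored as a serialized plist, not as plain text.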