Use singly linked list (or hash list as in the Linux kernel) for dentry, inode hash entries
The dentry struct has two `struct list_head` members:
```c
struct list_head inode_list; /* list of dentries of inode */
struct list_head hash;       /* hash table pointers */
```
I think both can be converted to singly linked lists, saving 16 bytes per dentry (two pointers on a 64-bit build).
(Inspired by https://lore.kernel.org/io-uring/1bc942b82422fb2624b8353bd93aca183a022846.1632516769.git.asml.silence@gmail.com/ ; see also https://sbexr.rabexc.org/latest/sources/52/0436935ee0e606.html for an implementation. Maybe we want to generalize this and reuse it elsewhere in Gluster?)
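To make the saving concrete, here is a minimal, self-contained sketch (not glusterfs code; the struct names are illustrative) comparing the two `list_head` members against single-pointer nodes on a typical 64-bit build:

```c
/* Minimal sketch, not glusterfs code: only the two list members are shown,
 * to compare sizes on a typical 64-bit build. */
#include <stdio.h>

struct list_head {                 /* doubly linked node: 2 pointers = 16 bytes */
    struct list_head *next;
    struct list_head *prev;
};

struct slist_node {                /* singly linked node: 1 pointer = 8 bytes */
    struct slist_node *next;
};

struct dentry_now {                /* current layout of the two list members */
    struct list_head inode_list;
    struct list_head hash;
};

struct dentry_slim {               /* hypothetical singly linked variant */
    struct slist_node inode_list;
    struct slist_node hash;
};

int main(void)
{
    printf("now:  %zu bytes\n", sizeof(struct dentry_now));  /* 32 */
    printf("slim: %zu bytes\n", sizeof(struct dentry_slim)); /* 16 */
    return 0;
}
```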
Adding to this, as far as I can tell the following are the only list members in these structs that actually depend on doubly-linked-list functionality (correct me if I am wrong):
- `_call_stack::myframes`
- `gf_dirent_t::list`
- `gf_log_handle_::lru_queue`
- `drc_globals::cache_head`
- `wb_request::winds`
- `_gf_timer_registry::active`
- a list used locally in the `posix_fsyncer` function of posix.c
I think we can apply this optimization to all other lists except the ones listed above and benefit from the reduced memory footprint. WDYT?
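For lists outside the exceptions above, the conversion looks mechanical. A hedged sketch (generic code, not an existing Gluster API): a list that is only appended to and iterated forward can keep O(1) append with a head/tail pair and no `prev` pointers at all.

```c
#include <stddef.h>

struct snode {
    struct snode *next;
};

struct slist {
    struct snode *head;
    struct snode *tail;            /* keeps append O(1) without a prev pointer */
};

static void slist_append(struct slist *l, struct snode *n)
{
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

/* Traversal stays the same in spirit, just forward-only. */
#define slist_for_each(pos, l) \
    for ((pos) = (l)->head; (pos) != NULL; (pos) = (pos)->next)
```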
Regards
I'm not sure it's that easy. In essence, every list from which a member can be removed via list_del_init() is not a natural candidate: with a singly linked list, removing an item requires traversing the list until you find it, which is inefficient. Hash tables may be different, since the hope is that traversal is already kept short by the hashing itself: once you are in the right bucket (thanks to the hash) you still traverse, but it's a 'short walk'.
If we take inode_list as an example: you would need to traverse that list in __dentry_unset(), and I'm not sure how efficient that would be. We need to weigh this against the potential memory saving.
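To illustrate the concern (a generic sketch, not the actual glusterfs structures): unlinking from a doubly linked list is O(1) because the node carries its predecessor, while unlinking an arbitrary node from a singly linked list means walking from the head first.

```c
#include <stddef.h>

struct dlist_node {
    struct dlist_node *next;
    struct dlist_node *prev;
};

/* Doubly linked: O(1), the node already knows its neighbours. */
static void dlist_del_init(struct dlist_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;         /* re-initialize, list_del_init() style */
}

struct snode {
    struct snode *next;
};

/* Singly linked: O(n), we must first find the pointer that refers to n. */
static void slist_del(struct snode **head, struct snode *n)
{
    struct snode **pp = head;

    while (*pp && *pp != n)        /* linear walk to the predecessor */
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;
    n->next = NULL;
}
```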
https://github.com/gluster/glusterfs/pull/2869 was a good easy example.
I'm really not sure my examples above were great candidates, but I think it's worth understanding memory consumption in large-scale environments first.
Thanks for your input; I'll get back to this, keeping the points above in mind.
The inode table's name_hash table might be an interesting candidate for switching to linear probing (open addressing) instead of per-bucket linked lists.
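A rough sketch of that idea (generic open addressing, not tied to the real name_hash layout or keys): with linear probing the entries live directly in the bucket array, so there are no per-entry chain pointers at all. Deletion needs tombstones or backward-shifting, and the table must be resized before it fills up, which is the main extra complexity.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4096                       /* assume a power of two */

struct probe_slot {
    uint64_t hash;                            /* 0 means "empty" in this sketch */
    void *entry;                              /* e.g. a dentry pointer */
};

static struct probe_slot table[TABLE_SIZE];

/* Lookup with linear probing.  Assumes the table is never completely full
 * (a real implementation resizes before the load factor gets too high). */
static void *probe_lookup(uint64_t hash,
                          int (*match)(void *entry, const void *key),
                          const void *key)
{
    size_t i = hash & (TABLE_SIZE - 1);

    while (table[i].hash != 0) {              /* stop at the first empty slot */
        if (table[i].hash == hash && match(table[i].entry, key))
            return table[i].entry;
        i = (i + 1) & (TABLE_SIZE - 1);       /* linear probe to the next slot */
    }
    return NULL;
}
```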
After some investigation related to other issues (#2717 and #3677), I've realized that the way we are using dentries is unsafe.
We don't have a strict way to invalidate inodes and their entries (mostly because invalidations and forgets are sent and executed asynchronously), yet we construct paths based on the cached inodes. This basically means that a path we construct may point to a file that doesn't exist anymore (even though the inode may still exist), or even to a file that has been renamed and now references a completely different inode.
Even if we had a strict way to invalidate inodes, it's still possible to access a deleted file if it's still open, which means that no path we can construct will be valid.
The primary use of dentries is to reconstruct the path of an inode via inode_path(), so I would say that what we really need to do is get rid of the dentries and modify the xlators that depend on the path so they no longer need it (the path may already be incorrect anyway, so relying on it isn't correct). As I see it now, we shouldn't need inode_path() at all: if the kernel sends the path, we already have it; if the kernel doesn't send it, whatever we reconstruct from an inode is potentially wrong, so we shouldn't require or use it.
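For context, a heavily simplified sketch of what path reconstruction from cached dentries amounts to (illustrative names only, not the real inode_path() API): the path is just the concatenation of cached names up to the root, so if any link in that chain is stale, the result looks valid but is silently wrong.

```c
#include <stdio.h>

struct fake_dentry {
    struct fake_dentry *parent;    /* NULL at the top of the chain */
    const char *name;
};

/* Recurse to the top first, then append "/name" on the way back down.
 * Assumes buf is large enough; real code must bound this. */
static int fake_path(const struct fake_dentry *d, char *buf, int pos)
{
    if (d->parent)
        pos = fake_path(d->parent, buf, pos);
    return pos + sprintf(buf + pos, "/%s", d->name);
}

int main(void)
{
    struct fake_dentry dir = { NULL, "dir" };
    struct fake_dentry file = { &dir, "file" };
    char buf[256];

    fake_path(&file, buf, 0);
    printf("%s\n", buf);           /* "/dir/file" */

    /* If "file" was renamed or unlinked on the brick and this cached entry
     * was never invalidated, the reconstructed path still reads "/dir/file",
     * now pointing at nothing, or at a different inode entirely. */
    return 0;
}
```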
All this is becoming an issue with many nested implications. I'm still trying to understand the scope of the problem, but I'll open a GitHub issue for this soon. The changes required to fix this (and the related issues) are going to be significant.
Thank you for your contributions. We noticed that this issue has had no activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.