lsof
lsof copied to clipboard
Use user mode APIs on FreeBSD
This patchset gets lsof to mainly use user mode APIs on FreeBSD. Kernel access with kvm is now optional although still required for a few features. Many other bugs are fixed in the process, the code is shorter and cleaner, and portability and maintainability greatly improved, as will be described.
lsof is now able to run as an ordinary user, but won't necessarily see all processes and files in that case. All open xfiles system-wide are retrieved with sysctl kern.file, all xvnodes are retrieved with sysctl kern.vnode, all sockets PCBs are retrieved with sysctls such as net.inet.tcp.pcblist, and these are sorted. Processes are retrieved with sysctl kern.proc.proc or kern.proc.all, and each process's files are retrieved with libutil's kinfo_getfile(). Each such kinfo_file binary-searches the xfiles, xvnodes, and PCBs as necessary to stitch together all the data it needs.
Pipes: use kinfo_file, but can use kvm access when available to retrieve fields absent from kinfo_file.
PTYs: 100% converted to use kinfo_file only.
Sockets: 100% converted to use kinfo_file and PCBs only.
Kqueues: needs kvm access as no usermode API seems to exist.
Vnodes, fifos: kvm access is only used for nullfs, and reading lock state.
We now determine the lock state on all filesystems, not just UFS and ZFS. Almost all filesystem-specific functions have been deleted, as we don't need them any more. Filenames come from kinfo_file for all filesystems.
Each process's cwd, root directory, jail directory, text file and controlling TTY are now also shown; this was broken before. The procfs filesystem works now, it was broken since FreeBSD 5.
That terrifying process_node() function has been defeated and reduced from 1000+ LOC to 389, and most of its #ifdef hell is gone.
Performance: care was taken to maintain good performance. All file data searches are binary searches, with an algorithmic complexity of O(n*log(n)) in the number of files. It takes about 200 milliseconds to list 5000 open files. However since FreeBSD only provides data such as PCBs and xfiles in bulk, memory usage may be high: if sizeof(kinfo_file) + sizeof(xfile) + sizeof(PCBs) = 1000 bytes, then listing 1 million open files will require 1 GB of RAM.
Portability: the flagship platform is FreeBSD 13, where development and most testing was done. 14-CURRENT builds cleanly and works well too. 12.0 needed one patch, while 11.2 needed several patches and gives lots of warnings but works. In theory, it should work on older FreeBSD versions, possibly as far back as 7.0, but I haven't tested. Older versions will have less features available, eg. FreeBSD 12 doesn't have socket buffer usage in its kinfo_file, FreeBSD 11 also lacks TCP flags and state, etc.
This approach is essentially the same as what Linux did, replacing its /dev/kmem implementation with /proc traversal (as per 00PORTING).
I've left FIXMEs where future kernel API improvements could lead to more file info or better performance, for example we could avoid a stat() for each vnode if the kernel gave us the link count in kinfo_file.
@emaste can you validate before I merge?
@masatake once this is merged. I'd like to get a release (at least for FreeBSD) out, what's the process here?
I can confirm that my branch builds and works on FreeBSD 9.0 too.
Earlier versions have no hope, as struct kinfo_file only came into existence in FreeBSD 8.0, and had very few fields until it was beefed up by libprocstat changes in 0daf62d9f5f7d2c15d00d71eb519b90516df016f first released in FreeBSD 9.0.
@lrosenman, please read https://github.com/lsof-org/lsof/blob/master/HOW_TO_RELEASE.rst .
I'm using only GNU/Linux. But I can understand this is excellent work. Should we update the major version number to 5 from 4?
@DamjanJovanovic, could you add the entries to 00DIST? User-visible changes should be written in the file. If your name is not in 00CREDITS, could your name to the file?
I'm using only GNU/Linux. But I can understand this is excellent work. Should we update the major version number to 5 from 4?
Thank you. Is there other dialect improvements that would warrant a major version change?
I also spent a long time only using GNU/Linux, and regret not trying FreeBSD sooner.
Is this still being considered? Can this be merged into lsof-org:master?
Seems to work, but I do get an error on 13.0-RELEASE-p4 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64: lsof: WARNING -- reading xvnode list failed: No such file or directory
Everything seems to work as expected. It can see open files.
see also: https://github.com/risner/freebsd_lsof/issues/1
lsof sas-ram-*
lsof: WARNING: compiled for FreeBSD release 13.0-RELEASE-p3; this is 13.0-RELEASE. lsof: WARNING -- reading xvnode list failed: No such file or directory COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME plot 30313 risner 3w VREG 2150950484,3339640353 0 1158 sas-ram.txt
While it generates the warning "reading xvnode list failed: No such file or directory", it does report my open files.
While it generates the warning "reading xvnode list failed: No such file or directory", it does report my open files.
Confirmed. The warning, then a successful listing.
The warning looks like a failed sysctl where the syscall returned ENOENT = "The name array specifies a value that is unknown". It may simply be an errant debugging printf left in the code.
I checked, and it's from a sysctl MIB CTL_KERN/KERN_VNODE returning ENOENT. I don't have enough knowledge to know why the sysctl is failing.
Seems to work, but I do get an error on 13.0-RELEASE-p4 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64: lsof: WARNING -- reading xvnode list failed: No such file or directory
That's expected, as I said in https://github.com/lsof-org/lsof/pull/180#issuecomment-989116780
"The CTL_KERN + KERN_VNODE sysctl needs to work. Without it we cannot access the inode ("NODE" column) nor filter by inode or vnode id, nor determine an open file's filesystem efficiently or sometimes at all (if kinfo_file.kf_path is empty, then statfs() can't work, whereas we can cache results from previous statfs() and match by xvnode.xv_mount). There could be further limitations. This sysctl was turned off in 2003 in commit acb18acfec97aa7fe26ff48f80a5c3f89c9b542d."
That may not all be true, I'll try make a more up-to-date list of the limitations.
I am also working on some FreeBSD kernel patches to extract kqueue data, some more socket and pipe data (all working), and byte range lock data (the hardest and not yet started). After that's merged to FreeBSD, we can use that instead of KVM access on __FreeBSD_version >= something. A few more cleanups and this should be in good shape to merge to master.
I've submitted a patch with the required FreeBSD kernel changes, for features we need to get lsof fully working without KVM access: https://reviews.freebsd.org/D34090
Updates to lsof to use them (on a recent enough __FreeBSD_version) coming soon.
These 14 new commits contain improvements and bug fixes to the networking code, build fixes for FreeBSD 13, complete elimination of struct xvnode, first-time-ever support for the eventfd, shm and procdesc file types, more details for kqueue and pipes without kvm, a bug fix to the pre-existing lock state code and use of a new sysctl to read lock info instead of kvm, no more kvm for nullfs, and no use of kvm at all on recent enough (that is, patched as below) FreeBSD versions.
It builds and works on FreeBSD 9, 13 and 14-CURRENT.
The kernel patches needed for best results (kqueue/pipe details and lock info, w/o kvm access) are: https://reviews.freebsd.org/D34184 https://reviews.freebsd.org/D34323
(It will need tweaks to align the __FreeBSD_version checks once those patches are committed.)
@DamjanJovanovic can you fix the conflicts? This may obviate the need for me to fix the current 4.95.0 port.
Though I don't know well about FreeBSD, I think the version of lsof including this brilliant patch should be 4.96.0 (or 5.0.0). See https://github.com/lsof-org/lsof/blob/master/HOW_TO_RELEASE.rst for releasing.
@masatake I agree -- need to get the conflicts fixed first.
@DamjanJovanovic any help here? I'd REALLY like to merge this.
I believe he is awaiting kernel folks to merge his changes because they didn’t like his initial approach. But that’s my read/interpretation.
https://reviews.freebsd.org/D34184 https://reviews.freebsd.org/D34323
Looks like [email protected] is waiting on @DamjanJovanovic on: https://reviews.freebsd.org/D34184
I've now gotten my branch to apply cleanly to the latest master (via a "git rebase" - sorry to those who forked it).
Also all the required kernel patches have been committed to FreeBSD 14-CURRENT, so on a recent enough CURRENT, kvm access won't be used at all.
A workaround for the KERN_FILE sysctl for FreeBSD 9 has also been added. FreeBSD 9, 13 and 14-CURRENT all work.
I think this is now complete and ready to merge.
Can you look at the port in ports, and tell me what changes are needed there once I merge/release this? @DamjanJovanovic
In my understanding, these are FreeBSD dialect-specific changes.
Could you add [FreeBSD]
as a prefix for the header of the commit logs?
In my understanding, these are FreeBSD dialect-specific changes. Could you add
[FreeBSD]
as a prefix for the header of the commit logs?
Done.
Can you look at the port in ports, and tell me what changes are needed there once I merge/release this? @DamjanJovanovic
I think this can be removed from the Makefile, as we no longer use filesystem-specific code:
# GCC needs -lzfs -lzpool for reasons unknown. If someone can # figure out why, I (ler) am all ears. .if ${CHOSEN_COMPILER_TYPE} == gcc CONFIGURE_ENV+= LSOF_CFGL="-lzfs -lzpool" .endif
Whether we should still need kernel sources is questionable. Pre-CURRENT definitely needs them for the lockf headers, but the build system also wrongly needs them for CURRENT which uses KERN_LOCKF instead.
I'm interested in this. Thank you for your hard work!
So what's the state here? Are there any kernel patches written and waiting for review OR still needed to be written?
This change looks good to my eye and it looks like all the required changes to FreeBSD have landed. have I overlooked something? Is there anything specific FreeBSD kernel folks need to bless to move this forward?
Thanks for sorting all this out.
The change itself is good, but it breaks 13-STABLE, where the required kernel changes have also been merged. The __FreeBSD_version checks for >= 1400062
should also check for >= 1301506 && < 1400000
. 1301506 is the 13-STABLE __FreeBSD_version somewhat after the kernel changes were MFCd.