lsof icon indicating copy to clipboard operation
lsof copied to clipboard

Use user mode APIs on FreeBSD

Open DamjanJovanovic opened this issue 3 years ago • 27 comments

This patchset gets lsof to mainly use user mode APIs on FreeBSD. Kernel access with kvm is now optional although still required for a few features. Many other bugs are fixed in the process, the code is shorter and cleaner, and portability and maintainability greatly improved, as will be described.

lsof is now able to run as an ordinary user, but won't necessarily see all processes and files in that case. All open xfiles system-wide are retrieved with sysctl kern.file, all xvnodes are retrieved with sysctl kern.vnode, all sockets PCBs are retrieved with sysctls such as net.inet.tcp.pcblist, and these are sorted. Processes are retrieved with sysctl kern.proc.proc or kern.proc.all, and each process's files are retrieved with libutil's kinfo_getfile(). Each such kinfo_file binary-searches the xfiles, xvnodes, and PCBs as necessary to stitch together all the data it needs.

Pipes: use kinfo_file, but can use kvm access when available to retrieve fields absent from kinfo_file.

PTYs: 100% converted to use kinfo_file only.

Sockets: 100% converted to use kinfo_file and PCBs only.

Kqueues: needs kvm access as no usermode API seems to exist.

Vnodes, fifos: kvm access is only used for nullfs, and reading lock state.

We now determine the lock state on all filesystems, not just UFS and ZFS. Almost all filesystem-specific functions have been deleted, as we don't need them any more. Filenames come from kinfo_file for all filesystems.

Each process's cwd, root directory, jail directory, text file and controlling TTY are now also shown; this was broken before. The procfs filesystem works now, it was broken since FreeBSD 5.

That terrifying process_node() function has been defeated and reduced from 1000+ LOC to 389, and most of its #ifdef hell is gone.

Performance: care was taken to maintain good performance. All file data searches are binary searches, with an algorithmic complexity of O(n*log(n)) in the number of files. It takes about 200 milliseconds to list 5000 open files. However since FreeBSD only provides data such as PCBs and xfiles in bulk, memory usage may be high: if sizeof(kinfo_file) + sizeof(xfile) + sizeof(PCBs) = 1000 bytes, then listing 1 million open files will require 1 GB of RAM.

Portability: the flagship platform is FreeBSD 13, where development and most testing was done. 14-CURRENT builds cleanly and works well too. 12.0 needed one patch, while 11.2 needed several patches and gives lots of warnings but works. In theory, it should work on older FreeBSD versions, possibly as far back as 7.0, but I haven't tested. Older versions will have less features available, eg. FreeBSD 12 doesn't have socket buffer usage in its kinfo_file, FreeBSD 11 also lacks TCP flags and state, etc.

This approach is essentially the same as what Linux did, replacing its /dev/kmem implementation with /proc traversal (as per 00PORTING).

I've left FIXMEs where future kernel API improvements could lead to more file info or better performance, for example we could avoid a stat() for each vnode if the kernel gave us the link count in kinfo_file.

DamjanJovanovic avatar Dec 18 '21 14:12 DamjanJovanovic

@emaste can you validate before I merge?

@masatake once this is merged. I'd like to get a release (at least for FreeBSD) out, what's the process here?

lrosenman avatar Dec 18 '21 16:12 lrosenman

I can confirm that my branch builds and works on FreeBSD 9.0 too.

Earlier versions have no hope, as struct kinfo_file only came into existence in FreeBSD 8.0, and had very few fields until it was beefed up by libprocstat changes in 0daf62d9f5f7d2c15d00d71eb519b90516df016f first released in FreeBSD 9.0.

DamjanJovanovic avatar Dec 18 '21 18:12 DamjanJovanovic

@lrosenman, please read https://github.com/lsof-org/lsof/blob/master/HOW_TO_RELEASE.rst .

masatake avatar Dec 18 '21 22:12 masatake

I'm using only GNU/Linux. But I can understand this is excellent work. Should we update the major version number to 5 from 4?

masatake avatar Dec 18 '21 22:12 masatake

@DamjanJovanovic, could you add the entries to 00DIST? User-visible changes should be written in the file. If your name is not in 00CREDITS, could your name to the file?

masatake avatar Dec 18 '21 22:12 masatake

I'm using only GNU/Linux. But I can understand this is excellent work. Should we update the major version number to 5 from 4?

Thank you. Is there other dialect improvements that would warrant a major version change?

I also spent a long time only using GNU/Linux, and regret not trying FreeBSD sooner.

DamjanJovanovic avatar Dec 19 '21 12:12 DamjanJovanovic

Is this still being considered? Can this be merged into lsof-org:master?

risner avatar Jan 25 '22 15:01 risner

Seems to work, but I do get an error on 13.0-RELEASE-p4 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64: lsof: WARNING -- reading xvnode list failed: No such file or directory

Everything seems to work as expected. It can see open files.

risner avatar Jan 25 '22 19:01 risner

see also: https://github.com/risner/freebsd_lsof/issues/1

lrosenman avatar Jan 26 '22 00:01 lrosenman

lsof sas-ram-*

lsof: WARNING: compiled for FreeBSD release 13.0-RELEASE-p3; this is 13.0-RELEASE. lsof: WARNING -- reading xvnode list failed: No such file or directory COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME plot 30313 risner 3w VREG 2150950484,3339640353 0 1158 sas-ram.txt

While it generates the warning "reading xvnode list failed: No such file or directory", it does report my open files.

risner avatar Jan 26 '22 01:01 risner

While it generates the warning "reading xvnode list failed: No such file or directory", it does report my open files.

Confirmed. The warning, then a successful listing.

grahamperrin avatar Jan 26 '22 07:01 grahamperrin

The warning looks like a failed sysctl where the syscall returned ENOENT = "The name array specifies a value that is unknown". It may simply be an errant debugging printf left in the code.

I checked, and it's from a sysctl MIB CTL_KERN/KERN_VNODE returning ENOENT. I don't have enough knowledge to know why the sysctl is failing.

risner avatar Jan 26 '22 13:01 risner

Seems to work, but I do get an error on 13.0-RELEASE-p4 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64: lsof: WARNING -- reading xvnode list failed: No such file or directory

That's expected, as I said in https://github.com/lsof-org/lsof/pull/180#issuecomment-989116780

"The CTL_KERN + KERN_VNODE sysctl needs to work. Without it we cannot access the inode ("NODE" column) nor filter by inode or vnode id, nor determine an open file's filesystem efficiently or sometimes at all (if kinfo_file.kf_path is empty, then statfs() can't work, whereas we can cache results from previous statfs() and match by xvnode.xv_mount). There could be further limitations. This sysctl was turned off in 2003 in commit acb18acfec97aa7fe26ff48f80a5c3f89c9b542d."

That may not all be true, I'll try make a more up-to-date list of the limitations.

I am also working on some FreeBSD kernel patches to extract kqueue data, some more socket and pipe data (all working), and byte range lock data (the hardest and not yet started). After that's merged to FreeBSD, we can use that instead of KVM access on __FreeBSD_version >= something. A few more cleanups and this should be in good shape to merge to master.

DamjanJovanovic avatar Jan 27 '22 19:01 DamjanJovanovic

I've submitted a patch with the required FreeBSD kernel changes, for features we need to get lsof fully working without KVM access: https://reviews.freebsd.org/D34090

Updates to lsof to use them (on a recent enough __FreeBSD_version) coming soon.

DamjanJovanovic avatar Jan 29 '22 10:01 DamjanJovanovic

These 14 new commits contain improvements and bug fixes to the networking code, build fixes for FreeBSD 13, complete elimination of struct xvnode, first-time-ever support for the eventfd, shm and procdesc file types, more details for kqueue and pipes without kvm, a bug fix to the pre-existing lock state code and use of a new sysctl to read lock info instead of kvm, no more kvm for nullfs, and no use of kvm at all on recent enough (that is, patched as below) FreeBSD versions.

It builds and works on FreeBSD 9, 13 and 14-CURRENT.

The kernel patches needed for best results (kqueue/pipe details and lock info, w/o kvm access) are: https://reviews.freebsd.org/D34184 https://reviews.freebsd.org/D34323

(It will need tweaks to align the __FreeBSD_version checks once those patches are committed.)

DamjanJovanovic avatar Feb 21 '22 17:02 DamjanJovanovic

@DamjanJovanovic can you fix the conflicts? This may obviate the need for me to fix the current 4.95.0 port.

lrosenman avatar May 01 '22 21:05 lrosenman

Though I don't know well about FreeBSD, I think the version of lsof including this brilliant patch should be 4.96.0 (or 5.0.0). See https://github.com/lsof-org/lsof/blob/master/HOW_TO_RELEASE.rst for releasing.

masatake avatar May 01 '22 21:05 masatake

@masatake I agree -- need to get the conflicts fixed first.

lrosenman avatar May 01 '22 21:05 lrosenman

@DamjanJovanovic any help here? I'd REALLY like to merge this.

lrosenman avatar May 07 '22 19:05 lrosenman

I believe he is awaiting kernel folks to merge his changes because they didn’t like his initial approach. But that’s my read/interpretation.

https://reviews.freebsd.org/D34184 https://reviews.freebsd.org/D34323

risner avatar May 07 '22 21:05 risner

Looks like [email protected] is waiting on @DamjanJovanovic on: https://reviews.freebsd.org/D34184

lrosenman avatar May 07 '22 21:05 lrosenman

I've now gotten my branch to apply cleanly to the latest master (via a "git rebase" - sorry to those who forked it).

Also all the required kernel patches have been committed to FreeBSD 14-CURRENT, so on a recent enough CURRENT, kvm access won't be used at all.

A workaround for the KERN_FILE sysctl for FreeBSD 9 has also been added. FreeBSD 9, 13 and 14-CURRENT all work.

I think this is now complete and ready to merge.

DamjanJovanovic avatar Jun 23 '22 16:06 DamjanJovanovic

Can you look at the port in ports, and tell me what changes are needed there once I merge/release this? @DamjanJovanovic

lrosenman avatar Jun 23 '22 16:06 lrosenman

7955db2 e20df05 b34044a

In my understanding, these are FreeBSD dialect-specific changes. Could you add [FreeBSD] as a prefix for the header of the commit logs?

masatake avatar Jun 23 '22 16:06 masatake

7955db2 e20df05 b34044a

In my understanding, these are FreeBSD dialect-specific changes. Could you add [FreeBSD] as a prefix for the header of the commit logs?

Done.

DamjanJovanovic avatar Jun 23 '22 17:06 DamjanJovanovic

Can you look at the port in ports, and tell me what changes are needed there once I merge/release this? @DamjanJovanovic

I think this can be removed from the Makefile, as we no longer use filesystem-specific code:

# GCC needs -lzfs -lzpool for reasons unknown.  If someone can
# figure out why, I (ler) am all ears.
.if ${CHOSEN_COMPILER_TYPE} == gcc
CONFIGURE_ENV+= LSOF_CFGL="-lzfs -lzpool"
.endif

Whether we should still need kernel sources is questionable. Pre-CURRENT definitely needs them for the lockf headers, but the build system also wrongly needs them for CURRENT which uses KERN_LOCKF instead.

DamjanJovanovic avatar Jun 23 '22 17:06 DamjanJovanovic

I'm interested in this. Thank you for your hard work!

hhartzer avatar Jun 28 '22 00:06 hhartzer

So what's the state here? Are there any kernel patches written and waiting for review OR still needed to be written?

mjguzik avatar Sep 16 '22 16:09 mjguzik

This change looks good to my eye and it looks like all the required changes to FreeBSD have landed. have I overlooked something? Is there anything specific FreeBSD kernel folks need to bless to move this forward?

Thanks for sorting all this out.

bsdimp avatar Sep 16 '22 17:09 bsdimp

The change itself is good, but it breaks 13-STABLE, where the required kernel changes have also been merged. The __FreeBSD_version checks for >= 1400062 should also check for >= 1301506 && < 1400000. 1301506 is the 13-STABLE __FreeBSD_version somewhat after the kernel changes were MFCd.

dankm avatar Sep 16 '22 20:09 dankm