
Change rangelock handling in FreeBSD's zfs_getpages()

Open · markjdb opened this pull request 1 year ago · 2 comments

Unconditionally hold the rangelock in zfs_getpages().

Motivation and Context

This is reportedly required for direct I/O support on FreeBSD.

Description

This change modifies zfs_getpages() to unconditionally acquire the rangelock. To avoid a deadlock, we may need to drop a page busy lock and allocate a new page later on.
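To illustrate the ordering problem, here is a simplified pseudocode sketch (not the patch itself): the rangelock function names match the OpenZFS API, but the control flow, offsets, and page re-lookup are illustrative only.

```c
/*
 * Pseudocode sketch of the deadlock-avoidance pattern described above.
 * zfs_rangelock_tryenter()/enter()/exit() are the real OpenZFS API;
 * everything else is elided or hypothetical.
 */
lr = zfs_rangelock_tryenter(&zp->z_rangelock, off, len, RL_READER);
if (lr == NULL) {
	/*
	 * Someone holds the rangelock (e.g., a writer that may in turn
	 * be waiting on our busy pages). Drop the page busy lock so we
	 * can sleep on the rangelock without deadlocking.
	 */
	vm_page_xunbusy(m);
	lr = zfs_rangelock_enter(&zp->z_rangelock, off, len, RL_READER);
	/* Re-look-up or allocate the page now that the lock is held. */
}
/* ... read the pages under the rangelock ... */
zfs_rangelock_exit(lr);
```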

How Has This Been Tested?

Smoke testing on FreeBSD.

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [ ] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [ ] All commit messages are properly formatted and contain Signed-off-by.

markjdb avatar Oct 13 '24 16:10 markjdb

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
  panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

behlendorf avatar Oct 14 '24 16:10 behlendorf

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643


Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.
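The race being alluded to might look like the following hypothetical sketch: while the busy lock was dropped to wait on the rangelock, a concurrent fault or write populated the page, so re-asserting that it is invalid can fire.

```c
/*
 * Hypothetical sketch, not the actual fix: after re-busying a page
 * that was unbusied to take the rangelock, re-check its validity
 * rather than asserting VERIFY(vm_page_none_valid(m)).
 */
if (!vm_page_none_valid(m)) {
	/* Page became valid while unbusied; no read is needed for it. */
} else {
	/* Proceed with dmu_read_pages() for this page. */
}
```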

markjdb avatar Oct 14 '24 17:10 markjdb

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion. From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643


Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

I believe the latest version addresses this, and I didn't observe any panics while running the ZTS locally.

markjdb avatar Nov 08 '24 13:11 markjdb