dircache testsuite tests fail with optimized dircache settings

Open rdmark opened this issue 2 months ago • 6 comments

When running the production container with optimized dircache settings, we get a range of test failures.

The settings are

              -e AFP_DIRCACHESIZE=131072 \
              -e AFP_DIRCACHE_VALIDATION_FREQ=100 \
              -e AFP_DIRCACHE_METADATA_WINDOW=3600 \
              -e AFP_DIRCACHE_METADATA_THRESHOLD=1800 \

More tests fail with sqlite than the other backends, for some reason:

dbd

  Failed tests:
    Dircache:test502: move and rename dir, enumerate renamed dir
    Dircache:test503: move and rename dir, enumerate renamed dir
    Dircache:test504: rename topdir, stat file in subdir of renamed topdir
    Dircache:test505: rename dir, stat subdir in renamed dir
    Dircache:test506: stat subdir in poisoned path
    Error:test174: did error two users from parent folder did=<deleted> name=test174 name

mysql

   Failed tests:
    Dircache:test502: move and rename dir, enumerate renamed dir
    Dircache:test503: move and rename dir, enumerate renamed dir
    Dircache:test504: rename topdir, stat file in subdir of renamed topdir
    Dircache:test505: rename dir, stat subdir in renamed dir
    Dircache:test506: stat subdir in poisoned path
    Error:test174: did error two users from parent folder did=<deleted> name=test174 name

sqlite

  Failed tests:
    Dircache:test500: move and rename dir, enumerate new parent, stat renamed dir
    Dircache:test501: move and rename dir, then stat it
    Dircache:test502: move and rename dir, enumerate renamed dir
    Dircache:test503: move and rename dir, enumerate renamed dir
    Dircache:test504: rename topdir, stat file in subdir of renamed topdir
    Dircache:test505: rename dir, stat subdir in renamed dir
    Dircache:test506: stat subdir in poisoned path
    Error:test174: did error two users from parent folder did=<deleted> name=test174 name

rdmark avatar Nov 13 '25 18:11 rdmark

@andylemin I had the idea of exposing the dircache settings in the Docker container! Are these test failures familiar to you?

Sample debug log.

############## entering test503 ##############
[FPOpenVolFull] Open Vol test1 bitmap 21
[FPCreateDir] Create Directory Vol 256 did : 0x2 <t503 dir>
directory ID 0x11
[FPCreateDir] Create Directory Vol 256 did : 0x11 <t503 subdir1>
directory ID 0x12
[FPCreateDir] Create Directory Vol 256 did : 0x11 <t503 subdir2>
directory ID 0x13
[FPMoveAndRename] Move and rename Vol: 256 did: 0x11 <t503 subdir1> ==> 0x13 <t503 renamedsubdir1>
header.dsi_code       -5018	AFPERR_NOOBJ   
[../test/testsuite/T2_Dircache_attack.c:387] FPMoveAndRename(Conn2, vol2, dir_id, subdir2_id, subdir1, renamedsubdir1)
[FPGetFileDirParams] GetFileDirParams Vol 256 did : 0x12 <>
	FAILED t503 subdir1 should be t503 renamedsubdir1
[FPCloseVol] Close Vol 256
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x12 <>
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x13 <>
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x11 <>
Dircache:test503: move and rename dir, enumerate renamed dir - FAILED

Looks like the FPMoveAndRename command fails to find the target dir for some reason.

Next up is to test this outside of the container.

rdmark avatar Nov 13 '25 18:11 rdmark

It is specifically the validation frequency that causes this failure. If you set it to anything >1, the tests fail. These settings work fine, for instance:

dircachesize = 131072
dircache validation freq = 1
dircache metadata window = 3600
dircache metadata threshold = 1800

rdmark avatar Nov 14 '25 17:11 rdmark

What all the failing tests have in common is that they execute file system operations as user 1, then immediately do something with the newly created file or dir as user 2.

rdmark avatar Nov 15 '25 16:11 rdmark

Thanks Mark. This sounds like some of the AFP operations are not updating the dircache as they should. The intent is that all explicit AFP ops should always immediately update the dircache, and only reads should be deterministic.

I remember when making the changes that I found some AFP operations were not actually updating the dircache, and were essentially relying on the fact that the next read would...

It seems like I did not find all of them and some update/write ops still need to be fixed to update the dircache as they should.

So this is probably a bug that stays hidden when freq is set to 1 (always stat and update on read).
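
To make that hypothesis concrete, here is a minimal toy model in Python. It is not netatalk's dircache code: `ToyDirCache`, `stat_after_rename` and the `update_cache` switch are all made up for illustration, and the only assumption is that a cached entry is re-checked against the file system on every Nth lookup. It shows how a write op that forgets to update the cache goes unnoticed at freq 1 but returns a stale name at freq 100.

```python
class ToyDirCache:
    """Toy cache mapping directory ID -> name, revalidated every Nth lookup."""

    def __init__(self, backing, validation_freq):
        self.backing = backing                  # dict standing in for the file system
        self.validation_freq = validation_freq  # 1 means "stat on every lookup"
        self.entries = {}
        self.lookups = 0

    def lookup(self, did):
        self.lookups += 1
        cached = self.entries.get(did)
        # Go to the backing store on a miss, or on every Nth lookup.
        if cached is None or self.lookups % self.validation_freq == 0:
            cached = self.backing[did]
            self.entries[did] = cached
        return cached

    def rename(self, did, new_name, update_cache):
        self.backing[did] = new_name
        if update_cache:
            # What a write op is supposed to do: refresh the cache right away.
            self.entries[did] = new_name


def stat_after_rename(validation_freq, update_cache):
    """Warm the cache, rename the directory, then look it up again."""
    disk = {0x12: "t503 subdir1"}
    cache = ToyDirCache(disk, validation_freq)
    cache.lookup(0x12)
    cache.rename(0x12, "t503 renamedsubdir1", update_cache)
    return cache.lookup(0x12)


if __name__ == "__main__":
    for freq, update in [(1, False), (100, False), (100, True)]:
        print(freq, update, stat_after_rename(freq, update))
```

In this sketch the incomplete rename only shows up with freq 100 and `update_cache=False`; with freq 1 the next read repairs the entry, which would explain why the default settings pass.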

I'll have a look this week, as I suspect it is the same one or two incomplete ops impacting all the tests.

Please hold back the release for now until this is fixed 👍

andylemin avatar Nov 16 '25 09:11 andylemin

Cheers, let me know how your investigations go. Fingers crossed that it's a straightforward fix.

rdmark avatar Nov 16 '25 10:11 rdmark

Also, apologies that I didn't think about running the spectest with dircache optimized settings earlier. It occurred to me as I was running tests in preparation for tagging the release!

rdmark avatar Nov 16 '25 10:11 rdmark