dircache testsuite tests fail with optimized dircache settings
When running the production container with optimized dircache settings, we get a range of test failures.
The settings are:
-e AFP_DIRCACHESIZE=131072 \
-e AFP_DIRCACHE_VALIDATION_FREQ=100 \
-e AFP_DIRCACHE_METADATA_WINDOW=3600 \
-e AFP_DIRCACHE_METADATA_THRESHOLD=1800 \
More tests fail with sqlite than with the other backends, for some reason (test500 and test501 fail only with sqlite):
dbd
Failed tests:
Dircache:test502: move and rename dir, enumerate renamed dir
Dircache:test503: move and rename dir, enumerate renamed dir
Dircache:test504: rename topdir, stat file in subdir of renamed topdir
Dircache:test505: rename dir, stat subdir in renamed dir
Dircache:test506: stat subdir in poisoned path
Error:test174: did error two users from parent folder did=<deleted> name=test174 name
mysql
Failed tests:
Dircache:test502: move and rename dir, enumerate renamed dir
Dircache:test503: move and rename dir, enumerate renamed dir
Dircache:test504: rename topdir, stat file in subdir of renamed topdir
Dircache:test505: rename dir, stat subdir in renamed dir
Dircache:test506: stat subdir in poisoned path
Error:test174: did error two users from parent folder did=<deleted> name=test174 name
sqlite
Failed tests:
Dircache:test500: move and rename dir, enumerate new parent, stat renamed dir
Dircache:test501: move and rename dir, then stat it
Dircache:test502: move and rename dir, enumerate renamed dir
Dircache:test503: move and rename dir, enumerate renamed dir
Dircache:test504: rename topdir, stat file in subdir of renamed topdir
Dircache:test505: rename dir, stat subdir in renamed dir
Dircache:test506: stat subdir in poisoned path
Error:test174: did error two users from parent folder did=<deleted> name=test174 name
@andylemin I had the idea of exposing the dircache settings in the Docker container! Are these test failures familiar to you?
Sample debug log.
############## entering test503 ##############
[FPOpenVolFull] Open Vol test1 bitmap 21
[FPCreateDir] Create Directory Vol 256 did : 0x2 <t503 dir>
directory ID 0x11
[FPCreateDir] Create Directory Vol 256 did : 0x11 <t503 subdir1>
directory ID 0x12
[FPCreateDir] Create Directory Vol 256 did : 0x11 <t503 subdir2>
directory ID 0x13
[FPMoveAndRename] Move and rename Vol: 256 did: 0x11 <t503 subdir1> ==> 0x13 <t503 renamedsubdir1>
header.dsi_code -5018 AFPERR_NOOBJ
[../test/testsuite/T2_Dircache_attack.c:387] FPMoveAndRename(Conn2, vol2, dir_id, subdir2_id, subdir1, renamedsubdir1)
[FPGetFileDirParams] GetFileDirParams Vol 256 did : 0x12 <>
FAILED t503 subdir1 should be t503 renamedsubdir1
[FPCloseVol] Close Vol 256
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x12 <>
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x13 <>
[FPDelete] FPDelete conn 0x7f075f411020 Vol 256 did : 0x11 <>
Dircache:test503: move and rename dir, enumerate renamed dir - FAILED
Looks like the FPMoveAndRename command fails to find the target dir for some reason.
Next up is to test this outside of the container.
It is specifically the validation frequency that causes this failure. If you set it to anything greater than 1, the tests fail. This works fine, for instance:
dircachesize = 131072
dircache validation freq = 1
dircache metadata window = 3600
dircache metadata threshold = 1800
What all the failing tests have in common is that they execute file system operations as user 1, then immediately do something with the newly created file or dir as user 2.
Thanks mark. This sounds like some of the AFP operations are not updating the dircache as they should. The intent is that all explicit AFP ops should always immediately update the dircache, and only reads should be deterministic.
I remember when making the changes that I found some AFP operations were not actually updating the dircache, and were essentially relying on the fact that the next read would...
It seems like I did not find all of them and some update/write ops still need to be fixed to update the dircache as they should.
So this is probably a bug hidden when setting freq to 1 (always stat and update on read).
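To illustrate the hypothesis, here is a minimal toy model in Python. It is not netatalk code: `ToyDirCache`, `name_after_rename`, and the `update_cache` flag are hypothetical names, and the "re-check on every Nth lookup" rule is an assumption about what the validation frequency means. It only shows how a write op that skips the cache update could go unnoticed at freq 1 and surface at a higher value.

```python
class ToyDirCache:
    """Toy directory cache that re-checks an entry against the file
    system only on every Nth lookup. Hypothetical, not netatalk code."""

    def __init__(self, fs, validation_freq):
        self.fs = fs                        # stand-in file system: did -> name
        self.validation_freq = validation_freq
        self.entries = {}                   # cached did -> name
        self.lookups = 0

    def lookup(self, did):
        self.lookups += 1
        must_validate = self.lookups % self.validation_freq == 0
        if did not in self.entries or must_validate:
            # Cache miss or validation turn: "stat" and refresh the entry.
            self.entries[did] = self.fs[did]
        return self.entries[did]

    def rename(self, did, new_name, update_cache):
        # The write always reaches the file system.
        self.fs[did] = new_name
        # An incomplete op (update_cache=False) leaves the cache stale.
        if update_cache:
            self.entries[did] = new_name


def name_after_rename(validation_freq, update_cache):
    fs = {0x12: "t503 subdir1"}
    cache = ToyDirCache(fs, validation_freq)
    cache.lookup(0x12)                      # warm the cache
    cache.rename(0x12, "t503 renamedsubdir1", update_cache)
    return cache.lookup(0x12)


# freq 1 re-checks on every read, so the missing update is masked.
print(name_after_rename(1, update_cache=False))
# freq 100 serves the cached entry, so the stale name shows up.
print(name_after_rename(100, update_cache=False))
# With the op updating the cache, the frequency no longer matters.
print(name_after_rename(100, update_cache=True))
```

In this model only the second call returns the old name, which matches the shape of the test503 failure ("t503 subdir1 should be t503 renamedsubdir1"), though the real cause still has to be confirmed in the afpd code.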
I'll have a look this week, as I suspect it is the same one or two incomplete ops impacting all the tests.
Please hold back the release for now until this is fixed 👍
Cheers, let me know how your investigations go. Fingers crossed that it's a straightforward fix.
Also, apologies that I didn't think about running the spectest with dircache optimized settings earlier. It occurred to me as I was running tests in preparation for tagging the release!