nfs4j Interoperability for lock implementation with macOS client

Follow-up to #42. With the latest state-handling changes in the master branch (possibly b873c760b3c7a19c82f074f6245720516efd26df) and the merged lock support, the simple lock implementation is no longer interoperable with the NFS client on macOS 10.12.

When the volume is mounted, LOCK fails with an NFS4ERR_BAD_STATEID error.

Feb 10 '17 10:02 dkocher

I can reproduce the issue. The client/server interaction looks like:

 -> OPEN file1, open_owner1
 <- OPEN fh1, open_stateid1
 -> OPEN_CONFIRM open_stateid1
 <- OPEN_CONFIRM OK
 -> LOCK fh1, new lock owner: true, open_owner1, open_stateid1
 <- LOCK lock_stateid1
 -> LOCKU lock_stateid1
 <- LOCKU OK
 -> CLOSE fh1, open_stateid1
 <- CLOSE OK

So far, so good. But then:

 -> OPEN file1, open_owner1
 <- OPEN fh1, open_stateid2
 -> OPEN_CONFIRM open_stateid2
 <- OPEN_CONFIRM OK
 -> LOCK fh1, new lock owner: false, lock_stateid1
 <- LOCK BAD_STATEID

(capture file attached)

In other words, the client sends a second LOCK request but reuses lock_stateid1. My reading of https://tools.ietf.org/html/rfc7530#section-16.2.5 is that CLOSE invalidates all locking state, i.e. the old lock_stateid1 is no longer valid.
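To make the failure mode concrete, here is a toy Python model of the stateid bookkeeping in the trace. It is a hypothetical sketch, not nfs4j's implementation: the `ToyServer` class and its methods are invented for illustration, seqids, OPEN_CONFIRM and LOCKU are left out, and CLOSE is assumed to discard every lock stateid derived from the closed open, following the reading of RFC 7530 section 16.2.5 in this thread.

```python
import itertools

BAD_STATEID = "NFS4ERR_BAD_STATEID"


class ToyServer:
    """Hypothetical in-memory model of NFSv4.0 open/lock stateids (not nfs4j code)."""

    def __init__(self):
        self._open_ids = itertools.count(1)
        self._lock_ids = itertools.count(1)
        self.opens = {}  # open stateid -> (filehandle, open_owner)
        self.locks = {}  # lock stateid -> open stateid it was derived from

    def open(self, fh, open_owner):
        sid = f"open_stateid{next(self._open_ids)}"
        self.opens[sid] = (fh, open_owner)
        return sid

    def lock_new_owner(self, open_sid):
        """LOCK with new_lock_owner=true: derive a lock stateid from an open stateid."""
        if open_sid not in self.opens:
            return BAD_STATEID
        sid = f"lock_stateid{next(self._lock_ids)}"
        self.locks[sid] = open_sid
        return sid

    def lock_existing_owner(self, lock_sid):
        """LOCK with new_lock_owner=false: only the lock stateid identifies the state."""
        return lock_sid if lock_sid in self.locks else BAD_STATEID

    def close(self, open_sid):
        """CLOSE: drop the open and (assumed) all lock state derived from it."""
        self.opens.pop(open_sid, None)
        self.locks = {lsid: osid for lsid, osid in self.locks.items() if osid != open_sid}
        return "OK"


server = ToyServer()
open1 = server.open("fh1", "open_owner1")    # open_stateid1
lock1 = server.lock_new_owner(open1)         # lock_stateid1
server.close(open1)                          # lock_stateid1 is discarded here
open2 = server.open("fh1", "open_owner1")    # open_stateid2
print(server.lock_existing_owner(lock1))     # NFS4ERR_BAD_STATEID
```

The second LOCK fails for the same reason as in the trace: the client presents a lock stateid that the server already dropped on CLOSE.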

I have asked the IETF NFSv4 working group to comment on it.

Feb 10 '17 14:02 kofemann

@kofemann Can I be of any help to advance the resolution of this issue?

Mar 13 '17 18:03 dkocher

@dkocher I have discussed the locking issue with the NFS community, and the conclusion is that the OS X client is broken. However, I understand that waiting for a fix from Apple is not an option. I will look into a workaround to make the OS X client happy.

Mar 16 '17 09:03 kofemann

@kofemann Let me know if I can be of any help.

Apr 11 '17 14:04 dkocher

@dkocher Thanks for the offer. I need to find a way to keep expired lock owners around and treat them as valid. This is orthogonal to the state-handling implementation we have in place. But I have not forgotten this issue. Sometimes thinking takes longer than implementing...

Apr 12 '17 15:04 kofemann

@dkocher A short update on this issue.

Bad news: it looks like we can't fix this on the server side, because when the server receives an invalid stateid on LOCK, it does not have enough information to 'guess' the file's open state.
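The reason is visible in the shape of the LOCK arguments. In RFC 7530 the `locker4` field is a union switched on `new_lock_owner`: `open_to_lock_owner4` carries the open stateid, while `exist_lock_owner4` carries only `lock_stateid` and `lock_seqid`. The Python sketch below mirrors that union; the class and function names are hypothetical, and `lock_owner` is simplified to a string (in the RFC it is a `lock_owner4` with a clientid and an opaque owner).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class OpenToLockOwner:
    """Mirrors open_to_lock_owner4 (new_lock_owner = TRUE)."""
    open_seqid: int
    open_stateid: str
    lock_seqid: int
    lock_owner: str  # simplified; lock_owner4 in the RFC


@dataclass
class ExistLockOwner:
    """Mirrors exist_lock_owner4 (new_lock_owner = FALSE)."""
    lock_stateid: str
    lock_seqid: int


Locker = Union[OpenToLockOwner, ExistLockOwner]


def resolve_open_stateid(locker: Locker, lock_table: Dict[str, str]) -> Optional[str]:
    """Return the open stateid a LOCK applies to, or None if it cannot be derived."""
    if isinstance(locker, OpenToLockOwner):
        return locker.open_stateid  # sent explicitly by the client
    # Only the lock stateid is sent: once the server has discarded it,
    # nothing else in the arguments names the open state.
    return lock_table.get(locker.lock_stateid)


lock_table: Dict[str, str] = {}  # lock_stateid1 was dropped on CLOSE
print(resolve_open_stateid(ExistLockOwner("lock_stateid1", 1), lock_table))  # None
print(resolve_open_stateid(OpenToLockOwner(1, "open_stateid2", 0, "lock_owner2"), lock_table))  # open_stateid2
```

The server still knows the current filehandle from PUTFH, but the filehandle alone does not say which open-owner's state the stale lock stateid belonged to.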

Good news: with the latest OS X update, the client seems to recover from the locking issue by itself. It re-sends OPEN+LOCK with a new lock owner.
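The recovery behaviour described here can be sketched as a client-side fallback: try the cached lock stateid first and, on NFS4ERR_BAD_STATEID, re-open the file and lock with a new lock owner. This is an illustration of the observed behaviour, not Apple's client code; `StubServer` and `lock_with_recovery` are hypothetical, and the stub only tracks which stateids are currently valid.

```python
import itertools

BAD_STATEID = "NFS4ERR_BAD_STATEID"


class StubServer:
    """Hypothetical stub that only knows which stateids are currently valid."""

    def __init__(self):
        self.valid = set()
        self._n = itertools.count(1)

    def open(self, fh, owner):
        sid = f"open_stateid{next(self._n)}"
        self.valid.add(sid)
        return sid

    def lock(self, stateid, new_lock_owner):
        # new_lock_owner=True: stateid is an open stateid; False: a lock stateid.
        if stateid not in self.valid:
            return BAD_STATEID
        if new_lock_owner:
            sid = f"lock_stateid{next(self._n)}"
            self.valid.add(sid)
            return sid
        return stateid


def lock_with_recovery(server, fh, owner, cached_lock_sid):
    """Reuse a cached lock stateid; on BAD_STATEID fall back to OPEN + LOCK (new owner)."""
    if cached_lock_sid is not None:
        result = server.lock(cached_lock_sid, new_lock_owner=False)
        if result != BAD_STATEID:
            return result
    open_sid = server.open(fh, owner)
    return server.lock(open_sid, new_lock_owner=True)


server = StubServer()
result = lock_with_recovery(server, "fh1", "open_owner1", "lock_stateid1")
print(result)  # a fresh lock stateid instead of an error
```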

I have sent some traces and described the problem to Apple. Let's see what happens. Here is a simple Python script that triggers the error:

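```python
#!/usr/bin/env python3
"""Reproduce NFS4ERR_BAD_STATEID: open, lock, unlock and close a file twice."""
import fcntl
import os
import sys

if len(sys.argv) != 2:
    print('Usage: locktest <file>')
    sys.exit(1)

fname = sys.argv[1]

# First round: OPEN, LOCK (new lock owner), LOCKU, CLOSE.
# lockf(fd, cmd, len, start): len=0 extends the lock from offset 100000000 to EOF.
f = os.open(fname, os.O_RDONLY)
fcntl.lockf(f, fcntl.LOCK_SH, 0, 100000000)
fcntl.lockf(f, fcntl.LOCK_UN, 0, 100000000)
os.close(f)

# Second round: the macOS client reuses the lock stateid that CLOSE invalidated,
# so this LOCK is rejected with NFS4ERR_BAD_STATEID.
f = os.open(fname, os.O_RDONLY)
fcntl.lockf(f, fcntl.LOCK_SH, 0, 100000000)
fcntl.lockf(f, fcntl.LOCK_UN, 0, 100000000)
os.close(f)
```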

Apr 27 '17 07:04 kofemann

This is still a blocker. I have tested with macOS 10.13, and applications that depend on locks fail to write to the volume and do not recover.

Reverting aeb75448a5c2085e6fec38924e706bb2d71ac65a would solve the error. Is there any option to introduce some compatibility mode (shudder)?

Nov 24 '17 13:11 dkocher

Well, technically, you are suggesting that we keep old lock stateids around so that the lock owner can be identified by an invalidated stateid. This would leak stateids, and eventually you would run out of memory (OOM).

Unfortunately, the OS X developers completely ignore the broken NFS client. As a workaround, we could put them into some kind of fixed-size cache and evict them as needed. Ugly, but it could work.
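One way to read "fixed-size cache" is a bounded LRU map from an invalidated lock stateid to the owners it belonged to, so memory stays capped no matter how many stale stateids a client produces. The sketch below assumes that reading; `ExpiredLockStateCache` and the `(open_owner, lock_owner)` tuple are hypothetical, and nfs4j itself is written in Java, so this only illustrates the eviction policy.

```python
from collections import OrderedDict


class ExpiredLockStateCache:
    """Hypothetical bounded map: invalidated lock stateid -> (open_owner, lock_owner).

    Holds at most `capacity` entries and evicts the least recently used one,
    so stale stateids cannot grow the map without bound.
    """

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def remember(self, lock_stateid, owners):
        """Record a lock stateid that CLOSE has just invalidated."""
        self._entries[lock_stateid] = owners
        self._entries.move_to_end(lock_stateid)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def lookup(self, lock_stateid):
        """Return the owners for a stale stateid, or None if unknown or evicted."""
        owners = self._entries.get(lock_stateid)
        if owners is not None:
            self._entries.move_to_end(lock_stateid)  # mark as recently used
        return owners

    def __len__(self):
        return len(self._entries)


cache = ExpiredLockStateCache(capacity=2)
cache.remember("lock_stateid1", ("open_owner1", "lock_owner1"))
cache.remember("lock_stateid2", ("open_owner2", "lock_owner2"))
cache.lookup("lock_stateid1")                                    # stateid1 becomes most recent
cache.remember("lock_stateid3", ("open_owner3", "lock_owner3"))  # evicts stateid2
```

A lookup miss simply falls back to the current behaviour (NFS4ERR_BAD_STATEID), so the capacity trades memory against how many stale stateids can be tolerated.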

Nov 27 '17 14:11 kofemann

Even if this is fixed in later macOS versions, we usually want to support two or more previous major versions. It would be awesome if you could introduce a map that is cleaned up by some deferred algorithm.
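The thread does not specify the deferred algorithm. One possible choice, sketched below, is time-based expiry with batched cleanup: each entry gets a deadline, reads treat past-deadline entries as absent, and a `sweep()` pass (run on writes here, or from a timer) removes them using a min-heap ordered by deadline. `DeferredExpiryMap` is hypothetical, and the 90-second TTL in the example is arbitrary.

```python
import heapq
import time
from typing import Callable, Dict, List, Optional, Tuple


class DeferredExpiryMap:
    """Hypothetical map whose entries expire `ttl` seconds after insertion.

    Expired entries are not removed immediately; sweep() purges them in
    batches. The heap may contain stale rows for keys that were re-inserted.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, object]] = {}
        self._heap: List[Tuple[float, str]] = []  # (expires_at, key)

    def put(self, key: str, value: object) -> None:
        expires_at = self.clock() + self.ttl
        self._data[key] = (expires_at, value)
        heapq.heappush(self._heap, (expires_at, key))
        self.sweep()  # amortized cleanup piggybacked on writes

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # expired-but-not-yet-swept entries count as absent
        return entry[1]

    def sweep(self) -> int:
        """Remove all expired entries and return how many were dropped."""
        now = self.clock()
        removed = 0
        while self._heap and self._heap[0][0] <= now:
            expires_at, key = heapq.heappop(self._heap)
            entry = self._data.get(key)
            # Ignore heap rows superseded by a later put() of the same key.
            if entry is not None and entry[0] == expires_at:
                del self._data[key]
                removed += 1
        return removed

    def __len__(self) -> int:
        return len(self._data)


fake_now = [0.0]
expired_owners = DeferredExpiryMap(ttl=90.0, clock=lambda: fake_now[0])
expired_owners.put("lock_stateid1", ("open_owner1", "lock_owner1"))
fake_now[0] = 100.0                         # simulate 100 s passing
print(expired_owners.get("lock_stateid1"))  # None: expired, awaiting sweep
print(expired_owners.sweep())               # 1
```

Injecting the clock keeps the expiry logic testable; a fixed-size bound could be combined with the TTL if memory must also be capped between sweeps.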

Nov 27 '17 21:11 dkocher
