leofs icon indicating copy to clipboard operation
leofs copied to clipboard

[leo_gateway][nfs] inode(fileid) can be conflict

Open mocchira opened this issue 8 years ago • 6 comments

AFAIK NFS clients have used inode(called fileid in rfc1813) included in a response only as a local cache identifier so that I think it's not big flaw but should be fixed ASAP.

How to fix

64bit UUID on ets with its key(path to file) looks reasonable and promising to me.

mocchira avatar Feb 03 '17 08:02 mocchira

So to solve this, we need to keep track the fileid -> path, if a collision is found we can do double hashing.

As the table could be accessed by multiple process, we may need a gen_server for that? And it doesn't seem to have a good way to tell when the entry could be removed from the table.

windkit avatar Feb 14 '17 07:02 windkit

So to solve this, we need to keep track the fileid -> path, if a collision is found we can do double hashing.

I think the atomic counter on ets(http://erlang.org/doc/man/ets.html#update_counter-3) would be sufficient in this case.

As the table could be accessed by multiple process, we may need a gen_server for that?

The combo the atomic counter described above and the atomic insert(http://erlang.org/doc/man/ets.html#insert_new-2) would allow us to implement multi process safe code without gen_server.

And it doesn't seem to have a good way to tell when the entry could be removed from the table.

I think there is no need to know when the entry was removed from the table. Could you elaborate the actual case?

mocchira avatar Feb 14 '17 08:02 mocchira

First of all, let me confirm some terminology here.

The FileID as a unique identifier is for client to do caching and so on. Therefore, for each file, it should be mapped to one unique ID.

If we use atomic counter, doesn't it mean the same file will map to multiple IDs? Client then cannot do caching and so the performance would drop so significantly.

windkit avatar Feb 15 '17 01:02 windkit

The FileID as a unique identifier is for client to do caching and so on. Therefore, for each file, it should be mapped to one unique ID.

Right.

If we use atomic counter, doesn't it mean the same file will map to multiple IDs? Client then cannot do caching and so the performance would drop so significantly.

Words I replied might be not enough. What I meant is not only merely using atomic counter(AC) but storing the FileID(generated by AC) <-> the file path mapping(index + inverted index) into ets.

mocchira avatar Feb 15 '17 01:02 mocchira

If you are worrying about collision in 64 bit hash value, I guess the memory usage for string <-> integer should also be considered?

That's what I meant with And it doesn't seem to have a good way to tell when the entry could be removed from the table.

Also we need performance check (NFS is slow anyway though...

windkit avatar Feb 20 '17 00:02 windkit

@windkit re-checking the spec (https://tools.ietf.org/html/rfc1813) and searching fileid on the web have revealed the another requirement for fileid.

That MUST be not only unique but keep it the same even after rebooting leo_gateway|remounting nfs so either that can be calculated deterministically based on its path (like the current IMPL using hash) OR having one-to-one mapping for fileid(8byte) <-> path (like the inode number in ext(3|4)).

So the proper solution that satisfies the permanent uniqueness (implying collision free) with 8 bytes sequences seems to have field in the permanent place(AVS).

Since this IMPL not only probably take much time but also take into consideration the distributed UUID, I'd like to postpone for a while so please go forward another task for now.

mocchira avatar Feb 21 '17 07:02 mocchira