Unable to specify files inode
This code https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet/DokanOperationProxy.cs#L507 Causes the inode for two completly different files to be the same if they just have the same filename.
I'm running Python scripts that copy files around on my drive.
And python's os.path.samefile checks if the file's inode and deviceID are different. Which they aren't if the files happen to have the same name.
In my case I am trying to copy a file to a different directory. So it indeed has the same name and thus the same inode. But they are still seperate files.
My proposed solution would be to provide the ability to set the inode in the FileInformation struct but keep it optional.
What could be done is have JUST GetFileInformation return a different FileInformation struct than everywhere else which contains the full path instead of just the filename.
In my case I'm storing the FileInformation struct directly in my file node because I don't need to generate a new one everytime.
Instead of
public NtStatus GetFileInformation(string filename, out FileInformation fileInfo, DokanFileInfo info)
{
var node = GetNode(filename, info);
if (node == null)
{
fileInfo = new FileInformation();
return DokanResult.FileNotFound;
}
fileInfo = node.fileInfo;
return DokanResult.Success;
}
I would need to
public NtStatus GetFileInformation(string filename, out FileInformation fileInfo, DokanFileInfo info)
{
var node = GetNode(filename, info);
if (node == null)
{
fileInfo = new FileInformation();
return DokanResult.FileNotFound;
}
fileInfo = new FileInformation
{
FileName = filename,
Attributes = node.fileInfo.Attributes,
CreationTime = node.fileInfo.CreationTime,
LastAccessTime = node.fileInfo.LastAccessTime,
LastWriteTime = node.fileInfo.LastWriteTime,
Length = node.fileInfo.Length
};
return DokanResult.Success;
}
But that would make GetFileInformation from now on always return a wrong FileName.
Which is btw what the Mirror sample is already doing.
Just looked over it and found that FileName in FileInformation at that place isn't needed at all. It's just used to generate the hash and nothing else. I couldn't find a mention of that in the documentation. That's a bit unintuitive.
The same entry in the same struct stands for FileName everywhere else. And is also called FileName.
Just in this case it has to be full path although it should really just be a number.
Hi @dedmen ,
GetFileInformation is using FileInformation in C# but in reality the GetFileInformation wrap GetFileInformationByHandle win32 https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-getfileinformationbyhandle
that has no information of the filename https://docs.microsoft.com/en-gb/windows/desktop/api/fileapi/ns-fileapi-_by_handle_file_information
Which description do you think you would have like here ?
I'd say use a new struct that has the actual INODE handle as a number instead of FileName.
That would also allow to expose nFileIndexHigh and dwNumberOfLinks to the user if they need it.
It definetly should be a seperate type as the one used in FindFiles though. Developers first assume same type == same thing. Which is not the case here.
That change would be a API breaking change.
Hi @dedmen ,
I agree with your proposition. Do you think you could provide a pull request with the changes ? This would be greatly appreciated 👍
Yeah sure. Probably next weekend
@dedmen I imagine next weekend is far enough now 😄 just wanted to know if it is really out of your time ?
Sorry, I spent most of my time on other projects as the dokan thing was working well enough at that point. Its currently using my dirty workaround https://github.com/arma3/DokanPbo/blob/master/DokanPbo.Core/PboFS.cs#L368
This is still on my todo list, but not a high priority sadly. Too much other stuff going on.
No worry 👍 Thank you for the update and the workaround that might help others !
For reference (as I already forgot most of the stuff I was looking at back then)
Hash is calculated and used here: https://github.com/dokan-dev/dokan-dotnet/blob/fa556dfc3631eb8feed374322f4285cf17194f88/DokanNet/DokanOperationProxy.cs#L547
This
https://github.com/dokan-dev/dokan-dotnet/blob/fa556dfc3631eb8feed374322f4285cf17194f88/DokanNet/DokanOperationProxy.cs#L515
should really just return a struct mirroring the BY_HANDLE_FILE_INFORMATION contents in a C# format, we can basically keep most things but just add 64bit(?) number for fileIndex (inode) and dwNumberOfLinks
We need to communicate a way to the users to still let them generate hash from filename, to make sure new users won't be confused?
Might aswell just mention a //(uint)(fi.FileName?.GetHashCode() ?? 0) in comments in the new struct.
Exactly, we should use another struct having FileIndex that user can set OR let equal to 0 and dokan-dotnet handle it by keeping a hashmap updated with FileIndex of each file and cleanup when all handles are closed.
There is no real rules for FileIndex even if the best would be to duplicate NTFS https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information#remarks
@dedmen just to come back at what you said. When you copied the file to another folder you said os.path.samefile return true as they have the same name and therefore the same HashCode but if FileName is the full path \folder1\file and \folder2\file the HashCode should be different ? I might have missed something here.
but if FileName is the full path
correct. Full path would work. As thats my workaround. Though in other places where the same FileInformation struct is used, its just name, not full path.
I might have missed something. For me GetFileInformation provide the full path so it OK.
What are the other places where FileInformation is used only with the file name ?
public NtStatus FindFiles(string filename, out IList<FileInformation> files, DokanFileInfo info) public NtStatus FindStreams(string fileName, out IList<FileInformation> streams, DokanFileInfo info) public NtStatus FindFilesWithPattern(string fileName, string searchPattern, out IList<FileInformation> files, DokanFileInfo info)
Atleast I think thats the case. FindFiles in C API uses PWIN32_FIND_DATAW
https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw
Which contains filename according to MSDN, MSDN usually says "path" when they mean a path.
https://github.com/dokan-dev/dokan-dotnet/blob/fa556dfc3631eb8feed374322f4285cf17194f88/DokanNet/DokanOperationProxy.cs#L681
On the other hand all the dokan methods take a "filename" as argument which is really a fullpath.
FindStreams returns https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-win32_find_stream_data which has a streamname, not a filename, and also a streamsize. I think that API should also be changed to a seperate struct. It only uses "Filename" which is supposed to be a "SteamName" and "Length" which is supposed to be the length of the stream and ignores all the other entries in the FileInformation struct.
So I guess I also kinda missinterpreted the API due to the mix of filename vs fullpath. Or atleast didn't interpret the API in the way it was supposed to be interpreted.
https://dokan-dev.github.io/dokan-dotnet-doc/html/struct_dokan_net_1_1_file_information.html This also says
the name of the file or directory.
I agree, the documentation should be improved ~~or the API that is misleading~~ (win32 API also use FileName whatever file or path dir is used). @dedmen https://github.com/dokan-dev/dokan-dotnet/commit/3b70a574491817b85aa34e018197d3ca6fdf05ba does this would be enough ?
Regarding the main reason of this issue, we agree that only GetFileInformation is using the FileIndex so we have no bug over the usage of (uint)(fi.FileName?.GetHashCode() ?? 0).
Can we agree to focus on changing the API ? We could just do struct inheritance as they look the same and change the API to use, as you proposed, the correct member names.
does this would be enough ?
Technically yes.. but it's more a hacky workaround than a real solution. And it requires the user to read the documentation instead of just relying on the filename. And then we have still the issue that the user used filename for the other variants and it worked fine, so he assumes he has to do the same in GetFileInformation too.
so we have no bug over the usage of (uint)(fi.FileName?.GetHashCode() ?? 0).
We only let the user access 32bit, of the 64bit value. User might want access the rest too.
Can we agree to focus on changing the API ?
As I said, I would just split the "FileInformation" struct, into 3 (FindFiles, FindStreams, GetFileInformation) seperate structs, that expose exactly the value that are actually needed.
https://github.com/dokan-dev/dokan-dotnet/blob/3b70a574491817b85aa34e018197d3ca6fdf05ba/DokanNet/FileInformation.cs#L10-L13
FindFiles (both variants) needs to contain everything that FileInformation had. And we can keep "FileName" as that matches Win32 API.
FindStreams needs https://github.com/dokan-dev/dokan-dotnet/blob/fa556dfc3631eb8feed374322f4285cf17194f88/DokanNet/DokanOperationProxy.cs#L749-L753
string SteamName;
long StreamSize;
GetFileInformation needs https://github.com/dokan-dev/dokan-dotnet/blob/fa556dfc3631eb8feed374322f4285cf17194f88/DokanNet/DokanOperationProxy.cs#L505
string FilePath; //Only used to create FileIndex if FileIndex wasn't set (needs to be documented)
FileAttributes Attributes;
DateTime? CreationTime;
DateTime? LastAccessTime;
DateTime? LastWriteTime;
long FileSize;
long? FileIndex; //If this is set, filepath is ignored (needs to be documented)
int NumberOfLinks = 1; //Not sure if we want to expose that? Don't know what its used for
We could do inheritance and just define properly named get/set and pass the same base-struct in the backend if that makes things easier, but I wouldn't know why also I'd prefer not to allocate/copy large structs, where 80% of the content is unused.
@dedmen I fully agree on changing the API to use the struct inheritance. Do you think you would be able to provide a PR for it ?
Able of course :D Can't give you a timespan. Do you have a deadline it must be ready by?
@dedmen Great ! :) No deadline at all, of course (this is opensource :D) ! SI'm sorry if I sounded like I was pushing ^^
...this is an entire thing, just because Python's os.path.samefile has a bug?
Windows simply doesn't have the concept of an inode, nor a native concept of two directory entries that directly point to the same file (clusters). As such, on Windows NTFS, there is no possible way that two named regular files (i.e., files that are not reparse points) can ever possibly be the same underlying file object. The only instances where they could appear to be are under the old Single Instance Storage driver (which was a thing on Windows 2000 Server's Remote Installation Service), which relied on reparse points... and symlinks or hardlinks, which are also implemented as reparse points. (Notably, neither the SIS nor symlink/hardlink functionality ever require the names to match across all locations, so Python's os.path.samefile would give false negatives as well as false positives if it's implemented as described by @dedmen.)
It doesn't make sense to try to address that bug here. It's a bug in Python that Python needs to deal with.
Also, my thinking is that the most appropriate mapping of the inode concept to Windows would be through the "object ID" mechanism (ntifs.h:NtCreateFile's CreateOptions parameter containing the bit FILE_OPEN_BY_FILE_ID, which is exposed to usermode by winbase.h:OpenFileById). Using it would require several fcntls that Dokany doesn't support, and which would require some kind of additional mapping to user mode functions to manage.
This is not only a thing because of that python bug. I'd like to specify the id by myself too, and sending a string who'se whole purpose is to generate a ID is counterintuitive.
I have my own reasons for wanting to specify the ID, for FILE_OPEN_BY_FILE_ID handling. I honestly think it'd be most appropriate to address this bug by implementing that.
That said... all filenames that come to the functions pointed to in DOKAN_OPERATIONS from dokan1.sys are (currently) passed the full pathname within the filesystem. There's no "look for current working directory and append the filename to it" that the filesystem implementation needs to do, it's all handled by the driver. Which is good, because the driver doesn't pass current working directory information to the implementation. I don't think the driver can actually even get the current working directory information of the process that called it.
To understand BY_HANDLE_FILE_INFORMATION and its use of nFileIndexLow and nFileIndexHigh, it's important to realize that those came directly from the definition in the NT header files. Those fields are expressly defined and populated to support FILE_OPEN_BY_FILE_ID. The fact that you want to specify them is awesome, and I wholeheartedly salute it -- I just want it done correctly.
If FILE_OPEN_BY_FILE_ID is handled, the file ID is passed to the kernel in raw form (64 raw bits, with no base64 encoding, no padding, no NUL-termination) as FileName. Trying to interpret it as a string will either lead to incorrect behavior (since the first four bytes of a 32 bit value stuffed into a 64-bit space matches the UTF-32 representation of U+0000, thus leading to a string that's empty) or a bluescreen (since a 64-bit number with no 0x00 octets would have no guarantee that it would be followed by U+0000 in any reasonable space, and the kernel driver would need to interpret it as a string to determine what filename to pass up to user mode) -- see https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntcreatefile for more information. But, since the kernel driver would also receive the FILE_OPEN_BY_FILE_ID flag, it would be easy enough for the driver to break the value out into the nFileIndexLow and nFileIndexHigh fields, and pass an empty string in FileName. (This would be the differentiation that would allow a user to know that it's being requested to open a file by ID instead of by name -- if the string is @"" and not even @"\", it's not referring to a normal name.) [note: the kernel doesn't allow FILE_OPEN_BY_FILE_ID unless the filesystem is mounted with the FILE_SUPPORTS_OPEN_BY_FILE_ID deviceCharacteristic. This would need to be added to the DokanOptions enum, and its underlying handlers, to let the implementation tell dokany that it won't panic if it receives String.Empty in CreateFile.]
As an implementation note on your patch, long? FileIndex actually won't work to make it nullable, because long is a value type. Because of the underlying semantics of how memory addresses are interpreted, If 'null' were passed there, it would simply be passed the value of 0L, with no way to differentiate whether it's intended to be null or a literal 0L.
I'd also suggest making that parameter ulong instead of long, since it's not actually interpreted as an numeric value, and the underlying nFileIndexLow and nFileIndexHigh are both defined as uint (which would have the potential of making fully half of the nFileIndexHigh values appear to be negative numbers -- as well as potentially opening up sign-extension issues).
I understand that you probably derived the name FileIndex as a name from BY_HANDLE_FILE_INFORMATION.cs, but I'd suggest renaming it to FileID", since FILE_ID is what the MSDN documentation calls it. (The names nFileIndexLow and nFileIndexHigh are artifacts of how NTFS implements it -- it's the index of the file into the Master File Table. However, in dokany implementations, they're not always going to be true "indexes". I'm working on an implementation that would be able to directly expose underlying object IDs for file streams stored in PostgreSQL, which doesn't relate to indexing into an MFT. [As a side note, Postgres object IDs are all 32 bits long, which would also lead to me being impacted by a 0x00000000 nFileIndexHigh if it were interpreted as a string, which is why I'm concerned about correctly handling it.])
Also, the name that NtCreateFile receives when asked to open by FILE_ID is of either of the forms [8RawBytesOfFileID] or \??\C:\[8RawBytesOfFileID]. I don't have a means of testing right now, but I would guess that if the Dokan filesystem is mounted on a path, it would be the path to the mountpoint prepended with \??\. (I don't have any idea what it would be in a UNC environment, since the IDs provided by \Device\HardDiskVolume1\[16RawBytesOfObjectID] name syntax are 16-byte GUIDs, and UNC paths are accessed via \Device\MUP\hostname\sharename\.)