xxHash icon indicating copy to clipboard operation
xxHash copied to clipboard

Draft: Support Win32 long path

Open t-mat opened this issue 6 months ago • 10 comments

This PR fixes #1060.

  • Add new function XSUM_widenStringAsUncPath() in cli/xsum_os_specific.c.
    • XSUM_fopen() and XSUM_stat() use this new function.
    • It always returns absolute UNC path with \\?\ prefix.
      • UTF-8 path is converted to wchar_t path.
      • Relative path is converted to absolute path with \\?\ prefix.
    • \\?\ prefix automatically avoids classic DOS device quirks (e.g. COM1, NUL, etc)
      • I think the \\?\ prefix is supported from Windows 2000. But I can't find any source from Microsoft :(
  • Add new test scripts in tests/windows/
    • 00-test-all.bat invokes all tests.
    • tests/windows/build-with-cmake.bat builds xxhsum with cmake
    • tests/windows/test-long-path.bat tests issue #1060.
    • !__! tracks line number. :ERROR routine uses it to indicate error with line number.

t-mat avatar Aug 21 '25 14:08 t-mat

For some reason, GH actions for MinGW fails at the following line of the Makefile

https://github.com/Cyan4973/xxHash/blob/fb93aa4c414ed40de580c7140e9dad16a56749b4/Makefile#L207

Error log

OK. (passes 49948 tests)
5ffb01494ce73724  stdin
Error: unable to open input
Error: Could not open 'xxhash.c': No such file or directory. 
Error: unable to open input
Error: Could not open 'xxhash.h': No such file or directory. 
mingw32-make: *** [Makefile:207: check] Error 1
Error: Process completed with exit code 2.

t-mat avatar Aug 21 '25 15:08 t-mat

I can't reproduce above error in my local msys environment. But here is a different issue from my local environment:

# CFLAGS=-Werror make -C tests/collisions check

Testing xxh3 algorithm (64-bit)
This program will allocate a lot of memory,
generate 3200000 64-bit hashes from samples of 256 bytes,
and attempt to produce 0 collisions.

 Creating filter (0 GB)
 Generate 3200000 hashes from samples of 256 bytes
 Generation and filter completed in 0s , detected up to 240270 candidates
 Storing hash candidates (1 MB)
assertion "checkPtr == hashCandidates" failed: file "main.c", line 772, function: search_collisions

Is this line fine?

https://github.com/Cyan4973/xxHash/blob/7aee8d0a341bb574f7c139c769e1db115b42cc3c/tests/collisions/main.c#L772

Allocation size of hashCandidates is tableSize.

size_t const tableSize = (size_t)((maxNbH+1) * hashByteSize);
...
void* const hashCandidates = malloc(tableSize);

But size for realloc() is nbCandidates * hashByteSize.

if (nbCandidates < maxNbH) {
  ...
  void* const checkPtr = realloc(hashCandidates, nbCandidates * hashByteSize);
  ...
  assert(checkPtr == hashCandidates);  /* simplification: since we are reducing the size, ... */
}

Since nbCandidates < maxNbH, argument of relloc() is always less than its original allocation size. It means realloc() may or may not reallocate (move) its memory region (or not). Therefore, after this realloc(), we should not use hashCandidates.

Also, when I put printf("%p\n", hashCandidates) for debug, gcc-15 reports the following warning/error.

error: pointer 'hashCandidates' may be used after 'realloc' [-Werror=use-after-free]

t-mat avatar Aug 21 '25 17:08 t-mat

This is great but also I think that Windows requires that there specifically be a manifest in the exe file. Looking into it, I think that we can simply make a file called xxhsum.manifest alongside the xxhsum.c and then cmake will automatically pick that up and link it in: https://cmake.org/cmake/help/v3.4/release/3.4.html#other

Suggested xxhsum.manifest:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
  <asmv3:application>
    <windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
        <ws2:longPathAware>true</ws2:longPathAware>
    </windowsSettings>
  </asmv3:application>
  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    <security>
      <requestedPrivileges>
        <requestedExecutionLevel level="asInvoker"/>
      </requestedPrivileges>
    </security>
  </trustInfo>
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <!--The ID below indicates application support for Windows Vista -->
      <supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
      <!--The ID below indicates application support for Windows 7 -->
      <supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
      <!--The ID below indicates application support for Windows 8 -->
      <supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
      <!--The ID below indicates application support for Windows 8.1 -->
      <supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/> 
      <!--The ID below indicates application support for Windows 10 -->
      <supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/> 
    </application>
  </compatibility>
</assembly>

29039 avatar Aug 22 '25 00:08 29039

@29039 While a manifest file is convenient, my goal is to provide broader compatibility. For example, I’d like to support “bad” Win32 filenames such as my_file. (filename with a trailing period). Since Linux/WSL (and NTFS) accepts this as a valid filename, broader compatibility could benefit other users as well. (My current implementation is incorrect, so it doesn’t work yet.)

Please also note that this patch works correctly even without a manifest file.

t-mat avatar Aug 22 '25 03:08 t-mat

WIP. But now this PR supports long-path, UNC path and trailing period. e.g. my_trailing_period_file. tests/windows/00-test-all.bat builds and tests all.

TODO:

  • [ ] Ask Cyan: We can allow to use pathcch.lib to simplifiy the code. Though it's only supported Windows8+ or Windows Server 2012+.
  • [X] Fix CMakefile.txt for mingw which doesn't support #pragma comment(lib)

t-mat avatar Aug 22 '25 15:08 t-mat

Added cmake option XXHASH_WIN_TARGET_WIN10 for MinGW / cross-compile environment.

  • cmake + MSVC user doesn't have to care about this option.
  • MinGW user who need long-path support, they should pass XXHASH_WIN_TARGET_WIN10 to cmake.

When cmake uses MSVC, it automatically host environment is Windows 10+ or not. But in the MinGW/MSYS environment, I don't know how to detect it. Therefore, I added expcilit option for it.

This switch enables long-path code via preprocessor macro XXHSUM_WIN32_LONGPATH.

t-mat avatar Aug 23 '25 01:08 t-mat

PATHCCH_DO_NOT_NORMALIZE_SEGMENTS (support trailing period and space) and PATHCCH_ENSURE_IS_EXTENDED_LENGTH_PATH (ensure \\?\ prefix) are implemented since Windows 10 RS2 1703 as operation system DLL and its SDK. Unfortunately, earlier version of Windows just ignore these flags.

Therefore, its result depends on Win10 version. (xxhsum my_file. works fine with Win10RS2>, but earlier versions of Win10 reports error "file not found")

So we need to check runtime Windows version for it. But it may be over-engineering for this small utility. Possible solutions are:

  1. Re-imprement them by hand. Actually, reactos have them. It also provides wider compatibility as a bonus. (It works with all NTFS Windows).
  2. Detect runtime Windows version and use proper flag and logic.
  3. Fallback to manifest as @29039 suggested.
  4. Just implement it and say "It's Microsoft spec, not our fault 😏 ".

t-mat avatar Aug 23 '25 02:08 t-mat

Sorry, just noticed this question:

Ask Cyan: We can allow to use pathcch.lib to simplifiy the code. Though it's only supported Windows8+ or Windows Server 2012+.

I think that's reasonable

Cyan4973 avatar Sep 03 '25 01:09 Cyan4973

@Cyan4973 , thanks. I forgot to add @ to the question 😅

t-mat avatar Sep 08 '25 01:09 t-mat

This PR is still tagged as "Draft". But at a quick glance, it already looks pretty clean.

So at which stage is it ? Still some work to do ? Or ready for review and merge ?

cc @t-mat

Cyan4973 avatar Sep 11 '25 21:09 Cyan4973