Failed to open file with Cyrillic filename [msys2/clang64]

Open eddiezato opened this issue 3 years ago • 18 comments

Describe the bug

$ exiv2 "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: Failed to open the file

To Reproduce

Build the latest version of exiv2 from main in the msys2/clang64 environment. My build script:

export CC=clang CXX=clang++
git clone --depth 1 https://github.com/Exiv2/exiv2.git
cd exiv2
cmake -B build -G Ninja -S ./ \
    -DBUILD_SHARED_LIBS=OFF \
    -DEXIV2_ENABLE_BMFF=ON \
    -DEXIV2_BUILD_SAMPLES=OFF \
    -DCMAKE_C_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
    -DCMAKE_CXX_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
    -DCMAKE_EXE_LINKER_FLAGS='-Wl,--gc-sections -static -liconv -lexpat -lz'
ninja -C build

Expected behavior

The program must be able to open a file with a Cyrillic filename.

$ exiv2 "Darth-Vader-SW-2304085.jpg"
File name       : Darth-Vader-SW-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-2304085.jpg: No Exif data found in the file

Desktop (please complete the following information):
  • OS and version: Windows 11
  • Compiler and version: Clang 14.0.0
  • Compilation mode and/or compiler flags: see in "To Reproduce"

eddiezato avatar May 01 '22 03:05 eddiezato

Hi @eddiezato. I copied and pasted the filename you gave as an example (Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg) and could open the file without problems. I tried the following terminals on Windows 11:

  • normal CMD
  • bash terminal
  • MSYS UCRT64

I worked on solving this issue a few months ago. You can see the work done in this PR: https://github.com/Exiv2/exiv2/pull/2090.

You have mentioned that you are using msys2/clang64 and this makes me think that you probably need to use the UCRT64 MSYS2 environment to handle filepaths with UTF-8 characters properly.

Please let us know if you manage to handle your file correctly after the feedback provided.

piponazo avatar May 02 '22 07:05 piponazo

@piponazo thanks for your reply.

You can see the work done in this PR: https://github.com/Exiv2/exiv2/pull/2090.

Yeah, I've read this and a couple of other topics. I use msys2/clang64 to avoid mixing files built by different compilers, since I build everything with clang anyway, and this environment also uses the ucrt library. I build other apps the same way, such as flac, libjxl, libwebp, mozjpeg, etc., and they don't have this problem. Only exiv2 and qimgv do.

I've tried to open the file, without success, on Windows 11 inside:

  • cmd
  • pwsh
  • msys2 clang64
  • msys2 ucrt64
  • and all of the above inside wt (windows terminal)

eddiezato avatar May 02 '22 07:05 eddiezato

Building v0.27.5 with EXIV2_ENABLE_WIN_UNICODE=ON gives this:

msys2/clang64 terminal

$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: Failed to open the file

cmd or pwsh

D:\msys2\home\user\exiv2\build\bin>exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File name       : Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: No Exif data found in the file

The file has no Exif data, but at least it loads.

eddiezato avatar May 02 '22 07:05 eddiezato

Alright, let's try to hunt this issue together then 😇 .

Just as proof that I was not lying before, here is a screenshot showing that I can open a JPEG file that I renamed on my system to the filename you provided: image

At the moment, I think we are not building with Clang on Windows in CI, and I suspect this might be one possible reason for the problem. Up to now I have always compiled on Windows using either Visual Studio or the GCC compiler provided with MSYS2 or Cygwin. I can try to generate a Clang build and check whether I hit the problem you are describing. Did you try to use a windows version generated with Visual Studio? I guess you could directly try the nightly release available here: https://github.com/Exiv2/exiv2/releases/tag/nightly

By the way, in your last comment you mentioned v0.27.5. Please note that #2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes 😉

piponazo avatar May 02 '22 08:05 piponazo

Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure? Just setting CC=clang from MINGW64 will not work...

I have the CLANG64 environment set up here and will try to reproduce shortly...

kmilos avatar May 02 '22 08:05 kmilos

Previously I was building with clang in the default msys2/mingw64 environment without any problems. But a couple of days ago I asked myself why I should use dependencies built with gcc when a clang environment is available. Just for my inner perfectionist. 😜

Did you try to use a windows version generated with Visual Studio?

It works fine:

D:\Downloads\exiv2-1.0.0.9-2019msvc64\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
File name       : Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File size       : 324653 Bytes
MIME type       : image/jpeg
Image size      : 1920 x 1080
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: No Exif data found in the file

Please note that https://github.com/Exiv2/exiv2/pull/2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes

Yep. 😉

@kmilos

Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure?

Yeah, that's for sure:

2.9G    ./clang64
1.4G    ./home
0       ./ucrt64
0       ./mingw64
0       ./mingw32
0       ./clangarm64
0       ./clang32

eddiezato avatar May 02 '22 08:05 eddiezato

Indeed, it (main 19dc566889c3dd3f694897f19943df49b72c44d5) doesn't work for me either currently:

image

Same w/ UCRT64 build 😕

I haven't tried setting the system locale to UTF-8 yet.

kmilos avatar May 02 '22 09:05 kmilos

I'll try to dig into this ASAP, but lately I am pretty busy with work and holidays. It will probably take weeks before I can make progress on this topic.

piponazo avatar May 02 '22 09:05 piponazo

Interestingly, UCRT64 does work from the default mintty+bash (i.e. what is used in CI):

image

but doesn't from CLANG64 although it supposedly also links to ucrt...

image

This will take time to figure out, could be missing flags to Clang on our part, could also be due to some MSYS2 CLANG64 libc++ configuration issue...

kmilos avatar May 02 '22 09:05 kmilos

@eddiezato Can you try removing -municode from app/CMakeLists.txt please?

kmilos avatar May 02 '22 15:05 kmilos

removing -municode from app/CMakeLists.txt

cmd

D:\msys2\home\user\exiv2\build\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Exiv2 exception in print action for file Darth-Vader-SW-:
Darth-Vader-SW-

msys2/clang64

user@host CLANG64 ~
$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-: Failed to open the file

eddiezato avatar May 02 '22 15:05 eddiezato

Thanks. How about -DUNICODE -D_UNICODE instead of -municode?

kmilos avatar May 02 '22 15:05 kmilos

-DUNICODE -D_UNICODE instead of -municode

Same result.

eddiezato avatar May 02 '22 15:05 eddiezato

Thanks for checking, the search continues...

I also tried removing -municode and removing linking in this app/wmain.c hack, still a no-go...
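For readers unfamiliar with the pieces being toggled here: -municode makes MinGW-w64 toolchains predefine UNICODE and use wide-character startup code (a wmain entry point), so something then has to turn the UTF-16 arguments into the narrow strings the rest of the program works with. The sketch below shows that kind of conversion in portable C++, assuming UTF-8 is the target encoding. utf16ToUtf8 is a hypothetical name, and this is not the code in app/wmain.c; a Windows-only shim could equally call WideCharToMultiByte with CP_UTF8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from exiv2): convert UTF-16 code units to
// UTF-8 bytes. Unpaired surrogates are replaced with U+FFFD.
inline std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {  // 1-byte sequence (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2-byte sequence (covers Cyrillic)
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3-byte sequence
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4-byte sequence
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Even with a correct conversion like this, the resulting UTF-8 string still has to be turned into a path correctly further down, so removing or keeping the shim alone would not necessarily change the outcome.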

kmilos avatar May 02 '22 16:05 kmilos

I'm not an expert in C++. Just poking around in the code. 😜

So I found this: fs::exists(path) can't find the Unicode path, but fs::exists(fs::u8path(path)) can.

eddiezato avatar May 06 '22 07:05 eddiezato

This might actually be the answer to the issue, so even though you might not be a C++ expert, you did a good investigation! 😁

When I have some spare time, I would like to:

  1. Reproduce the issue on my own.
  2. Set up a new CI job to reproduce the issue in the cloud (if possible).
  3. Apply a fix (possibly using fs::u8path).
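A minimal sketch of what such a fix might look like, assuming the narrow strings reaching the file-access code are already UTF-8. pathFromUtf8 and existsUtf8 are hypothetical names for illustration, not existing exiv2 functions, and where exactly such a helper would be called inside exiv2 is left open.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of exiv2): build a path from a narrow string
// that is known to hold UTF-8, instead of relying on the implicit
// std::string -> fs::path conversion, which uses the native narrow encoding.
inline fs::path pathFromUtf8(const std::string& utf8) {
    // fs::u8path is the C++17 spelling; it is deprecated in C++20, where
    // constructing fs::path from a std::u8string takes over this role.
    return fs::u8path(utf8);
}

// Hypothetical wrapper showing how an existence check could use the helper.
inline bool existsUtf8(const std::string& utf8) {
    std::error_code ec;  // non-throwing overload: on error the result is false
    return fs::exists(pathFromUtf8(utf8), ec);
}
```

Funnelling every string-to-path conversion through one helper like this would keep the encoding assumption in a single place, so it could be adjusted later without touching each call site.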

piponazo avatar May 06 '22 08:05 piponazo

I made a simple program for testing:

#include <clocale>
#include <iostream>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;
using namespace std;

template <typename T> string f(T in) {
    if (fs::exists(in))
        return "exists          ";
    else
        return "doesn't exists  ";
}

int main() {
    setlocale(LC_CTYPE, ".utf8");
    
    string ss = "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
    fs::path ps = ss;
    fs::path pu = fs::u8path(ss);
    
    wstring ws = L"Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
    fs::path pw = ws;

    cout << "string              " << f(ss) << ss << endl;
    cout << "path from string    " << f(ps) << ps << endl;
    cout << "u8path from string  " << f(pu) << pu << endl << endl;

    cout << "wstring             " << f(ws); wcout << ws; cout << endl;
    cout << "path from wstring   " << f(pw) << pw << endl;
}

Then compiled it in clang64 and ucrt64 environments:

user@host CLANG64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123clang

user@host UCRT64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123ucrt

Output:

D:\msys2\home\user\123>123clang
string              doesn't exists  Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string    doesn't exists  "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string  exists          "Darth-Vader-SW-

wstring             exists          Darth-Vader-SW-
path from wstring   exists          "Darth-Vader-SW-

D:\msys2\home\user\123>123ucrt
string              exists          Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string    exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string  exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"

wstring             exists          Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from wstring   exists          "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"

eddiezato avatar May 06 '22 09:05 eddiezato

I opened a discussion in the msys2/mingw repo. The problem seems to be in libc++ itself, so I don't know whether exiv2 should make any adjustments specifically for the msys2/clang64 environment.

eddiezato avatar May 09 '22 02:05 eddiezato

Seems this was fixed upstream. No release yet though. I'll close this since the bug is in libc++.

neheb avatar Jul 20 '23 18:07 neheb