Failed to open file with Cyrillic filename [msys2/clang64]
Describe the bug
$ exiv2 "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: Failed to open the file
To Reproduce
Build the latest version of exiv2 from main in the msys2/clang64 environment. My build script:
export CC=clang CXX=clang++
git clone --depth 1 https://github.com/Exiv2/exiv2.git
cd exiv2
cmake -B build -G Ninja -S ./ \
-DBUILD_SHARED_LIBS=OFF \
-DEXIV2_ENABLE_BMFF=ON \
-DEXIV2_BUILD_SAMPLES=OFF \
-DCMAKE_C_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
-DCMAKE_CXX_FLAGS='-ffunction-sections -fdata-sections -march=native -mtune=native -O3 -pipe' \
-DCMAKE_EXE_LINKER_FLAGS='-Wl,--gc-sections -static -liconv -lexpat -lz'
ninja -C build
Expected behavior
The program must be able to open a file with a Cyrillic filename.
$ exiv2 "Darth-Vader-SW-2304085.jpg"
File name : Darth-Vader-SW-2304085.jpg
File size : 324653 Bytes
MIME type : image/jpeg
Image size : 1920 x 1080
Darth-Vader-SW-2304085.jpg: No Exif data found in the file
Desktop (please complete the following information):
- OS and version: Windows 11
- Compiler and version: Clang 14.0.0
- Compilation mode and/or compiler flags: see in "To Reproduce"
Additional context
Hi @eddiezato. I just tried to copy & paste the filename you gave as an example (Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg) and I could open the file without problems. I tried with the following terminals on Windows 11:
- normal CMD
- bash terminal
- MSYS UCRT64
I worked on solving this issue a few months ago. You can see the work done in this PR: https://github.com/Exiv2/exiv2/pull/2090.
You have mentioned that you are using msys2/clang64 and this makes me think that you probably need to use the UCRT64 MSYS2 environment to handle filepaths with UTF-8 characters properly.
Please let us know if you manage to handle your file correctly after the feedback provided.
@piponazo thanks for your reply.
You can see the work done in this PR: https://github.com/Exiv2/exiv2/pull/2090.
Yeah, I've read this and a couple of other topics. I use msys2/clang64 to avoid mixing files built by different compilers, since I build everything with clang anyway, and this environment also uses the ucrt library. I build other apps the same way, such as flac, libjxl, libwebp, mozjpeg, etc., and they don't have this problem. Only exiv2 and qimgv do.
I've tried to open the file, without success, on Windows 11 inside:
- cmd
- pwsh
- msys2 clang64
- msys2 ucrt64
- all of the above inside wt (Windows Terminal)
Building v0.27.5 with EXIV2_ENABLE_WIN_UNICODE=ON gives this:
msys2/clang64 terminal
$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: Failed to open the file
cmd or pwsh
D:\msys2\home\user\exiv2\build\bin>exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File name : Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg
File size : 324653 Bytes
MIME type : image/jpeg
Image size : 1920 x 1080
Darth-Vader-SW-╧хЁёюэрцш-╟тхчфэ√х-┬ющэ√-Ї¤эфюь√-2304085.jpg: No Exif data found in the file
The file has no Exif data, but at least it's loaded.
Alright, let's try to hunt this issue together then 😇 .
Just as proof that I was not lying before, here is a screenshot showing how I can open a JPEG file that I renamed on my system with the filename you provided:

At the moment, I think we are not using clang on Windows in CI, and I suspect this might be one possible reason for this problem. Up to now I have always compiled on Windows using either Visual Studio or the GCC compiler provided with MSYS2 or Cygwin. I can try to generate a clang build and check whether I have the problem you are describing. Did you try to use a Windows version generated with Visual Studio? I guess you could directly try the nightly release available here:
https://github.com/Exiv2/exiv2/releases/tag/nightly
By the way, in your last comment you mentioned v0.27.5. Please note that #2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes 😉
Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure? Just setting CC=clang from MINGW64 will not work...
I have the CLANG64 environment set up here and will try to reproduce shortly...
Previously I was building with clang in the default msys2/mingw64 environment without any problems. But a couple of days ago I asked myself why I should use dependencies built with gcc when I have the clang environment available. Just for my inner perfectionist. 😜
Did you try to use a windows version generated with Visual Studio?
It works fine:
D:\Downloads\exiv2-1.0.0.9-2019msvc64\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
File name : Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
File size : 324653 Bytes
MIME type : image/jpeg
Image size : 1920 x 1080
Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg: No Exif data found in the file
Please note that https://github.com/Exiv2/exiv2/pull/2090 was only merged into the main branch. But I guess you just mentioned that for comparison purposes
Yep. 😉
@kmilos
Did you build main inside MINGW64 environment/terminal (no unicode support) or UCRT64/CLANG64 for sure?
Yeah, that's for sure:
2.9G ./clang64
1.4G ./home
0 ./ucrt64
0 ./mingw64
0 ./mingw32
0 ./clangarm64
0 ./clang32
Indeed, it (main 19dc566889c3dd3f694897f19943df49b72c44d5) doesn't work for me either currently:

Same w/ UCRT64 build 😕
I haven't tried setting the system locale to UTF-8 yet.
I'll try to dig into this ASAP, but lately I am pretty busy with work and holidays. It will probably take weeks before I can make some progress on this topic.
Interestingly, UCRT64 does work from the default mintty+bash (i.e. what is used in CI):

but doesn't from CLANG64 although it supposedly also links to ucrt...

This will take time to figure out, could be missing flags to Clang on our part, could also be due to some MSYS2 CLANG64 libc++ configuration issue...
@eddiezato Can you try removing -municode from app/CMakeLists.txt please?
removing -municode from app/CMakeLists.txt
cmd
D:\msys2\home\user\exiv2\build\bin>exiv2.exe "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
Exiv2 exception in print action for file Darth-Vader-SW-:
Darth-Vader-SW-
msys2/clang64
user@host CLANG64 ~
$ exiv2/build/bin/exiv2.exe Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
Darth-Vader-SW-: Failed to open the file
Thanks. How about -DUNICODE -D_UNICODE instead of -municode?
-DUNICODE -D_UNICODE instead of -municode
Same result.
Thanks for checking, the search continues...
I also tried removing -municode and not linking in the app/wmain.c hack; still a no-go...
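For context on what -municode changes: with it, MinGW uses wmain as the entry point, which receives the command-line arguments as UTF-16 wchar_t strings, so a program that works with UTF-8 std::string internally has to convert them itself. Below is a minimal, portable sketch of that conversion. The name utf16ToUtf8 is a hypothetical helper, not exiv2's code; a real Windows shim would more likely call WideCharToMultiByte with CP_UTF8. It takes std::u16string so it also compiles on platforms where wchar_t is 32 bits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 (what wmain receives on Windows, where
// wchar_t is 16 bits) as UTF-8. Unpaired surrogates become U+FFFD.
inline std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }
        // Emit 1 to 4 UTF-8 bytes depending on the code point range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Once the arguments are UTF-8, the remaining question is whether the code that opens the file interprets them as UTF-8 or as the ANSI code page.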
I'm not an expert in c++. Just poke around in the code. 😜
So I found this:
fs::exists(path) - can't find Unicode path,
fs::exists(fs::u8path(path)) - can.
This might actually be the answer to the issue, so even though you might not be a C++ expert, you did a good investigation! 😁
When I have some spare time, I would like to:
1. Reproduce the issue on my own.
2. Set up a new CI job to reproduce the issue in the cloud (if possible).
3. Apply a fix (possibly using fs::u8path).
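Building on the fs::u8path finding, here is a minimal sketch of how a caller might turn a UTF-8 std::string into a std::filesystem::path without depending on the process locale or ANSI code page. The helper names utf8ToPath, pathToUtf8 and existsUtf8 are hypothetical, not part of exiv2. The #if branch assumes fs::u8path is the right call in C++17 but is deprecated in C++20, where a std::u8string is passed to the path constructor instead.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 bytes held in a std::string.
inline fs::path utf8ToPath(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: fs::u8path is deprecated; a char8_t string is always read as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: fs::u8path interprets the bytes as UTF-8 regardless of locale.
    return fs::u8path(utf8);
#endif
}

// Hypothetical helper: the reverse direction, UTF-8 bytes back in a std::string.
inline std::string pathToUtf8(const fs::path& p) {
    const auto u8 = p.u8string();  // std::string in C++17, std::u8string in C++20
    return std::string(u8.begin(), u8.end());
}

// Hypothetical helper: fs::exists on a UTF-8 name, reporting errors as "false".
inline bool existsUtf8(const std::string& utf8) {
    std::error_code ec;
    return fs::exists(utf8ToPath(utf8), ec);
}
```

On Windows this makes the UTF-8 to UTF-16 conversion explicit instead of leaving it to the narrow-string path constructor, which is the case that fails in the CLANG64 output of the test program below.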
I made a simple program for testing:
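#include <clocale>    // setlocale, LC_CTYPE
#include <iostream>
#include <filesystem>
#include <string>
namespace fs = std::filesystem;
using namespace std;
// Reports whether fs::exists finds the file when given an argument of type T.
template <typename T> string f(T in) {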
if (fs::exists(in))
return "exists ";
else
return "doesn't exists ";
}
int main() {
setlocale(LC_CTYPE, ".utf8");  // ask the UCRT to treat narrow strings as UTF-8
string ss = "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
fs::path ps = ss;
fs::path pu = fs::u8path(ss);
wstring ws = L"Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg";
fs::path pw = ws;
cout << "string " << f(ss) << ss << endl;
cout << "path from string " << f(ps) << ps << endl;
cout << "u8path from string " << f(pu) << pu << endl << endl;
cout << "wstring " << f(ws); wcout << ws; cout << endl;
cout << "path from wstring " << f(pw) << pw << endl;
}
Then I compiled it in the clang64 and ucrt64 environments:
user@host CLANG64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123clang
user@host UCRT64 ~/123
$ clang++ -Wall -std=c++17 123.cpp -o 123ucrt
Output:
D:\msys2\home\user\123>123clang
string doesn't exists Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string doesn't exists "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string exists "Darth-Vader-SW-
wstring exists Darth-Vader-SW-
path from wstring exists "Darth-Vader-SW-
D:\msys2\home\user\123>123ucrt
string exists Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from string exists "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
u8path from string exists "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
wstring exists Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg
path from wstring exists "Darth-Vader-SW-Персонажи-Звездные-Войны-фэндомы-2304085.jpg"
I opened a discussion in the msys2/mingw repo. The problem seems to be in libc++ itself, so I'm not sure whether exiv2 should make any adjustments specifically for the msys2/clang64 environment.
It seems this was fixed upstream, although there is no release yet. I'll close this issue since the bug is in libc++.