openFrameworks icon indicating copy to clipboard operation
openFrameworks copied to clipboard

fs::path <> string interoperability

Open dimitre opened this issue 2 years ago • 13 comments

this PR is to experiment interoperability between fs::path and string. I think this was not happening before because of some systems were using std::experimental::filesystem::v1::__cxx11::path which caused errors when converting fs::path to string. I've updated ofConstants.h in this PR so __cxx11 is not being used anymore

dimitre avatar Dec 12 '23 02:12 dimitre

it will cause some test errors with stuff like ofToDataPath(path).back() to get the latest character of a string, and it doesnt' work with fs::path I personally think the gains outweight the potential differences. the idea here is change return type to fs::path and remove the FS() versions thoughts? @artificiel @ofTheo

dimitre avatar Dec 12 '23 02:12 dimitre

+1 for path return type everywhere.

for the failing tests they should be updated to explicitly perform the string conversion:

ofxTestEq(ofPathToString(ofToDataPath("movies\\",true)).back(), '\\', "absolute ofToDataPath with \\ should end in \\");

(incidentally it's more or less what someone holding to some legacy code using string methods directly on the return of ofDataPath might need to do, but method chaining is not prevalent in user code)

artificiel avatar Dec 12 '23 03:12 artificiel

Great. if we decide to proceed with this idea I'll complete the work in this same PR.

dimitre avatar Dec 12 '23 12:12 dimitre

I've finally finished this one. I've updated the tests accordingly. This PR changes return type from a number of core functions from string to fs::path but brings end to end fs::path to the core.

opinions? @ofTheo @NickHardeman @artificiel @danoli3

dimitre avatar Jan 06 '24 21:01 dimitre

Wow. Awesome! Did you run into anything with the tests that might indicate breaking changes for old projects?

ofTheo avatar Jan 06 '24 22:01 ofTheo

In the tests the things I had to change is something like ofToDataPath("").back() and my own usage the only thing I had to change is one addon using this line

	string resultado = ofSystem("open " + ofToDataPath(fullFileName));

I had to switch to something like:

	string resultado = ofSystem("open " + ofPathToString(ofToDataPath(fullFileName)));
or 
	string resultado = ofSystem("open " + ofToDataPath(fullFileName).string());

dimitre avatar Jan 06 '24 23:01 dimitre

ahh so does that mean any code that has string + fs::path ( which used to be ) string + string, now needs something like: string + fs::path.string().

I guess that is the downside - but the upside is support for non English filesystems right?

ofTheo avatar Jan 07 '24 00:01 ofTheo

Yes exactly. support for different encodings end to end inside core. I think two things are important for using OF for teaching: being able to parse non english characters on path, and being able to work with spaces in paths.

dimitre avatar Jan 07 '24 02:01 dimitre

I think this one is ready to go fs::path in more places, using implicit conversion to std::basic_string (char & wchar) where possible. and explicit string conversion where not possible. cc: @ofTheo

dimitre avatar May 13 '24 14:05 dimitre

Hi @dimitre, This is great, thank you. Would this PR fix the issue mentioned here: https://github.com/openframeworks/openFrameworks/issues/7913

NickHardeman avatar May 13 '24 16:05 NickHardeman

What do you think @ofTheo ? the only downside is this: https://github.com/openframeworks/openFrameworks/pull/7817#issuecomment-1879876692

dimitre avatar May 13 '24 16:05 dimitre

and just to collect what was discussed elsewhere in the context here, given an fullFileName that contains non-POSIX unicode:

	string resultado = ofSystem("open " + ofPathToString(ofToDataPath(fullFileName)));
        // will not crash but will do something unpredictable (as the string returned will be "")
	string resultado = ofSystem("open " + ofToDataPath(fullFileName).string());
        // will throw/crash

I know @dimitre wants C++14 but the proper way of handling such a disappointment is solved with <optional>, where the code would look like:

        if (auto conformant = ofConformPathToNarrowString(ofToDataPath(fullFileName))) {
            auto resultado = ofSystem("open " + conformant.value());
            // use resultado with confidence
        } else {
            ofLogWarning("Your path contains incompatible wide unicode characters)
	}

a similar behaviour is obtainable with checking for an empty string but it's exactly the pattern that optional wants to eradicate (magic numbers etc).

        auto conformant = ofConformPathToNarrowString(ofToDataPath(fullFileName));
        if (!conform.empty()) {
            auto resultado = ofSystem("open " + conformant);
            // use resultado with confidence
        } else {
            ofLogWarning("Your path is empty or contains incompatible wide unicode characters)
	}

artificiel avatar May 13 '24 17:05 artificiel

@ofTheo please check this one when you have time. I have other ideas to test that build upon this one merged

dimitre avatar May 17 '24 18:05 dimitre

I think the api break is really minor compared to the positive side of having fs::path end to end. @ofTheo what do you think?

dimitre avatar Jul 16 '24 15:07 dimitre

for devs: paths can be made into strings with .string() on any path (or c_str()). so updating any code using paths where using string functions like string.find() can be fixed easily with path.string().find(

danoli3 avatar Jul 16 '24 16:07 danoli3

the potential problem with path.string() is that if the native path is wide (windows), and contains untranslatable unicode, .string() may throw (std::filesystem contains a lot of implementation-defined stuff around string).

u8string() removes ambiguity, but it's C++20. I would not mind seeing OF go up to C++20. (NB I've been compiling OF as C++20 for at least 1 year).

staying C++17, and because we want to minimize the use of try-catch, my suggestion above is to embrace the <optional> pattern and wrap the try-catch in an ofConformPathToNarrowString with nullopt if a problem occurred. if <optional> is deemed to fresh, a similar function but returning a string, which can be checked to validate some "real data" as been returned, old-school empty string implying an exception was caught (but I would really like to see <optional> usage in place where such "magic values" are used to pass special cases).

in all case the idea is that the try-catch is not under the responsibility of the dev- or end-user (i.e. the core itself will not throw unexpectedly because of an untranslatable filesystem path encountered on a (presumably asian) windows filesystem,).

artificiel avatar Jul 16 '24 17:07 artificiel

C++ 20 + for next version of oF I think is do-able

  • all osx / macOS / iOS side is already at C++23
  • MSVC VS is at C++23
  • testing gcc14 linux at the moment in test runners for compatibility

danoli3 avatar Jul 16 '24 18:07 danoli3

OK so for C++17 it's a question of letting usage of .string() un-tryed-catched and risk crashes on windows with asian file paths, or wrapping .string() in a function that catches the exception either with an optional or an empty string.

artificiel avatar Jul 16 '24 18:07 artificiel

I’ll add some utf8 narrowing tests with simplified Chinese and other utf8 charsets.

Related: https://en.cppreference.com/w/cpp/filesystem/path/u8path

On Wed, 17 Jul 2024 at 4:26 AM, alexandre burton @.***> wrote:

OK so for C++17 it's a question of letting usage of .string() un-tryed-catched and risk crashes on windows with asian file paths, or wrapping .string() in a function that catches the exception either with an optional or an empty string.

— Reply to this email directly, view it on GitHub https://github.com/openframeworks/openFrameworks/pull/7817#issuecomment-2231543238, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGK2HDJEIMGUAMLTTKBEY3ZMVQVZAVCNFSM6AAAAABAQXM4QCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZRGU2DGMRTHA . You are receiving this because you were mentioned.Message ID: @.***>

danoli3 avatar Jul 16 '24 19:07 danoli3

@dimitre - Thanks for all the work on this!!
If I understand right the upside of this, is much better FS support for unicode file paths etc? ie:: with this PR can you open files paths that have asian or accented characters?

Downside obviously is any addon or project that is expecting a string return but gets a path return. Like the ofSystem example above.

It's really hard to gauge how big an impact this would have - I guess I could try it with a large project and see how many errors we have to fix as a general sense. ( curious if it will be as big as the ofVec3f -> glm::vec3 transition )

@ all Is there anyway to support both string returns and fs ( I am guessing not )

ofTheo avatar Jul 16 '24 20:07 ofTheo

@ofTheo throwing projects and addons at the changes is perhaps the best way to quantify actual disruption, but to be clear there are 2 distinct considerations:

Downside obviously is any addon or project that is expecting a string return but gets a path return.

more precisely: something that is implicitly expecting a string, and chains a method (not part of path's interface) onto it. so this will fail (@dimitre's original example above):

ofLogNotice("last") << usedToReturnAStringButNowPath().back();

but this will work: (path will self-convert to string)

std::string tmp_str = usedToReturnAStringButNowPath();
ofLogNotice("last") << tmp_str.back();

as well as this: (implicit conversion)

takesAString(usedToReturnAStringButNowPath());

(note that this behaviour is what makes it possible to transparently toggle between #ifdef WIN32 to call wchar-flavoured versions of library filesystem-related apis)

it is difficult to create test cases around unicode narrowing as locale is involved, and it is implementation-specific behaviour.

This will compile, but runtime error on macOS:

std::wstring w = L"C:\\example\\path\\";
w += L"\xD83D\xDE00";  // funny stuff
w += L"\\file.txt";
std::filesystem::path p { w }; // << runtime error here

note that we don't get to test the .string() method itself, as we can't reach a state where we have a legal path in hand. My guess is that we need to coerce a failing test on windows itself (which I can't help with).

but as-is, the proposed changes will work, until a "strange" character is runtime-encountered.

artificiel avatar Jul 17 '24 01:07 artificiel

Thanks for the explanation @artificiel In terms of the macOS path stuff will a path that The Finder can display like a path with an accented e or something also runtime error? Or just more obscure wide characters?

Going to pull these changes into a couple of projects today and see if any errors show up.

Thanks all!

ofTheo avatar Jul 17 '24 17:07 ofTheo

@dimitre I tested this on a large project with a fair number of addons on macOS and it did not cause any compiler or linking errors. I know that's not a perfect indicator but it's a good sign.

So I think good to merge from my perspective :)

@artificiel @danoli3 what do you think?

ofTheo avatar Jul 17 '24 20:07 ofTheo

@ofTheo only obscure wide characters, which is specifically windows-specific (or maybe a config file exported in the incompatible character set). So the exception might rise for a windows user with such an obscure character in a path, sending it into an addon that implicitly converts to an std::string. That being said, encountering that case would help as we’d have an actual real world situation to test against…

artificiel avatar Jul 18 '24 03:07 artificiel

Yeah it seems like a windows issue only, macOS no issues with c++23, only windows starting to see issues with Wchar / wstring. I think maybe a Microsoft developer may have overthought something that was as simple as cast all strings as wide.

On Thu, 18 Jul 2024 at 1:46 PM, alexandre burton @.***> wrote:

@ofTheo https://github.com/ofTheo only obscure wide characters, which is specifically windows-specific (or maybe a config file exported in the incompatible character set). So the exception might rise for a windows user with such an obscure character in a path, sending it into an addon that implicitly converts to an std::string. That being said, encountering that case would help as we’d have an actual real world situation to test against…

— Reply to this email directly, view it on GitHub https://github.com/openframeworks/openFrameworks/pull/7817#issuecomment-2235261473, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGK2HGH27P2L2BYOEOKOLDZM43A7AVCNFSM6AAAAABAQXM4QCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZVGI3DCNBXGM . You are receiving this because you were mentioned.Message ID: @.***>

danoli3 avatar Jul 18 '24 06:07 danoli3

Thanks all and thanks @dimitre for doing the legwork! A meaningful step forward for the project! :)

ofTheo avatar Jul 18 '24 17:07 ofTheo