STL
Native, non <cstdio> based implementation of <fstream>
It's my understanding that <fstream> is still a wrapper around a standard C FILE. Why doesn't the Microsoft STL implementation move forward and offer an efficient implementation that uses the Win32 API directly? Required functions can be forward declared so it's not necessary to include <Windows.h>.
Direct API usage is not possible due to character set conversion and line endings handling.
Also direct use of file API would be inefficient, because typical stream usage needs buffering. I recall a severe perf regression in my own codebase after replacing FILE* with direct API usage.
Having taken into account both, we would basically obtain another implementation of C FILE*. Sharing the implementation simplifies maintenance.
Personally I use OS API directly in perf-critical code (with own buffering, or by using memory mapping) to avoid FILE* or fstream problems, but I see it as a failure of the standard, not (only) the implementation.
Direct API usage is not possible due to character set conversion and line endings handling
I'm not 100% sure I understand what you mean, but at least for paths I don't see any problem with character sets: in Windows platform programming it's a well-established convention, applying to both <cstdio> and <fstream>, that const char* paths use ANSI charsets while const wchar_t* paths use a Unicode charset. In fact basic_fstream has non-standard constructors/methods that accept wchar_t strings. I am also not sure what the problem with line endings is, but I find it hard to believe that there's anything that can't just be handled with programming. One possible source of trouble would be doing it header-only, but I don't see anything "impossible" to do.
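For illustration, here is a minimal sketch of the output half of text-mode line-ending translation, assuming the usual Windows behaviour of expanding each "\n" to "\r\n" on write. The helper name to_crlf is hypothetical and is not part of any library. The read direction and seek offsets are not covered.

```cpp
#include <string>

// Hypothetical helper: expands every LF to CRLF, as a text-mode
// stream on Windows does when writing.
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\n') {
            out += '\r';  // insert the carriage return before the line feed
        }
        out += c;
    }
    return out;
}
```

A filebuf that talks to the OS directly would have to apply something like this to its buffer before every raw write, and keep track of how the translated length differs from the logical one.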
Also direct use of file API would be inefficient, because typical stream usage needs buffering
The buffering can just be done without C API, that's the point of std::basic_streambuf, which std::basic_filebuf inherits.
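As a rough sketch of that point: a class derived from std::streambuf can own its buffer and only call the underlying device when the buffer fills up or is flushed. Everything named here (raw_sink, buffered_sink_buf) is hypothetical. raw_sink merely stands in for an unbuffered OS write call so the example stays portable. It is not how any real <fstream> is written.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stand-in for an unbuffered OS write call; it only records
// what it receives and how many times it was called.
struct raw_sink {
    std::string data;
    std::size_t calls = 0;

    void write(const char* p, std::size_t n) {
        data.append(p, n);
        ++calls;
    }
};

// Output-only stream buffer that batches characters in its own array and
// hands them to the sink only when the array is full or on flush.
class buffered_sink_buf : public std::streambuf {
public:
    buffered_sink_buf(raw_sink& sink, std::size_t capacity)
        : sink_(sink), buffer_(capacity) {
        assert(capacity > 0);
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    ~buffered_sink_buf() override { drain(); }

protected:
    // Called when the put area is full: drain it, then store ch.
    int_type overflow(int_type ch) override {
        drain();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    // Called by flush() and std::endl.
    int sync() override {
        drain();
        return 0;
    }

private:
    // Pass pending characters to the sink and reset the put area.
    void drain() {
        const std::size_t pending = static_cast<std::size_t>(pptr() - pbase());
        if (pending != 0) {
            sink_.write(pbase(), pending);
        }
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    raw_sink& sink_;
    std::vector<char> buffer_;
};
```

Wrapping it in a std::ostream and writing a few short strings results in far fewer raw_sink::write calls than characters written, which is the same userspace buffering FILE* provides.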
Yeah, the whole point of fstreams (or FILE*/stdio.h) is to get userspace buffering. It's pretty common to make iostreams a wrapper around cstdio (in fact, libstdc++ used to implement them as the same object: the cstdio FILE* had the same layout as the vtable for a streambuf). We can't do anything like this as far as I can tell, because the ucrt stdio doesn't do everything through indirect calls (which is a bit unfortunate, since it means we can't do fmemopen / fopencookie).
This whole scheme did end up being migrated away from I think, for whatever reason (I can think of a few possibilities, for sure).
I don't think we could ever do this, even in vNext. We've supported construction from FILE* as an extension since the beginning of time (and in my opinion this is one of our less evil extensions):
https://github.com/microsoft/STL/blob/ad80eb79ba4953e7529d15b1bc8d0b540150bf7d/stl/inc/fstream#L174
Also basic_fstream has a similar extension:
https://github.com/microsoft/STL/blob/ad80eb79ba4953e7529d15b1bc8d0b540150bf7d/stl/inc/fstream#L1227
That signature would be safe to preserve, as the internal buffer of an fstream can be polymorphic. I believe dropping the basic_filebuf constructor may not hurt that many people. I insist: why don't you see the value of getting rid of the dependency on the C API? The C++ standard never required basic_fstream to be a wrapper around FILE, and I consider it a rather naive solution. In a sense, the C++ standard might at some point push the API to be less bloated by requiring streams to expose their native handles[1], similar to the mutex/thread classes, and getting rid of the FILE wrapping would be a step in that direction. It's a pity the proposal was (temporarily?) dropped because the original author became unresponsive[2].
[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1759r3.pdf [2] https://github.com/cplusplus/papers/issues/516
It seemingly introduces additional indirections to use FILE*. In UCRT, a FILE* pointer must be dereferenced twice to access the internal __crt_stdio_stream_data object, whose functionality, IIUC, is roughly equivalent to filebuf.
I'm curious about whether it's possible to "inline" that structure into filebuf in vNext.
It seemingly introduces additional indirections to use FILE* [...] I'm curious about whether it's possible to "inline" that structure into filebuf
I don't know the internal structure of FILE*, but it may be an indirection over native file handles as well, and accessing those through the C API may require calling functions that can't be inlined. Another way to preserve the current FILE* signatures in the basic_fstream and basic_filebuf constructors could be to flush the passed stream on construction, extract the native handle from the FILE*, and operate on that, doing proper buffering in basic_filebuf. It's probably a breaking change for some users, but expecting that passing a FILE* to a basic_fstream/basic_filebuf allows operating seamlessly on both looks like a very bad assumption anyway.
Note that the standard does define the behavior of fstreams and filebuf to be "as-if" calling respective C FILE* APIs. So the streams would end up needing to reimplement parts of the C FILE* API to get the same behavior.
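One concrete piece of that "as-if" wording is the table in the standard that maps std::ios_base::openmode combinations to fopen mode strings. A sketch of that mapping follows. The function name fopen_mode_for is hypothetical, the C++23 noreplace flag is left out, and an empty string stands for "open fails".

```cpp
#include <ios>
#include <string>

// Hypothetical helper: returns the fopen mode string equivalent to an
// openmode combination, or an empty string if the combination is invalid.
std::string fopen_mode_for(std::ios_base::openmode mode) {
    using ios = std::ios_base;
    const bool binary = (mode & ios::binary) == ios::binary;
    // ate does not affect the mode string; binary only appends "b".
    const ios::openmode core = mode & ~(ios::binary | ios::ate);

    if (core == ios::in) return binary ? "rb" : "r";
    if (core == ios::out || core == (ios::out | ios::trunc))
        return binary ? "wb" : "w";
    if (core == ios::app || core == (ios::out | ios::app))
        return binary ? "ab" : "a";
    if (core == (ios::in | ios::out)) return binary ? "r+b" : "r+";
    if (core == (ios::in | ios::out | ios::trunc))
        return binary ? "w+b" : "w+";
    if (core == (ios::in | ios::app) || core == (ios::in | ios::out | ios::app))
        return binary ? "a+b" : "a+";
    return std::string();
}
```

An implementation built directly on the OS API would need to translate each of these rows into the corresponding creation and access flags itself.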
I insist: why don't you see the value of getting rid of dependency on C API?
What specific value are you expecting? If it's perf, is there an implementation one could benchmark against?
What specific value are you expecting?
The value is demystifying the implementation of basic_fstream/basic_filebuf. There's a change of mentality going on in the C++ world, and more and more legacy C APIs are being practically obsoleted by full C++ replacements (e.g. <chrono>, <format>): if one had to create a <fstream> implementation from scratch today, wrapping <cstdio> would certainly look like a suboptimal solution. As an example, Microsoft implemented the FileStream class in .NET Framework by wrapping native OS handles (it's easier to look at the older .NET Framework code[1]). The same can certainly be done in <fstream>. As said, it's also a step towards exposing the streams' native handles in a clean way.
Performance is not a concern for me, but removing a level of wrapping in I/O operations looks sensible to me from a performance standpoint.
[1] https://github.com/microsoft/referencesource/blob/5697c29004a34d80acdaf5742d7e699022c64ecd/mscorlib/system/io/filestream.cs#L845
The value is demystifying the implementation of basic_fstream/basic_filebuf.
Maybe I'm dense, but I don't get why this is a value. The implementation isn't a mystery to me - I just don't care. Why do I (as a user) care about how basic_fstream is implemented, if it doesn't affect perf (which needs to be verified) and it doesn't affect the interface (which is determined by the standard)?
<chrono> and <format> are a completely different case, because here, a C-API was replaced by a (arguably) more convenient c++ one. Here you are talking about changing the implementation, but leaving the API (which is already a c++-API) as it is.
It's not that I'm against a different implementation, but you are asking the maintainers to spend a significant amount of work to reimplement fstream without (I think) giving them a clear idea of the (potential) benefits for them or their users.
Btw.: If someone (e.g. the implementation of a future fstream::native_handle()) wants to get a native file handle from std::FILE*, I think one can just use _fileno + _get_osfhandle (https://docs.microsoft.com/de-de/cpp/c-runtime-library/reference/fileno?view=msvc-170). No need to use the Win32 API instead of the C API for opening/closing files.
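A minimal sketch of that two-step lookup, assuming an open FILE*. The wrapper name os_handle_of is hypothetical. The non-Windows branch is only there so the snippet also compiles elsewhere, and it returns the POSIX file descriptor instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <io.h>  // _get_osfhandle
#endif

// Hypothetical helper: returns the OS-level handle behind an open FILE*
// as an integer. On Windows the value can be cast to HANDLE.
std::intptr_t os_handle_of(std::FILE* file) {
    assert(file != nullptr);
#ifdef _WIN32
    return ::_get_osfhandle(::_fileno(file));
#else
    return fileno(file);  // POSIX: the descriptor itself is the native handle
#endif
}
```

The returned handle is still owned by the FILE*, so it should not be closed separately.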
Since the definition of native handle is left up in the air, we could simply treat FILE* as our native handle and return that. Users who want to get the "true" native handle can then call the APIs mentioned above.
std::mutex is similar, it does not return a handle to a kernel mutex object, nor does it return a handle to a SRW lock. It returns a pointer to an internal ConcRT structure.
Since the definition of native handle is left up in the air, we could simply treat FILE* as our native handle and return that. Users who want to get the "true" native handle can then call the APIs mentioned above.
[EDIT: at first I misread your post a little bit] That would work, but getting a FILE* and then calling two other functions to finally access the handle seems weird to me.
P1759 got approved, so the Microsoft STL team will soon have to decide what basic_filebuf::native_handle() should actually return. This issue wasn't answered as I hoped, but it wasn't closed either: did you talk about implementing <fstream> on top of internal Windows syscalls, or should I expect it to return FILE*? My opinion: if one had to write an STL implementation from scratch, one would probably implement it around the native API of the system. Today <fstream> on top of <cstdio> looks like wrapper bloat.
P1759 got approved, so Microsoft STL team will soon have to decide what basic_filebuf::native_handle() should actually return.
I think MSVC STL will do the same thing as indicated by that paper, except that the typedef name HANDLE won't be exposed and only _Ugly names will be used as implementation details.
Code from the paper
For MSVC:
template <class CharT, class Traits>
class basic_filebuf : public basic_streambuf<CharT, Traits> {
    // ...
    using native_handle_type = HANDLE;
    // ...
    native_handle_type native_handle() {
        assert(is_open());
        // _Myfile is a FILE*
        auto cfile = ::_fileno(_Myfile);
        // _get_osfhandle returns intptr_t, which can be cast to HANDLE (void*)
        return static_cast<HANDLE>(::_get_osfhandle(cfile));
    }
    // ...
}
Suitably modified version
    // ...
#if _HAS_CXX26
    using native_handle_type = void*; // HANDLE declared in winnt.h
#endif // _HAS_CXX26
    // ...
#if _HAS_CXX26
    native_handle_type native_handle() const noexcept {
        _STL_ASSERT(is_open());
        return reinterpret_cast<native_handle_type>(_CSTD _get_osfhandle(_CSTD _fileno(_Myfile)));
    }
#endif // _HAS_CXX26
    // ...
return reinterpret_cast<native_handle_type>(_CSTD _get_osfhandle(_CSTD _fileno(_Myfile)));
Returning HANDLE while still wrapping FILE* is of course another viable way to implement P1759. I have been mixing two topics (which is my fault, sorry for that): 1. implementing the requirements of P1759, and 2. considering an alternative implementation of <fstream> that doesn't use <cstdio>. If the second topic is off the table, then this issue can be safely closed, since I am sure I will be able to retrieve the HANDLE one way or another once native_handle() is implemented.