`tellg` returns -1 when using clang++ to get the size of a large file (around 6GB) on Windows
When compiling C++ code with clang++ on Windows, the tellg() function returns -1 when attempting to get the size of a large file (around 6GB). However, when compiling the same code with MSVC or g++, the file size is retrieved correctly.
#include <fstream>
#include <iostream>
int main() {
std::ifstream inFIle("test.file", std::ios::binary|std::ios::in);
if(!inFIle.is_open() || !inFIle.good()) {
std::cout << "fail" << std::endl;
return 0;
}
inFIle.seekg(0, std::ios::end);
auto size = inFIle.tellg();
inFIle.seekg(0, std::ios::beg);
std::cout << size << std::endl;
return 0;
}
There's a combination of two issues here.
With the mingw headers, off_t defaults to a 32 bit size, unless building with -D_FILE_OFFSET_BITS=64.
Secondly, the code that uses off_t and ftello and fseeko in libcxx is in the headers, so it doesn't help to rebuild libcxx with this option; the user that includes libcxx headers will need to set _FILE_OFFSET_BITS=64 before including those haeders.
When setting up the base ABI for UCRT, it's a shame that we didn't think of defaulting off_t to 64 bit there, as we already did deviate a bit from the existing status quo, by making time_t default to 64 bit. Or what would you think of changing the default for off_t for UCRT configurations to 64 bit? It would be an ABI break, but I wonder how much of an issue it would be. CC @lhmouse @cjacek @mati865 @lazka @jeremyd2019
FWIW, if comparing with MSVC, they don't really have anything like off_t, so with MSVC headers, users are expected to manually call e.g. _ftelli64 instead of ftello. Building libcxx in MSVC mode also runs into this same bug, and there, setting _FILE_OFFSET_BITS=64 doesn't help at all (as it's not trying to use anything like ftello). The MS STL probably directly calls _ftelli64 instead.
I remembered there's another thing that we could do on the libcxx side, though - we can configure libcxx with -DLIBCXX_EXTRA_SITE_DEFINES="_FILE_OFFSET_BITS=64", which adds that define into libcxx's __config_site which all libcxx headers include. So that way, libcxx mostly would work as expected, with the small unexpected side effect that including standard C++ headers affects the user's a little. Although if the user already has included e.g. stdio.h before libcxx headers, it won't help either?
(GCC's libstdc++ does something similar for __USE_MINGW_ANSI_STDIO=1, with similar caveats. For msvcrt environments, that's mostly always set by default these days anyways, but for UCRT we still tend to default to having it disabled.)
When setting up the base ABI for UCRT, it's a shame that we didn't think of defaulting
off_tto 64 bit there, as we already did deviate a bit from the existing status quo, by makingtime_tdefault to 64 bit. Or what would you think of changing the default foroff_tfor UCRT configurations to 64 bit? It would be an ABI break, but I wonder how much of an issue it would be. CC @lhmouse @cjacek @mati865 @lazka @jeremyd2019
time_t is 64bit in MSVC now but off_t is still 32bit, so it doesn't seem too wrong what we have now when trying to reduce the difference there. I'd prefer MSVC compat if possible. Any cross-platform code which tries to support MSVC and >2GB files will have to avoid off_t anyway.
I remembered there's another thing that we could do on the libcxx side, though - we can configure libcxx with -DLIBCXX_EXTRA_SITE_DEFINES="_FILE_OFFSET_BITS=64", which adds that define into libcxx's __config_site which all libcxx headers include.
In MSYS2 I remember one case where it broke things since this leads to mingw-w64 defining "stat" https://github.com/mingw-w64/mingw-w64/blob/fab7cfe34e5388c920230d0237e3663804d91458/mingw-w64-headers/crt/sys/stat.h#L276 which clashed with a C++ class called stat in libtorrent (https://github.com/arvidn/libtorrent/commit/44da281de200ca97757e7ea92ec3f0a8642423af). Just an warning that there might be fallout.
Or what would you think of changing the default for
off_tfor UCRT configurations to 64 bit? It would be an ABI break, but I wonder how much of an issue it would be.
This seems justified to me.
time_tis 64bit in MSVC now butoff_tis still 32bit, so it doesn't seem too wrong what we have now when trying to reduce the difference there. I'd prefer MSVC compat if possible. Any cross-platform code which tries to support MSVC and >2GB files will have to avoid off_t anyway.
Microsoft defines off_t as a 'long integer', equivalently, a 32-bit integer: https://learn.microsoft.com/en-us/cpp/c-runtime-library/standard-types?view=msvc-170
It has been long since Platform SDK for Windows 2003. I am bit curious why there is such a POSIX type; it is an 'old name' and is deprecated anyway; why don't users prefer int64_t instead?
That being said, whether libc++ defines _FILE_OFFSET_BITS=64 is questionable; but despite that, it really should call ftello64() and feeko64() explicitly.
time_tis 64bit in MSVC now butoff_tis still 32bit, so it doesn't seem too wrong what we have now when trying to reduce the difference there. I'd prefer MSVC compat if possible. Any cross-platform code which tries to support MSVC and >2GB files will have to avoid off_t anyway.
That's true, but on the other hand, MSVC headers don't really use off_t so for them it essentially doesn't matter. For them it's only used for the st_size field in the 32 bit size versions of the stat structs (in the form _off_t though, but they're linked).
Switching the type would be beneficial for the unix originated projects which may be using ftello/fseeko (if available, otherwise just ftell/fseek) and not wanting to manually do something else.
I remembered there's another thing that we could do on the libcxx side, though - we can configure libcxx with -DLIBCXX_EXTRA_SITE_DEFINES="_FILE_OFFSET_BITS=64", which adds that define into libcxx's __config_site which all libcxx headers include.
In MSYS2 I remember one case where it broke things since this leads to mingw-w64 defining "stat" https://github.com/mingw-w64/mingw-w64/blob/fab7cfe34e5388c920230d0237e3663804d91458/mingw-w64-headers/crt/sys/stat.h#L276 which clashed with a C++ class called stat in libtorrent (arvidn/libtorrent@44da281). Just an warning that there might be fallout.
Right, indeed - yeah there may be fallout.
(That's a good reference though - I was wondering if defining _FILE_OFFSET_BITS=64 wouldn't break the use of stat, but that redirect seems to make it work as intended anyway.)
Or what would you think of changing the default for
off_tfor UCRT configurations to 64 bit? It would be an ABI break, but I wonder how much of an issue it would be.This seems justified to me.
Also regarding uses of both stat and ftello - as each individual object file references different stat functions and ftell functions depending on the header configuration, an ABI break like this wouldn't mostly be visible at all. It would only be visible if you have data types that include off_t (or stat structs) on an ABI boundary (e.g. combining an old object file built with old headers, with a new one built with new headers, touching a struct containing off_t, or using an API across DLL boundaries, if that API uses structs with off_t).
time_tis 64bit in MSVC now butoff_tis still 32bit, so it doesn't seem too wrong what we have now when trying to reduce the difference there. I'd prefer MSVC compat if possible. Any cross-platform code which tries to support MSVC and >2GB files will have to avoid off_t anyway.Microsoft defines
off_tas a 'long integer', equivalently, a 32-bit integer: https://learn.microsoft.com/en-us/cpp/c-runtime-library/standard-types?view=msvc-170It has been
longsince Platform SDK for Windows 2003. I am bit curious why there is such a POSIX type; it is an 'old name' and is deprecated anyway; why don't users preferint64_tinstead?
It's probably meant for transitionary uses, for cases where you earlier might have been ok with having 32 bit file sizes/offsets, but wanted the same API to upgrade to 64 bit file sizes later. In modern days, there's indeed not much reason to bother with it...
That being said, whether libc++ defines
_FILE_OFFSET_BITS=64is questionable; but despite that, it really should callftello64()andfeeko64()explicitly.
That's indeed right; if libcxx wants to have working >2 GB file sizes in MSVC build configurations, it would need to switch to such an API anyway. And if it does that, it doesn't matter how we define off_t or ftello by default.
That being said, whether libc++ defines
_FILE_OFFSET_BITS=64is questionable; but despite that, it really should callftello64()andfeeko64()explicitly.That's indeed right; if libcxx wants to have working >2 GB file sizes in MSVC build configurations, it would need to switch to such an API anyway. And if it does that, it doesn't matter how we define
off_torftelloby default.
This fix went in recently in https://github.com/llvm/llvm-project/commit/86e20b00c313e96db3b69d440bfb2ca9063f08f0 / https://github.com/llvm/llvm-project/pull/123128 and was backported to the upcoming 20.x release series in https://github.com/llvm/llvm-project/commit/d777df5cbd35b301826b2b1500b5eb02d56818d5. The fix should be available in the latest prerelase of llvm-mingw at https://github.com/mstorsjo/llvm-mingw/releases/tag/20250203.