ADIOS2
ADIOS2 copied to clipboard
`2.10.1` fails on 32bit Windows in BP5 Serializer
Describe the bug
The current master does not build due to a uint64_t / size_t mismatch on ILP32 platforms:
2024-03-25T06:36:25.2420671Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-fix-static-blosc2-build\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1600,68): error C2397: conversion from 'uint64_t' to 'size_t' requires a narrowing conversion [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-03-25T06:36:25.2425984Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-fix-static-blosc2-build\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1600,68): warning C4244: 'initializing': conversion from 'uint64_t' to 'size_t', possible loss of data [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-03-25T06:36:25.2429288Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-fix-static-blosc2-build\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1601,32): warning C4244: '+=': conversion from 'uint64_t' to 'size_t', possible loss of data [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-03-25T06:36:25.2431866Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-fix-static-blosc2-build\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1608,69): error C2397: conversion from 'uint64_t' to 'size_t' requires a narrowing conversion [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
To Reproduce Build on Visual Studio / w/ MSVC in 32bit mode ("x86").
Expected behavior Build on LP32, ILP32, LLP64 and LP64 platforms.
Desktop (please complete the following information):
- OS/Platform: Windows Visual Studi 16 2019, 32bit (cross-compiled from 64bit)
- Build: MSVC 19.29.30154.0 (x86) aka 32bit
Additional context Building new wheels and this worked in the past: https://github.com/openPMD/openPMD-api/pull/1554
You might wonder: what is wrong with you, why 32bit Windows? You are right, this is to get an in with experimental physicists. In labs, some hardware has very weird control software and limited/old driver support, which only runs on 32bit Windows.
I am ok to ditch this eventually, but since it is easy to fix and keep general, we should do it.
Should size_t Position = 0; simply be a uint64_t Position = 0; @eisenhauer ?
Ugh. Making this stuff build and run cleanly on 32- and 64-bit was something I advocated for years ago, but couldn't get off the ground. (Doing it properly really requires saving output files from different platforms, making sure what's written on one is readable on another, etc. FFS does this, but we've not done it for ADIOS.) So, to the extent that we can tweak something and it gets us where we need to be, I'm completely supportive. I think the question is how far we go down the road from "it compiled once" through "it's in CI so we don't break it accidentally" to "we support and test 32/64 bit interoperability to the extent possible".
WRT the specific question, yeah, Position is probably uint64_t, but given the pickiness of VS, making that change may require a bunch more casts somewhere. If you've go this setup and can sort out the compile issues, I'm happy to look at the PR, but the number of background code projects I've got on my plate is a little daunting at the moment, so tackling it myself might be slow.
Generally, yes I would expect a file format (H5, BP4, BP5) to handle for me the concerns of how to encode a file so I can read it back between any system (little/big endian, 32bit to 64bit, platform specific sizes of ints and floats). Otherwise it is not a portable format.
That said, at the current point (this issue), I am happy if things compile and in/out consistently on the same platform...
This is what we deploy on downstream:
openPMD-api deployment targets
- Linux 32bit (i686) - GH actions, cross-compiled
- Note: mostly to avoid hard-coding platform assumptions
- Linux 64bit (x86_64) - GH actions
- Linux aarch64 - CircleCI
- Linux ppc64le (little endian) - TravisCI using its free partner queue
- Note: Lassen (LLNL) and Summit (OLCF) are the last machines we need it on.
- Windows 32bit (x86) - GH actions, cross-compiled from 64bit
- Note: mostly for easy install on experimentalist hardware and to avoid hard-coding platform assumptions
- Windows 64bit (x86_64) - GH actions from 64bit
- MacOS 64bit (x86_64) - GH actions from 64bit
- Note: Apple is fading this out mid-term
- MacOS 64bit (aarch64/arm64) - GH actions, cross-compiled from x86_64
- GH actions has now official aarch64 hardware, too
openPMD-api deployment variants
- all-shared: conda, Spack, etc.
- all-static (besides Python module, which is a shared lib): Pip
Even coming up with a big-endian platform to do CI on is a challenge. Heterogeneity isn't what it used to be...
(that said, I am personally not needing big endian support, luckily HPC has no such machine atm :D)
Last BE machine I knew of was a Sparc-based machine at JAXA... Before it was activated I assumed that as a PowerPC machine Summit would be BE, but they put it in LE mode. Lots of BP5 code is untested for BE.
@eisenhauer @vicentebolea ADIOS 2.10.1 compiles on 32bit Windows still fail with:
...
2024-10-08T05:59:38.9807163Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\thirdparty\EVPath\EVPath\qual_hostname.c(343,13): error C2373: 'WSAGetLastError': redefinition; different type modifiers [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
2024-10-08T05:59:38.9811929Z C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\um\winsock2.h(2405): message : see declaration of 'WSAGetLastError' [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
2024-10-08T05:59:38.9860168Z cmmulticast.c
2024-10-08T05:59:39.1815577Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\source\adios2/common/ADIOSTypes.h(292,32): warning C4309: 'initializing': truncation of constant value [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-10-08T05:59:39.2109372Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\thirdparty\EVPath\EVPath\qual_hostname.c(343,13): error C2373: 'WSAGetLastError': redefinition; different type modifiers [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
2024-10-08T05:59:39.2113228Z C:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\um\winsock2.h(2405): message : see declaration of 'WSAGetLastError' [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
...
2024-10-08T06:00:56.2432817Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1600,68): error C2397: conversion from 'uint64_t' to 'size_t' requires a narrowing conversion [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-10-08T06:00:56.2436699Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1600,68): warning C4244: 'initializing': conversion from 'uint64_t' to 'size_t', possible loss of data [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-10-08T06:00:56.2440711Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1601,32): warning C4244: '+=': conversion from 'uint64_t' to 'size_t', possible loss of data [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
2024-10-08T06:00:56.2444462Z D:\a\openPMD-api\openPMD-api\src\dep-adios2\ADIOS2-2.10.1\source\adios2\toolkit\format\bp5\BP5Serializer.cpp(1608,69): error C2397: conversion from 'uint64_t' to 'size_t' requires a narrowing conversion [D:\a\openPMD-api\openPMD-api\src\build-adios2\source\adios2\adios2_core.vcxproj]
...
Hey Axel. Unfortunately, I think making ADIOS2 run on a 32-bit machine would be a truly massive effort, far beyond just getting it to compile without error or warnings. To get an idea why, here's a few lines from ADIOSTypes.h:
constexpr size_t UnknownDim = 0;
constexpr uint64_t JoinedDim = MaxU64 - 1;
constexpr uint64_t LocalValueDim = MaxU64 - 2;
constexpr uint64_t IrregularDim = MaxU64 - 3;
constexpr size_t UnknownStep = MaxU64;
using Dims = std::vector<size_t>;
using Steps = size_t;
If you look at this closely, you'll see an inconsistency in usage WRT whether or not the basic type to hold either Step count or a Dimension size is size_t or uint64_t. ADIOSTypes.h is of course one of the most basic files in ADIOS, so if this inconsistency exists there it is certain to ubiquitous in the code. All of those inconsistencies would have to be resolved, preferably in the "right" direction, where the definition of "right" isn't entirely clear. If you look at the compile log, you'll see stacks of warnings in the HDF5 about casual conversions between "hsize_t" and "size_t", which likely means that it has the same problem. The worst part would be that sorting out all the compiler warnings wouldn't be enough to finish the job because you still might have things that you won't detect until run-time, inadvertent truncations of a dimension value so that you'll never get a true equality check with one of the constant Dim values above, etc. Even if you got all the tests to pass you'd still want to make sure that the files you produced were still readable on a 64-bit machine and vice-versa. If this were a few days or even a few weeks worth of work we might find someone to do it, but I have to think that it would be a lot more effort than that. I think this is probably a lost cause.
Hi @eisenhauer,
I understand this challenge, I think one can address it step-wise.
From my perspective, as a first and tiny step to make sure 32bit-written files can be read on 32bit platforms would also be useful. I am not sure if ADIOS2 has automation from little to big endian on 64bit machines and other true portability features at this point (and tested) either.
Note that ADIOS2 compiles and provides this level of support already on 32bit Linux. ADIOS2 also provided this level of support in earlier releases (e.g., v2.8).
you'll see stacks of warnings in the HDF5 about casual conversions between "hsize_t" and "size_t", which likely means that it has the same problem.
Interestingly it does not indicate that problem in HDF5. In fact, HDF5 files are portable from 32bit to 64bit and the HDF5 library on reads also knows how to handle automatic big/little endian conversions.
Long term.
I think long-term, ADIOS2 could indeed explore the additional steps you mention. With the increase of architectural diversity (Cambrian explosion of architectures) that is ongoing, not providing a portable file format is indeed a problem for ADIOS2 at this point. I am not saying that 32bit Windows is my most urgent architecture, but I anticipate that new architectures might simply do more choices and supporting the current four fundamental C++ data models (LP32, ILP32, LLP64, LP64) would be a sensible start, I think.
... where the definition of "right" isn't entirely clear.
For everything that are steps and sizes, just hard-coding 64bit integers consistently is a sensible choice for portability. This is mostly metadata anyway. One can still add runtime exceptions on 32bit architectures if the size used at runtime exceeds true storage possibilities once encountered (e.g., maybe a 5GB buffer cannot be read or written on 32bit architectures - that is totally ok. We do not need to work around limitations of 32bit operating systems. But all data sets that use below 32bit extents and/or buffer sizes can still be read and written and moved between machines then.).
We want to use ADIOS2 way more on edge machines for data acquisition in experiments, and ensuring we can write on smaller machines (e.g., 32bit Windows & Linux) and read those files on an HPC machine for ML tasks is pretty useful. (The other way also makes sense, e.g., if we do ML training on HPC and export a small trained model via ADIOS2 data to an edge machine.)
So, let me say that I'm firmly in the camp of supporting architectural diversity. The FFS portion of our software stack was specifically designed to compile and work everywhere, and to support efficient translations between machine representations between 32/64 bit, big endian, little endian, middle endian, etc. But having recently banged my head against code that fuzzily assumed that sizeof(long)=sizeof(void*) (True everywhere except in visual studio) I know the pain of the process of fixing such things. Not saying it wouldn't be good to do, just don't know who's going to do it.
Related warning in ADIOS 2.10.2:
source/adios2/toolkit/format/bp5/BP5Serializer.cpp: In member function ‘adios2::format::BP5Serializer::_BP5WriterRec* adios2::format::BP5Serializer::CreateWriterRec(void*, const char*, adios2::DataType, std::size_t, std::size_t)’:
/home/axel/src/adios2/source/adios2/toolkit/format/bp5/BP5Serializer.cpp:331:40: warning: ‘%s’ directive output between 0 and 2147483646 bytes may cause result to exceed ‘INT_MAX’ [-Wformat-truncation=]
331 | snprintf(Ret, Len, "%s%s%zd_%d_%s", Prefix, ExpressionInsert, element_size, type, StructID);
| ^~
Oh, there is a EVPath issue:
thirdparty\EVPath\EVPath\ip_config.c(368,13): error C2373: 'WSAGetLastError': redefinition; different type modifiers [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
Oh, there is a
EVPathissue:thirdparty\EVPath\EVPath\ip_config.c(368,13): error C2373: 'WSAGetLastError': redefinition; different type modifiers [D:\a\openPMD-api\openPMD-api\src\build-adios2\thirdparty\EVPath\EVPath\EVPath.vcxproj]
I went looking for the source of this problem at one point, but came up empty. If you have a fix, I'm happy to apply it...