extend `sys` module on windows
It would be useful to be able to detect:
- the platform as 32-bit vs 64-bit vs ARM64.
- the Windows API partition
  - multiple partitions can be true at the same time, so this should probably be a named tuple
@zooba you mentioned adding them to sys.implementation. Since they are only available on windows, I assume this would mean adding them with an underscore prefix? The alternative would be a separate API similar to sys.winver.
This is a follow-up to #102256.
Linked PRs
- gh-124582
Yeah, I don't think we want to bother with making them required attributes, so I'd add:
- sys.implementation._architecture and make it the value of $(ArchName) from python.props (this approx. matches platform.architecture)
- sys.implementation._api_partitions and make it a tuple of all the ones we test for. No need to add names, simple `in` checks will be the most common use
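A minimal sketch of how calling code might consume these two attributes, assuming they are added under the names proposed above. Neither `_architecture` nor `_api_partitions` is confirmed to exist in any released CPython, so the sketch falls back to `None` and an empty tuple. The partition name `"desktop"` is a placeholder, not a confirmed value.

```python
import sys


def build_architecture(impl=sys.implementation):
    """Return the proposed build-time architecture, or None if unavailable."""
    # _architecture is the proposed (hypothetical) attribute name.
    return getattr(impl, "_architecture", None)


def has_api_partition(name, impl=sys.implementation):
    """Check membership in the proposed tuple of API partitions."""
    # _api_partitions is also hypothetical; default to an empty tuple.
    return name in getattr(impl, "_api_partitions", ())


print(build_architecture())
print(has_api_partition("desktop"))
```

Using `getattr` with a default keeps the check safe on other platforms and on older builds, which matters if the attributes stay Windows-only.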
Is sys.implementation the right place? It should have things related only to the Python implementation, that other implementations also have. The architecture is a build characteristic and the api partitions seem like it as well. Where do we normally record build or platform (or OS) information?
The only real alternative is at the top level of the sys module. Maybe you could make a case for it going into sysconfig, but we still need an attribute on a native module somewhere, as it's the only way we're getting the field filled in at build time.
I assume this hasn't come up before because Windows is the only OS with transparent CPU emulation? So querying the current OS architecture isn't sufficient to determine the current process architecture.
sys._build.*?
Really it should be sys.platform_but_for_real_this_time as we can't just go updating sys.platform any more.
What if we added sys.architecture for all platforms? platform.architecture(sys.executable) is too limited in its return values to work here, but it ought to be known at compile time, right? And something like Jython or IronPython can use clr or jvm if they prefer. Maybe they've already got an additional field for that?
the platform as 32-bit vs 64-bit vs ARM64.
There are multiple existing APIs to do that. Examples:
>>> platform.architecture()
('64bit', 'ELF')
>>> os.uname().machine
'x86_64'
I don't think that we should add information retrieved at runtime in sys. Usually, sys is more information computed during the build.
platform.architecture() doesn't distinguish between ARM64 and x64, and os.uname().machine doesn't reflect how CPython was compiled. So we need to store the target architecture at build time.
But you're correct that we don't need to retrieve it at runtime. (Though worth noting that a lot of sys is dynamic - path, modules, various hooks, etc.)
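For comparison, the existing queries mentioned above can be collected in one place. This sketch uses only documented standard-library calls; `platform.machine()` stands in for `os.uname().machine` because `os.uname()` is not available on Windows. The helper name `existing_arch_info` is made up for illustration.

```python
import platform


def existing_arch_info():
    """Collect what the standard library already reports about architecture."""
    bits, linkage = platform.architecture()  # e.g. ('64bit', 'ELF')
    return {
        "bits": bits,                   # pointer width only, no CPU family
        "linkage": linkage,             # executable format, may be empty
        "machine": platform.machine(),  # machine type as seen at runtime
    }


print(existing_arch_info())
```

None of these fields records the architecture the interpreter was compiled for, which is the gap this issue is about.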
Hmmm. Actually, the discussion in the PR stalled because no one decided whether to merge it or not (it seems that the only supporter is @zooba). The newly added arch attribute is used to get the current build architecture on Windows (it comes from python.props). What do you think?
link: https://github.com/python/cpython/pull/124582 https://discuss.python.org/t/regarding-whether-we-should-add-py-currentarch-or-py-archname-function/65191/17
My main concern was/is that this information doesn't belong on sys.implementation. That's meant for information about the Python implementation (e.g. cpython, pypy, jython, micropython), not about the build. See PEP 421.
Something like sys._architecture, sys.build.architecture, or platform.??? or on sysconfig (like @zooba suggested) would be more appropriate. (FWIW, I like the idea of sys.build.*, PEP-421-style, but its hypothetical contents might already be covered elsewhere.)
All that said, I haven't looked closely at what we're trying to actually accomplish here, so I don't have much feedback on that.
I'm often the only supporter of things involving Windows :-) Usually aggregated from a lot of discussions with a range of users, but not easily cited (many Windows users are scared off by the culture here, so they won't report stuff themselves).
In this case, we've been looking at adding support to projects' build scripts to allow building on Windows ARM64, which requires the builder to know whether it should be targeting ARM64 or not. That's usually done by running in the ARM64 interpreter, and then to cross-compile you'd run in the x64 or x86 interpreter, but we don't currently have a reliable way to detect that. Historically we haven't needed it, because sys.maxsize (or sys.maxint) was sufficient to detect 64-bit vs 32-bit. But that can't tell the difference between ARM64 and AMD64.
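The historical check referred to here can be sketched as follows. The function name `pointer_width` is made up for illustration; the comparison itself is the common `sys.maxsize` idiom.

```python
import sys


def pointer_width(maxsize=sys.maxsize):
    """Return 64 or 32 depending on the largest Py_ssize_t value."""
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return 64 if maxsize > 2**32 else 32


print(pointer_width())
```

ARM64 and AMD64 builds both have a `sys.maxsize` of `2**63 - 1`, so this check gives the same answer for both.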
More importantly, and relevant for this case, is that platform.architecture() can't return the difference, even if it could detect it, because it's defined as only returning "64bit" or "32bit". And PEP 421 explicitly suggests sys.implementation is the place to store information that would be returned from platform about the Python implementation. The architecture the current runtime was compiled for sure seems to be related to the implementation, and since platform would be the obvious place to return such information (as it already is, albeit insufficiently), adding the field to sys.implementation seems fine.
Having now re-read that PEP though,[^1] I think we should drop the underscore. Make it sys.implementation.architecture and if it exists then platform.architecture(sys.executable) can return it rather than querying the actual binary (if it's within the range of allowable return values, and perhaps we can later add platform.architecture_but_for_real_this_time())
[^1]: Specifically Non-required Attributes and Use Cases, the platform and Jython use cases being most relevant.
platform.machine() is another possible place for such information. It already has Windows specific code to detect the architecture.
platform.processor() also seems like a good candidate. There's also a handful of platform.win32_*() APIs. Perhaps a similar one could be added. I agree with Eric and others that sys.implementation seems unfit for this API.
platform.machine() and platform.processor() are for different purposes. We want to know the compile-time target of the runtime, which is unrelated to the current machine type (Windows has emulation on most architectures, so the machine could be ARM64 but the runtime is x86 and there's currently no way to determine this).
I agree. If I remember correctly, the Windows kernel actually emulates multiple subsystems at the same time (such as win32, which is kept for compatibility with historical calls). The build may report win32 when the actual target is amd64, or maybe arm64. The real value can only be captured at build time.
OK, I think I understand Steve's point of view now. There's precedent for this in the stdlib: we're using sys.implementation in platform here: https://github.com/python/cpython/blob/10c4c95395771fb37e93811aaace42f026c16de5/Lib/platform.py#L889
I'm only comfortable with sys.implementation._architecture if it's an internal thing used only for platform, instead of a public interface--then, it really is an implementation detail, and sort of fits there.
Alternatively, we could add a public sys.architecture that also supports other systems, rather than just Windows. It doesn't seem like it would be all that difficult to just define ARCH_NAME as something else for Linux builds.
I think that for sys.architecture, a new PR can be submitted after the _architecture attribute is added. This issue may be put on hold for now. In fact, if possible, the value obtained by sys.architecture should be the same as the _architecture attribute.
If we go for sys.architecture, then there's no need for an _architecture attribute at all.
As far as the discussion is concerned, it seems that the _architecture attribute is all we need, because other platforms are not actually as complicated as Windows.
My main hesitation with defining it as public sys.architecture is that now we have to figure out sensible values for all the other platforms as well... what do we put for WASI? How do we under-specify it enough that we can expand to new platforms later on and not run into platform.architecture type issues (or sys.platform type issues)?
Platforms that use our configure build system (that is, most platforms) can retrieve compile time stuff via sysconfig APIs.
The sysconfig module sounds like a good place to add the build-time architecture on Windows.
The sysconfig module doesn't have a native component, and we don't have a compile-time generated file that ships with the runtime (yet). sysconfig retrieves compile-time stuff on Linux because we include a copy of Makefile, and it scans it.
Storing a single value somewhere in a native module is much easier on Windows than adding an entirely new static file that is generated at compile time. The question is about where we should store the value, more than where it gets retrieved from.
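A minimal sketch of the retrieval side, using the documented `sysconfig.get_config_var()` call. Which variables are populated depends on the platform and Python version; the expectation that `MULTIARCH` comes back as `None` on Windows is an assumption based on this discussion, not something verified here. The helper name `build_time_var` is made up for illustration.

```python
import sysconfig


def build_time_var(name):
    """Return a build-time config variable, or None if this build lacks it."""
    return sysconfig.get_config_var(name)


for name in ("MULTIARCH", "SOABI", "EXT_SUFFIX"):
    print(name, "=", build_time_var(name))
```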
The sysconfig module doesn't have a native component, and we don't have a compile-time generated file that ships with the runtime (yet). sysconfig retrieves compile-time stuff on Linux because we include a copy of Makefile, and it scans it.
Actually, I was just surprised to learn that it looks like a _sysconfig module has snuck in via gh-88402/GH-110049[^1].
[^1]: And to be clear, I consider that a good thing :). More of sysconfig/_sysconfigdata should probably make its way there eventually.
IMO, this should be implemented by adding MULTIARCH (which represents a target triple) to sysconfig.get_config_vars() on Windows, while we don't have a better API. I don't think sys.implementation is the right place; this is a build detail, not an implementation-specific detail.
I was just surprised to learn that it looks like a _sysconfig module has snuck in
Ah, so it has.
this should be implemented by adding MULTIARCH, which represents a target triple
Given that we can embed it at compile time, I'm fine with this. Do we have official target triples for Windows? Or do we just borrow them from someone else (I'm pretty sure Rust uses them, yeah?)
I actually have no problem with adding the arch member to MULTIARCH, but would that be better than adding it to the implementation? For example, in terms of declarative?
@zooba, what do you think about this?
would that be better than adding it to the implementation? For example, in terms of declarative?
It would be added to the Modules\_sysconfig.c file (which I'd forgotten about), so it's easy enough to add. It's a bit rough on users, but at least it'll be possible to get at the information.
We need a spec for MULTIARCH though, to define exactly what the values should look like.
That is, it should look like this
>>> import _sysconfig
>>> _sysconfig.config_vars()
{'EXT_SUFFIX': '.cp314-win_amd64.pyd', 'SOABI': 'cp314-win_amd64', 'Py_GIL_DISABLED': 0, 'ARCH': 'win32'}
Edit (or maybe I didn't output the value correctly).
Well, if EXT_SUFFIX contains win_amd64 then I'd expect ARCH to be amd64, but the spec has to list all possible values and what they mean, across all platforms (or we make it unsupported on some, and it might already exist on others). And it should go in the documentation somewhere.
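To illustrate the expectation that the architecture should agree with the tag in EXT_SUFFIX and SOABI, here is a heuristic sketch that pulls the architecture out of a Windows SOABI string. It assumes tags shaped like the `cp314-win_amd64` value in the output above. The function name and the handling of a plain `win32` tag are assumptions made for illustration, not part of any spec.

```python
def arch_from_soabi(soabi):
    """Extract the architecture part of a Windows SOABI tag, or None.

    Assumes tags shaped like 'cp314-win_amd64'. A plain 'win32' platform
    tag has no separate architecture suffix and is returned as 'win32'.
    """
    _, sep, platform_tag = soabi.partition("-")
    if not sep:
        return None
    if platform_tag == "win32":
        return "win32"
    prefix, sep, arch = platform_tag.partition("_")
    if prefix == "win" and sep and arch:
        return arch
    return None


print(arch_from_soabi("cp314-win_amd64"))
```

A documented spec would still be needed, since this kind of string parsing only works as long as the tag format never changes.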
If you want to access build information, sysconfig is the right module. The platform module is for information about the platform Python is currently running on.
At the time the platform module was conceived, there was no ARM64 architecture and only 32-bit vs. 64-bit was important.
It's certainly possible to add another argument family='' and have this return the architecture family, if it can be detected. The return value of platform.architecture() would then have to be turned into a tuple subclass to maintain backwards compatibility and still return the new field as an attribute (similar to how codecs.CodecInfo works). family could then return "ARM64", "ARM32", "x86_64", "x86", etc.
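A rough sketch of the tuple-subclass idea, following the pattern codecs.CodecInfo uses (a tuple whose extra fields live as attributes). The class name `ArchitectureInfo` and its constructor are hypothetical; nothing like this exists in the platform module today.

```python
class ArchitectureInfo(tuple):
    """Hypothetical return type: a (bits, linkage) tuple plus a family attribute."""

    def __new__(cls, bits, linkage, family=""):
        # The tuple itself keeps the two legacy fields so unpacking still works.
        self = super().__new__(cls, (bits, linkage))
        # The new field is only reachable as an attribute.
        self.family = family
        return self

    def __repr__(self):
        return (
            f"ArchitectureInfo(bits={self[0]!r}, linkage={self[1]!r}, "
            f"family={self.family!r})"
        )


info = ArchitectureInfo("64bit", "WindowsPE", family="ARM64")
bits, linkage = info  # existing two-value unpacking keeps working
print(bits, linkage, info.family)
```

Because the tuple still has exactly two items, existing callers that unpack or compare the result are unaffected, and new callers opt in by reading `family`.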