comtypes icon indicating copy to clipboard operation
comtypes copied to clipboard

Delayed loading of npsupport

Open bennyrowland opened this issue 1 year ago • 22 comments

I have been implementing some COM servers using comtypes, which are generally working very well. The design of the system is such that individual processing operations are provided by each COM server and are launched from a client GUI. In order to keep the GUI thread responsive, the servers each have a second "worker" thread to do their processing, the main thread only handles the COM interactions.

One of the issues that I have been confronting is start-up times for the server application, as this necessarily blocks the client software. This seems to be mainly caused by import times for the various libraries required, and so I have arranged things so that most libraries are imported by the worker thread rather than the main thread - however the most significant remaining library is numpy. I use this for processing, but it is not necessary to have the comtypes numpy interop behaviour. However, because numpy is installed, comtypes insists on importing it at startup, doubling the overall time to launch for a COM server. Do we think it would be possible to modify the way that npsupport is imported, either allowing it to be turned off completely, or even better to allow it to be loaded independently, i.e. import comtypes without it, starting up a server very fast, then import comtypes.npsupport (or call a function like comtypes.enable_np) to switch over to supporting it afterwards?

bennyrowland avatar Jul 29 '22 10:07 bennyrowland

@bennyrowland

I also thought a situation like your issue was problematic.

I didn't really have a problem with it because I don't use the server feature in my project.

Recently I have been working with restore test code(see #267, #298 and other issues/PRs). In the process, I noticed the following problem.

  • numpy is referenced even though it is not a dependency.
    • comtypes will be developed as "a pure Python library (without any dependencies)" in the future as well(https://github.com/enthought/comtypes/issues/216#issuecomment-664517594 by @vasily-v-ryabov).
  • Behavior of safearray and others changes depending on whether numpy exists in the environment or not.

As shown in numpy-interop in the documentation, using safearray_as_ndarray context manager to cast the return value to ndarray. Similarly, I thought that force to use safearray_as_ndarray when using ndarray as an argument, but this is not a good idea for backward compatibility since the behavior would be different from the previous one.

Currently, regardless of whether ndarray is used or not, npsupport is called to check whether numpy exists in the environment first.

Therefore, I think a better approach is that npsupport should only be imported when trying to use ndarray as an argument.

What are your thoughts?

junkmd avatar Jul 31 '22 02:07 junkmd

@junkmd tricky to import npsupport when using ndarray as an argument because how do you test if something is an ndarray unless you have numpy.ndarray imported already (which is exactly what npsupport already does).

At the moment, the public functions of npsupport all contain a check for HAVE_NUMPY that can short-circuit the function if numpy is not available. HAVE_NUMPY gets set at import time if import numpy succeeds, and then, also at import time, if HAVE_NUMPY is set then some other housekeeping is done to set up the interop. My proposal is simply that npsupport should not import numpy at its own import time, but provide a function "enable_numpy_interop()" (or similar) which will:

  1. Try import numpy
  2. Set HAVE_NUMPY
  3. Run the other housekeeping tasks

This would make importing numpy opt-in, if you want comtypes to use numpy then you would have to call the "enable_numpy_interop()" function. That function could also be inserted into things like the safearay_as_ndarray context manager (if "enable_numpy_interop()" checks for HAVE_NUMPY at the top then it will be a minimal burden to run it repeatedly) as such a use clearly requires numpy support to be enabled.

This is not ideal for backwards compatibility as scripts that were previously using numpy automatically would now have to enable support explicitly. However, I think that if we were to modify HAVE_NUMPY to instead be a function which would check for numpy in sys.modules and if found then check to see if the setup tasks for interop have been run, running them if necessary, then we should be get to a situation where if the user has imported numpy themselves then everything will work as before - obviously they will have had to do that to create an ndarray to pass as an argument, so it should become transparent.

Thoughts? It would also be nice to get some perspective from @vasily-v-ryabov or @cfarrow, so we know if this change will get accepted if I implement it...

bennyrowland avatar Aug 04 '22 12:08 bennyrowland

Perhaps a correction, safearrays should not behave differently if you have numpy installed. This should only happen if you use the safearray_as_ndarray context manager, which is explicit.

I'm in favor of replacing the import-time checks in npsupport with a enable_numpy_interop function. This will make the use of numpy explicit, and the package will remain "pure python" by default.

cfarrow avatar Aug 04 '22 13:08 cfarrow

Thanks, @cfarrow, but just to be clear, does that mean you would support a breaking change where numpy support is only included with an explicit enable_numpy_interop() call - what would then happen if an ndarray gets passed without making that call (as will surely be the case with many existing scripts)?

bennyrowland avatar Aug 04 '22 13:08 bennyrowland

I think the cleanest way to handle it is to allow that code to fail with a helpful error message if enable_numpy_interop has not been called. This way, the behavior of the package will be solely determined by the code instead of the python environment.

This will break backwards compatibility, and therefore would warrant a major version bump.

cfarrow avatar Aug 04 '22 13:08 cfarrow

Ok, I will have a look into making this work. It might still be possible to get comtypes to accept an ndarray as an input argument and simply send it as a SAFEARRAY by converting element by element (I think this happens for unsupported dtypes anyway) - this would obviously affect performance but not break scripts (I think), we could then simply emit a warning about the changed behaviour rather than an exception?

bennyrowland avatar Aug 04 '22 13:08 bennyrowland

@bennyrowland @cfarrow

Thank you guys for replies and opinions.

I completely agree that it is better to do things explicitly.

However, backward compatibility is the only problem I had, and I didn't want developers to experience the sadness of not being able to use the API as they used to.

If calling a function at anywhere of the module makes they behave the same as before, that's fine.

The project that using comtypes may involves some/many correspondence. But I think that the above changes will not require much man-hours to deal with.

junkmd avatar Aug 04 '22 13:08 junkmd

If there is a "pure python" default that works, then a warning would be appropriate.

For the future, another option is to not assume a ndarray but instead check for a buffer. This would allow comtypes to handle a variety of array types efficiently. (I'm not quite sure how to do that, but it would be fun to figure out.)

cfarrow avatar Aug 04 '22 13:08 cfarrow

@bennyrowland @cfarrow

I agree with the major version update.

Look at the history, it has only been revision updates for the last 8 years!

junkmd avatar Aug 04 '22 14:08 junkmd

I would be willing to help repair the tests that use numpy if this change would allow us to explicitly decide whether to use numpy or not.

junkmd avatar Aug 04 '22 14:08 junkmd

@vasily-v-ryabov How does this proposed change affect your release plan?

@bennyrowland Do you have any opinions when a version with this changes is good to release?

I propose to release 1.1.13, which fixes bugs in 1.1.12, before this change. After that, release a major update that applies this change.

This purpose is to prevent developers from using 1.1.12 when they are trying to use a version previous to the major version update related with numpy.

junkmd avatar Aug 04 '22 14:08 junkmd

Hi all, I started working on this project this morning and began with running the tests on the latest master branch using python -m unittest discover -v -s ./comtypes/test -t comtypes\test (cribbed from appveyor.yml, is there a better command?) and there are already 4 tests that fail, all of which seem related to numpy interop:

unittest output
======================================================================
ERROR: test_UDT_ndarray (test_safearray.SafeArrayTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\safearray.py", line 60, in _midlSAFEARRAY
    return POINTER(_safearray_type_cache[itemtype])
KeyError: <class 'comtypes.gen._5A3E1D1D_947A_44AC_9B03_5C37D5F5FFFC_0_1_0.MYCOLOR'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\safearray.py", line 78, in _make_safearray_type
    vartype = _ctype_to_vartype[itemtype]
KeyError: <class 'comtypes.gen._5A3E1D1D_947A_44AC_9B03_5C37D5F5FFFC_0_1_0.MYCOLOR'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\test\test_safearray.py", line 427, in test_UDT_ndarray
    t = _midlSAFEARRAY(MYCOLOR)
  File "H:\Python\comtypes\comtypes\safearray.py", line 62, in _midlSAFEARRAY
    sa_type = _make_safearray_type(itemtype)
  File "H:\Python\comtypes\comtypes\safearray.py", line 88, in _make_safearray_type
    extra = GetRecordInfoFromGuids(*guids)
  File "H:\Python\comtypes\comtypes\typeinfo.py", line 469, in GetRecordInfoFromGuids
    _oleaut32.GetRecordInfoFromGuids(byref(GUID(rGuidTypeLib)),
  File "_ctypes/callproc.c", line 997, in GetResult
OSError: [WinError -2147319779] Library not registered

======================================================================
ERROR: test_VT_UNKNOWN_multi_ndarray (test_safearray.SafeArrayTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "H:\test_venv\lib\site-packages\numpy\core\_internal.py", line 652, in _dtype_from_pep3118
    dtype, align = __dtype_from_pep3118(stream, is_subdtype=False)
  File "H:\test_venv\lib\site-packages\numpy\core\_internal.py", line 729, in __dtype_from_pep3118
    raise ValueError("Unknown PEP 3118 data type specifier %r" % stream.s)
ValueError: Unknown PEP 3118 data type specifier 'P'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\test\test_safearray.py", line 376, in test_VT_UNKNOWN_multi_ndarray
    arr = get_array(sa)
  File "H:\Python\comtypes\comtypes\test\test_safearray.py", line 19, in get_array
    return sa[0]
  File "H:\Python\comtypes\comtypes\safearray.py", line 216, in __getitem__
    return self.unpack()
  File "H:\Python\comtypes\comtypes\safearray.py", line 251, in unpack
    return numpy.asarray(result)
ValueError: '<P' is not a valid PEP 3118 buffer format string

======================================================================
ERROR: test_VT_VARIANT_ndarray (test_safearray.SafeArrayTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\test\test_safearray.py", line 248, in test_VT_VARIANT_ndarray
    sa = t.from_param(inarr)
  File "H:\Python\comtypes\comtypes\safearray.py", line 208, in from_param
    value = cls.create(value, extra)
  File "H:\Python\comtypes\comtypes\safearray.py", line 117, in create
    return cls.create_from_ndarray(value, extra)
  File "H:\Python\comtypes\comtypes\safearray.py", line 160, in create_from_ndarray
    value = _ndarray_to_variant_array(value)
  File "H:\Python\comtypes\comtypes\safearray.py", line 387, in _ndarray_to_variant_array
    varr.flat = [VARIANT(v) for v in value.flat]
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 110, in dtype_from_ctypes_type
    return _from_ctypes_structure(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 65, in _from_ctypes_structure
    fields.append((fname, dtype_from_ctypes_type(ftyp)))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 112, in dtype_from_ctypes_type
    return _from_ctypes_union(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 90, in _from_ctypes_union
    formats.append(dtype_from_ctypes_type(ftyp))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 110, in dtype_from_ctypes_type
    return _from_ctypes_structure(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 65, in _from_ctypes_structure
    fields.append((fname, dtype_from_ctypes_type(ftyp)))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 112, in dtype_from_ctypes_type
    return _from_ctypes_union(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 90, in _from_ctypes_union
    formats.append(dtype_from_ctypes_type(ftyp))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 114, in dtype_from_ctypes_type
    return _from_ctypes_scalar(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 80, in _from_ctypes_scalar
    return np.dtype(t._type_)
TypeError: data type 'v' not understood

======================================================================
ERROR: test_mixed (test_variant.NdArrayTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "H:\Python\comtypes\comtypes\test\test_variant.py", line 275, in test_mixed
    v.value = a
  File "H:\Python\comtypes\comtypes\automation.py", line 317, in _set_value
    obj = _midlSAFEARRAY(VARIANT).create(value)
  File "H:\Python\comtypes\comtypes\safearray.py", line 117, in create
    return cls.create_from_ndarray(value, extra)
  File "H:\Python\comtypes\comtypes\safearray.py", line 160, in create_from_ndarray
    value = _ndarray_to_variant_array(value)
  File "H:\Python\comtypes\comtypes\safearray.py", line 387, in _ndarray_to_variant_array
    varr.flat = [VARIANT(v) for v in value.flat]
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 110, in dtype_from_ctypes_type
    return _from_ctypes_structure(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 65, in _from_ctypes_structure
    fields.append((fname, dtype_from_ctypes_type(ftyp)))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 112, in dtype_from_ctypes_type
    return _from_ctypes_union(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 90, in _from_ctypes_union
    formats.append(dtype_from_ctypes_type(ftyp))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 110, in dtype_from_ctypes_type
    return _from_ctypes_structure(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 65, in _from_ctypes_structure
    fields.append((fname, dtype_from_ctypes_type(ftyp)))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 112, in dtype_from_ctypes_type
    return _from_ctypes_union(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 90, in _from_ctypes_union
    formats.append(dtype_from_ctypes_type(ftyp))
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 114, in dtype_from_ctypes_type
    return _from_ctypes_scalar(t)
  File "H:\test_venv\lib\site-packages\numpy\core\_dtype_ctypes.py", line 80, in _from_ctypes_scalar
    return np.dtype(t._type_)
TypeError: data type 'v' not understood

----------------------------------------------------------------------

Does anybody know anything about these errors, is this an environment / numpy version issue on my end (I am running 1.23.1) or are these tests failing for others? Do these issues need fixing before making any other changes to numpy interop?

bennyrowland avatar Aug 05 '22 10:08 bennyrowland

OSError: [WinError -2147319779] Library not registered

It is probably required registering TestComServerLib class in the registry.

junkmd avatar Aug 05 '22 12:08 junkmd

Yep, I figured something like that would probably be the case - I have some ideas about trying to fake that but that will have to wait for another time. It is the other errors that I have been looking at. On closer inspection, the problems seem to relate to creating a numpy array of VARIANT dtype in _ndarray_to_variant_array() in safearray.py. The function creates a list of individual variants from the source numpy array, then assigns them to the empty VARIANT array. This prompts a check of the existing dtype of each VARIANT as they go through, but some of the possible types (including VARIANT_BOOL and c_wchar_p) do not have numpy compatible dtypes, so numpy throws an exception when trying to work out the dtype (because this is a union dtype, it doesn't matter whether the contents of the VARIANT has a known dtype, only that some of the possible dtypes are not valid). I assume it needs to know the dtype so it can work out if it needs to do some kind of cast to go in to the target ndarray. It is not at all clear to me how to solve this problem, perhaps we need a npVARIANT class which only contains numpy compatible dtypes? Also, presumably these tests did pass at some point, what has changed that broke them, and was it in comtypes or numpy (or even ctypes)? Any thoughts welcome.

bennyrowland avatar Aug 05 '22 13:08 bennyrowland

The numpy code in comtypes got a lot of attention in February. It might be worth checking if one of those changes broke the tests.

FWIW, "<P" highlighted in test_VT_UNKNOWN_multi_ndarray should never have worked AFAICT. So, whatever is deciding the typecode is "<P" is probably to blame for that one. I suspect the changing of ctypeslib._typecodes dictionary order in #239 here.

cfarrow avatar Aug 05 '22 13:08 cfarrow

Well, that was my last contribution, so it could well be at fault :-). I think that the type code "<P" is probably related to the repr being <POINTER(IUNKNOWN)> but it could be unrelated. I will investigate further.

bennyrowland avatar Aug 05 '22 15:08 bennyrowland

@cfarrow, so I have dug a little deeper into the "<P" typecode issue and it is definitely not related to #239, I rolled back before that commit and still see the issue. The problem seems to be that ctypes POINTER instances appear to support the PEP3118 Buffer Protocol, but actually are non-compliant, so for example if we run:

import comtypes
from comtypes.typeinfo import CreateTypeLib
punk = CreateTypeLib("spam").QueryInterface(comtypes.IUnknown)
# punk is now <POINTER(IUnknown)>
mv = memoryview(punk)  # this succeeds but probably shouldn't
mv.format  # '<P'

import numpy as np
arr = np.asarray(punk)  # this errors with the "Unknown PEP 3118 data type specifier 'P'" message

So numpy, when trying to create an array from punk, thinks it is an array of bytes that should be pulled in and immediately asks it for the format of each data element so it can work out the dtype, then fails because the format is "<P" which is not a valid format for PEP3118. I suspect that the correct format might need to be "O", this is what a memoryview of an ndarray with object dtype has as its format, I guess any POINTER that is not to a known type (like c_int or c_float) should probably be "O".

So I guess this issue might have to be raised with ctypes?

bennyrowland avatar Aug 05 '22 21:08 bennyrowland

Guys I agree that numpy related changes could be a major version update to 1.2.0. So it will shift the end of Py2.7 support to 1.3.0 next year.

vasily-v-ryabov avatar Aug 06 '22 06:08 vasily-v-ryabov

Ok, a little bit more information: it looks like this came in with np 1.15.0 which changed the way the PEP3118 buffer protocol was used to create arrays, see https://github.com/numpy/numpy/issues/11150. I think fixing this issue is way outside of our scope, creating numpy arrays from safearrays of ints, floats, bools etc. is all fine, it is only creating them from c_void_p that creates the problem, which is an unlikely use case in practice. I suggest we put this test on the skip list with an explanation - anyone that needs it fixing is likely to have a better idea than I do of how to do it.

The other issue, regarding VARIANT_BOOL, seems more like a comtypes level problem. I will open a new issue to discuss solving that.

I have implemented a simple fix for npsupport being opt-in, basically if you don't call enable_numpy_interop() then it is the same as if numpy is not in the environment. All tests that did pass still pass this way, and I have one more test checking the error message is raised if you try and do things before enabling the interop, but there probably need to be more tests included, and particularly I have done nothing about the case of passing in ndarrays as parameters if you haven't enabled npsupport, so they will currently fail with an unhelpful error message about failing to create the variant.

I think we can reasonably identify ndarrays by their __array_interface__ attribute, and use that to raise a ValueError("Looks like an ndarray but np interop not enabled") kind of thing, but there is also the possibility of trying to use the Buffer Protocol in comtypes directly which might make it just work without even needing numpy there at all (at least for standard dtypes). Any thoughts?

I will put together a PR soon so you can comment directly on the code, one thing that I did wonder about is whether we want to look at having multiple test environments (perhaps with tox) so that we can have an env with numpy installed and one without, with tests expecting different results in the different envs. We can simulate that to a certain extent by calling or not calling enable_numpy_interop() in test setup, but there are things that could go wrong with that. I know that the test system has undergone a lot of changes recently, have any decisions about things like that been reached?

bennyrowland avatar Aug 08 '22 13:08 bennyrowland

Nice diagnosis.

Regarding numpy vs buffers, I think we'd want to support buffers in the long run, but can get there through updating the numpy support.

I think we'll want multiple environments for testing eventually. If the plan is to generalize from numpy to buffers, however, it may not be worth the short-term overhead to do this solely for numpy support.

cfarrow avatar Aug 08 '22 14:08 cfarrow

I think we will still need numpy support for the safearray_as_ndarray direction at the very least, as that has to have numpy available to pass the buffer into, and at the moment when receiving a SAFEARRAY is to either convert straight to numpy ndarray or go to a list or other element by element conversion to Python. I suppose we could introduce a safearray_as_buffer context manager instead, then the user can pass the buffer to numpy (or array or some other thing if they want), that might let us get out of any dependency on numpy in comtypes. There might be a few issues with certain types like datetimes though.

bennyrowland avatar Aug 08 '22 14:08 bennyrowland

True. If we want to test for unintended interactions, multiple test environments will be needed.

cfarrow avatar Aug 08 '22 22:08 cfarrow

@bennyrowland @cfarrow @vasily-v-ryabov

Since #337 and #343 have been merged, I will close this issue as resolved.

If the issue remains, please re-open.

junkmd avatar Dec 24 '22 05:12 junkmd