comtypes
comtypes copied to clipboard
Returning `out` arguments breaks when they must be preallocated
The standard treatment of argument directions is as follows:
- An
'in'argument must be provided and is not returned. - An
'out'argument is returned by the function and must not be provided. - An
'in', 'out'argument must be provided and is returned.
This model breaks down when a parameter marked as 'out' must be preallocated with some defined amount of memory. This happens for example in ISequentialStream::Read (or RemoteRead in my case, which has the same signature). The parameters are given by
(['out'], POINTER(c_ubyte), 'pv'),
(['in'], c_ulong, 'cb'),
(['out'], POINTER(c_ulong), 'pcbRead')
This call only works if pv is pre-allocated with cb bytes, which cannot be done in the current model.
I resorted to calling the base method ISequentialStream._ISequentialStream__com_RemoteRead with three arguments. My question is whether I am overlooking a more elegant solution. If that is not the case, I was wondering if there is a better way to treat this problem.
One other option is to change the generated header and set the first parameter to be ['in', 'out'], but this breaks if one re-generates the file and is also technically incorrect.
Please write a short code snippets to reproduce the condition.
- In the case of calling the
ISequentialStream.RemoteRead - In the case of calling the
ISequentialStream._ISequentialStream__com_RemoteRead
I quickly put together a file based on your test you provided here:
https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121
Here I use the workaround described above:
https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127
As before, a portable device must be connected for the test to work. This test won't modify anything on the device, but it may download personal data to the host computer.
(not sure how one can get the code snippets to render here)
If you change the USERNAME of https://github.com/{USERNAME}/comtypes/blob/... to the repository host, it will be rendered.
https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121 renders …
https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121
And https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127 renders …
https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127
The MemberSpec for the GetStream method of IPortableDeviceResources is made as follows.
# in `...\comtypes\gen\_1F001332_1A57_4934_BE31_AFFC99F4EE0A_0_1_0.py`
IPortableDeviceResources._methods_ = [
...
COMMETHOD(
[],
HRESULT,
'GetStream',
(['in'], WSTRING, 'pszObjectID'),
(['in'], POINTER(_tagpropertykey), 'key'),
(['in'], c_ulong, 'dwMode'),
(['in', 'out'], POINTER(c_ulong), 'pdwOptimalBufferSize'),
(['out'], POINTER(POINTER(IStream)), 'ppStream')
),
...
]
Then, I have a few questions about some of your code snippets.
https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L106-L114
- Why do you overwrite a null pointer
IStreamwith the return value ofresources.GetStreamimmediately after assigning it topFileStream?(L106-L107) - Why are you taking the pointer contents from the
pFileStream?(L114) As with the return value ofCreateObject, I think that you might be able to access the method in runtime without extracting the pointer contents (static type analysis notwithstanding).
Why do you overwrite a null pointer IStream with the return value of resources.GetStream immediately after assigning it to pFileStream?(L106-L107) Why are you taking the pointer contents from the pFileStream?(L114)
Both of those are relics from someone else's earlier code that I forgot to remove, my bad. The updated code is here, and the behaviour has not changed.
You should try to redefine ISequencialStream and ISequence on test_portabledevice like this.
(import GUID, c_longlong, etc. as appropriate)
class ISequentialStream(comtypes.IUnknown):
_case_insensitive_ = True
_iid_ = GUID('{0C733A30-2A1C-11CE-ADE5-00AA0044773D}')
_idlflags_ = []
def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]:
pv = (c_ubyte * size)()
pcb_read = pointer(c_ulong(0))
self.__com_RemoteRead(pv, c_ulong(size), pcb_read)
return pv, pcb_read.contents.value # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`
ISequentialStream._methods_ = [
COMMETHOD(
[],
HRESULT,
'RemoteRead',
(['out'], POINTER(c_ubyte), 'pv'),
(['in'], c_ulong, 'cb'),
(['out'], POINTER(c_ulong), 'pcbRead')
),
COMMETHOD(
[],
HRESULT,
'RemoteWrite',
(['in'], POINTER(c_ubyte), 'pv'),
(['in'], c_ulong, 'cb'),
(['out'], POINTER(c_ulong), 'pcbWritten')
),
]
_LARGE_INTEGER = c_longlong
_ULARGE_INTEGER = c_ulonglong
class tagSTATSTG(Structure):
pass
class IStream(ISequentialStream):
_case_insensitive_ = True
_iid_ = GUID('{0000000C-0000-0000-C000-000000000046}')
_idlflags_ = []
IStream._methods_ = [
COMMETHOD(
[],
HRESULT,
'RemoteSeek',
(['in'], _LARGE_INTEGER, 'dlibMove'),
(['in'], c_ulong, 'dwOrigin'),
(['out'], POINTER(_ULARGE_INTEGER), 'plibNewPosition')
),
COMMETHOD(
[],
HRESULT,
'SetSize',
(['in'], _ULARGE_INTEGER, 'libNewSize')
),
COMMETHOD(
[],
HRESULT,
'RemoteCopyTo',
(['in'], POINTER(IStream), 'pstm'),
(['in'], _ULARGE_INTEGER, 'cb'),
(['out'], POINTER(_ULARGE_INTEGER), 'pcbRead'),
(['out'], POINTER(_ULARGE_INTEGER), 'pcbWritten')
),
COMMETHOD(
[],
HRESULT,
'Commit',
(['in'], c_ulong, 'grfCommitFlags')
),
COMMETHOD([], HRESULT, 'Revert'),
COMMETHOD(
[],
HRESULT,
'LockRegion',
(['in'], _ULARGE_INTEGER, 'libOffset'),
(['in'], _ULARGE_INTEGER, 'cb'),
(['in'], c_ulong, 'dwLockType')
),
COMMETHOD(
[],
HRESULT,
'UnlockRegion',
(['in'], _ULARGE_INTEGER, 'libOffset'),
(['in'], _ULARGE_INTEGER, 'cb'),
(['in'], c_ulong, 'dwLockType')
),
COMMETHOD(
[],
HRESULT,
'Stat',
(['out'], POINTER(tagSTATSTG), 'pstatstg'),
(['in'], c_ulong, 'grfStatFlag')
),
COMMETHOD(
[],
HRESULT,
'Clone',
(['out'], POINTER(POINTER(IStream)), 'ppstm')
),
]
Then rewrite as follows;
https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L106-L112
optimalTransferSizeBytes, fileStream = resources.GetStream(
objectID,
WPD_RESOURCE_DEFAULT,
STGM_READ,
optimalTransferSizeBytes,
)
fileStream = fileStream.QueryInterface(IStream)
blockSize = optimalTransferSizeBytes.contents.value
https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L122-L125
_, data_read = fileStream.RemoteRead(blockSize)
print(f"Read data: {data_read}")
This way is WET and may not be very elegant because it redefines the classes. However, it can be avoid from the ugly situation of directly referencing a mangled private name.
edited: fix return value of RemoteRead
If we were going to fix these problems permanently(in other words, "fix as no need workarounds defined by users"), we need the agreements with the community.
The followings are my shallow thoughts.
-
Same as you suggested, adding a conditional branch to
codegeneratorortlbparserthatpvbecomes['in', 'out']seems to be a high impact. -
I assumed that the case if "generally interfaces" such as
IStreamis defined statically. If that happens, thecodegeneratorwill stop to define dynamically them, instead import them(seeknown_symbols) from statically modules. (This may also break backward compatibilities, but may be better than the aforementioned)
- I assumed that the case if "generally interfaces" such as
IStreamis defined statically. If that happens, thecodegeneratorwill stop to define dynamically them, instead import them(seeknown_symbols) from statically modules. (This may also break backward compatibilities, but may be better than the aforementioned)
With regard to the aforementioned, I have done a little experimentation that are adding the module with interfaces statically defined and they are imported instead of being defined dynamically.
https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/objidl.py#L1-L140
https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/client/_generate.py#L249-L260
However, when it comes to introducing such these into production code (as I have said many times), we must be very careful about backward compatibility.
I like the idea of statically importing these streams.
Edit: I was wrong, see the comment below ~Regarding your RemoteRead:~
def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]: pv = (c_ubyte * size)() pcb_read = pointer(c_ulong(0)) self.__com_RemoteRead(pv, c_ulong(size), pcb_read) return pv, pcb_read.contents.value # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`
- ~It appears that
comtypesconvertssizeto ac_ulongeven though this function is not imported, resulting in exceptions. The following modification works:~ - I did not find any adverse effects from the frequent allocation and freeing of memory. Copying a 1GB file did not reveal a memory leak, and there is no difference in performance between the two versions.
Regarding RemoteWrite: We don't technically need this one since it does not have the same problem as RemoteRead, but it is still convenient. I am more worried about backwards compatibility here since there might be code out there using the generated RemoteWrite, while all working code out there must have a workaround for RemoteRead in one way or another.
def RemoteWrite(self, pv: bytes, cb: int) -> int: pv_ = (c_ubyte * len(pv)).from_buffer(bytearray(pv)) pcb_written = pointer(c_ulong(0)) self.__com_RemoteWrite(pv_, c_ulong(cb), pcb_written) # type: ignore return pcb_written.contents.value
This works as expected. The code from https://github.com/KasparNagu/PortableDevices uses ctypes.create_string_buffer() and ctypes.cast() to cast it to a ubyte-array, but I think I prefer your solution. We could also contemplate removing cb as a parameter and just go with len(pv). If a user only wants to write part of some buffer, they can simply slice it.
Thanks a lot!
After posting the sample code snippet, I still gave it some thought.
- Overriding the
RemoteWriteseems out of this issue scope. The rawRemoteWritemight be somewhat inconvenience to use, but there may be projects that have implementations that rely on it. - Likewise, setting
ReadandWriteto aliasRemoteReadandRemoteWritewould also be out of this issue scope. - The
RemoteReadis the only method which cannot be used in its raw form, so that truly need to override. - Making these interfaces
Pythonfriendly is probably necessary, and your suggested code would certainly help. However, to keep the scope of changes small, let's limit the scope of this issue to theRemoteReadonly for now.
To test this RemoteRead, what is the suitable way?
Your code is using the portabledevice api.
But there might be other com libraries that handle files in IStream.
I am not familiar with such libraries, so I am hoping you or other community members can suggest a way that is not environment dependent.
Edited: Fixed the part where I meant to RemoteWrite but I had written RemoteRead.
Hi,
I have put together a unit test for IStream, see here. While I could not easily find a different importable library that uses IStream, CreateStreamOnHGlobal works just fine with the one from portable devices.
Also, my apologies - your suggested code for RemoteRead works just fine, I made a mistake in applying it 🤦♂️
Your suggestions sound reasonable, though I suspect that other interfaces which implement Read instead of RemoteRead may run into the same issue. I am not sure which conditions distinguish the use of RemoteRead vs. Read.
As far as I am concerned, statically importing ISequentialStream with a fixed RemoteRead would work for me.
Your contribution helps us so much.
I found the SO that might help you about Read and RemoteRead.
https://stackoverflow.com/questions/19820999/what-are-the-remoteread-and-remotewrite-members-of-isequentialstream
Your code is a great use case for CreateStreamOnHGlobal in comtypes.
I am trying to figure out how to take this into the production code so that we can continue to operate well. Please wait a while.
Any opinions would be appreciated.
As you say, the only thing that needs to be statically defined in the scope of this issue is ISequentialStream.
In addition to overriding RemoteRead, adding a type-checking-only type hint to RemoteWrite would be sufficient for modern Python code.
objidl.pyhttps://github.com/enthought/comtypes/blob/41970e7933eec80952aadac4f5b57b4232a8d79f/comtypes/objidl.py#L1-L59test_istream.pyhttps://github.com/enthought/comtypes/blob/41970e7933eec80952aadac4f5b57b4232a8d79f/comtypes/test/test_istream.py#L1-L39- add
comtypes.objidltoclient._generate._get_known_symbols - add
test_symbols_in_comtypes_objidlintest_client.py
Is this code okay for you? If you agree, I would like to hear from other maintainers on this code.
Also keep in mind, the drop_py2 has already many changes at this time, so it may likely be after the drop_py2 plan when the PR you would submit is merged.
The code looks fine to me, thank you! I can also confirm the correct behaviour of the test (passes with the modification in _generate, fails without it).
so it may likely be after the
drop_py2plan when the PR you would submit is merged.
In that case, would it make more sense to create the pull request at a later point in time?
@vasily-v-ryabov
Do you think these changes are reasonable to resolve this issue?
Even if these changes are acceptable for the project, I think it is out of scope of the drop_py2, such as #475.
How do you think this should go?
@jonschz
Hi,
The codegenerator was determining whether the item is one of the known symbols based solely on its name.
However, with #529 and #534, the criteria for determining whether a COM interface is a known symbol now include not only the name but also the iid.
With those change, I think that ISequentialStream can be included in the __known_symbols__ of objidl.py without the risk of it being mistaken for another interface with the same name.
If you are still interested in this issue, feel free to PR.