comtypes icon indicating copy to clipboard operation
comtypes copied to clipboard

Returning `out` arguments breaks when they must be preallocated

Open jonschz opened this issue 1 year ago • 16 comments

The standard treatment of argument directions is as follows:

  • An 'in' argument must be provided and is not returned.
  • An 'out' argument is returned by the function and must not be provided.
  • An 'in', 'out' argument must be provided and is returned.

This model breaks down when a parameter marked as 'out' must be preallocated with some defined amount of memory. This happens for example in ISequentialStream::Read (or RemoteRead in my case, which has the same signature). The parameters are given by

        (['out'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbRead')

This call only works if pv is pre-allocated with cb bytes, which cannot be done in the current model.

I resorted to calling the base method ISequentialStream._ISequentialStream__com_RemoteRead with three arguments. My question is whether I am overlooking a more elegant solution. If that is not the case, I was wondering if there is a better way to treat this problem.

One other option is to change the generated header and set the first parameter to be ['in', 'out'], but this breaks if one re-generates the file and is also technically incorrect.

jonschz avatar Jan 17 '23 08:01 jonschz

Please write a short code snippets to reproduce the condition.

  • In the case of calling the ISequentialStream.RemoteRead
  • In the case of calling the ISequentialStream._ISequentialStream__com_RemoteRead

junkmd avatar Jan 18 '23 10:01 junkmd

I quickly put together a file based on your test you provided here:

https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121

Here I use the workaround described above:

https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127

As before, a portable device must be connected for the test to work. This test won't modify anything on the device, but it may download personal data to the host computer.

(not sure how one can get the code snippets to render here)

jonschz avatar Jan 18 '23 15:01 jonschz

If you change the USERNAME of https://github.com/{USERNAME}/comtypes/blob/... to the repository host, it will be rendered.

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121 renders …

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121

And https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127 renders …

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127

junkmd avatar Jan 18 '23 21:01 junkmd

The MemberSpec for the GetStream method of IPortableDeviceResources is made as follows.

# in `...\comtypes\gen\_1F001332_1A57_4934_BE31_AFFC99F4EE0A_0_1_0.py`
IPortableDeviceResources._methods_ = [
    ...
    COMMETHOD(
        [],
        HRESULT,
        'GetStream',
        (['in'], WSTRING, 'pszObjectID'),
        (['in'], POINTER(_tagpropertykey), 'key'),
        (['in'], c_ulong, 'dwMode'),
        (['in', 'out'], POINTER(c_ulong), 'pdwOptimalBufferSize'),
        (['out'], POINTER(POINTER(IStream)), 'ppStream')
    ),
    ...
]

Then, I have a few questions about some of your code snippets.

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L106-L114

  • Why do you overwrite a null pointer IStream with the return value of resources.GetStream immediately after assigning it to pFileStream?(L106-L107)
  • Why are you taking the pointer contents from the pFileStream?(L114) As with the return value of CreateObject, I think that you might be able to access the method in runtime without extracting the pointer contents (static type analysis notwithstanding).

junkmd avatar Jan 21 '23 00:01 junkmd

Why do you overwrite a null pointer IStream with the return value of resources.GetStream immediately after assigning it to pFileStream?(L106-L107) Why are you taking the pointer contents from the pFileStream?(L114)

Both of those are relics from someone else's earlier code that I forgot to remove, my bad. The updated code is here, and the behaviour has not changed.

jonschz avatar Jan 21 '23 09:01 jonschz

You should try to redefine ISequencialStream and ISequence on test_portabledevice like this. (import GUID, c_longlong, etc. as appropriate)

class ISequentialStream(comtypes.IUnknown):

    _case_insensitive_ = True
    _iid_ = GUID('{0C733A30-2A1C-11CE-ADE5-00AA0044773D}')
    _idlflags_ = []

    def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]:
        pv = (c_ubyte * size)()
        pcb_read = pointer(c_ulong(0))
        self.__com_RemoteRead(pv, c_ulong(size), pcb_read)
        return pv, pcb_read.contents.value  # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`


ISequentialStream._methods_ = [
    COMMETHOD(
        [],
        HRESULT,
        'RemoteRead',
        (['out'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbRead')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'RemoteWrite',
        (['in'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbWritten')
    ),
]

_LARGE_INTEGER = c_longlong
_ULARGE_INTEGER = c_ulonglong


class tagSTATSTG(Structure):
    pass


class IStream(ISequentialStream):

    _case_insensitive_ = True
    _iid_ = GUID('{0000000C-0000-0000-C000-000000000046}')
    _idlflags_ = []


IStream._methods_ = [
    COMMETHOD(
        [],
        HRESULT,
        'RemoteSeek',
        (['in'], _LARGE_INTEGER, 'dlibMove'),
        (['in'], c_ulong, 'dwOrigin'),
        (['out'], POINTER(_ULARGE_INTEGER), 'plibNewPosition')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'SetSize',
        (['in'], _ULARGE_INTEGER, 'libNewSize')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'RemoteCopyTo',
        (['in'], POINTER(IStream), 'pstm'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['out'], POINTER(_ULARGE_INTEGER), 'pcbRead'),
        (['out'], POINTER(_ULARGE_INTEGER), 'pcbWritten')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Commit',
        (['in'], c_ulong, 'grfCommitFlags')
    ),
    COMMETHOD([], HRESULT, 'Revert'),
    COMMETHOD(
        [],
        HRESULT,
        'LockRegion',
        (['in'], _ULARGE_INTEGER, 'libOffset'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['in'], c_ulong, 'dwLockType')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'UnlockRegion',
        (['in'], _ULARGE_INTEGER, 'libOffset'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['in'], c_ulong, 'dwLockType')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Stat',
        (['out'], POINTER(tagSTATSTG), 'pstatstg'),
        (['in'], c_ulong, 'grfStatFlag')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Clone',
        (['out'], POINTER(POINTER(IStream)), 'ppstm')
    ),
]

Then rewrite as follows;

https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L106-L112


                optimalTransferSizeBytes, fileStream = resources.GetStream(
                    objectID,
                    WPD_RESOURCE_DEFAULT,
                    STGM_READ,
                    optimalTransferSizeBytes,
                )
                fileStream = fileStream.QueryInterface(IStream)
                blockSize = optimalTransferSizeBytes.contents.value

https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L122-L125

                    _, data_read = fileStream.RemoteRead(blockSize)
                    print(f"Read data: {data_read}")

This way is WET and may not be very elegant because it redefines the classes. However, it can be avoid from the ugly situation of directly referencing a mangled private name.

edited: fix return value of RemoteRead

junkmd avatar Jan 21 '23 14:01 junkmd

If we were going to fix these problems permanently(in other words, "fix as no need workarounds defined by users"), we need the agreements with the community.

The followings are my shallow thoughts.

  • Same as you suggested, adding a conditional branch to codegenerator or tlbparser that pv becomes ['in', 'out'] seems to be a high impact.

  • I assumed that the case if "generally interfaces" such as IStream is defined statically. If that happens, the codegenerator will stop to define dynamically them, instead import them(see known_symbols) from statically modules. (This may also break backward compatibilities, but may be better than the aforementioned)

junkmd avatar Jan 21 '23 14:01 junkmd

  • I assumed that the case if "generally interfaces" such as IStream is defined statically. If that happens, the codegenerator will stop to define dynamically them, instead import them(see known_symbols) from statically modules. (This may also break backward compatibilities, but may be better than the aforementioned)

With regard to the aforementioned, I have done a little experimentation that are adding the module with interfaces statically defined and they are imported instead of being defined dynamically.

https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/objidl.py#L1-L140

https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/client/_generate.py#L249-L260

However, when it comes to introducing such these into production code (as I have said many times), we must be very careful about backward compatibility.

junkmd avatar Jan 22 '23 04:01 junkmd

I like the idea of statically importing these streams.

Edit: I was wrong, see the comment below ~Regarding your RemoteRead:~

def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]:
       pv = (c_ubyte * size)()
       pcb_read = pointer(c_ulong(0))
       self.__com_RemoteRead(pv, c_ulong(size), pcb_read)
       return pv, pcb_read.contents.value  # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`
  • ~It appears that comtypes converts size to a c_ulong even though this function is not imported, resulting in exceptions. The following modification works:~
  • I did not find any adverse effects from the frequent allocation and freeing of memory. Copying a 1GB file did not reveal a memory leak, and there is no difference in performance between the two versions.

Regarding RemoteWrite: We don't technically need this one since it does not have the same problem as RemoteRead, but it is still convenient. I am more worried about backwards compatibility here since there might be code out there using the generated RemoteWrite, while all working code out there must have a workaround for RemoteRead in one way or another.

   def RemoteWrite(self, pv: bytes, cb: int) -> int:
        pv_ = (c_ubyte * len(pv)).from_buffer(bytearray(pv))
        pcb_written = pointer(c_ulong(0))
        self.__com_RemoteWrite(pv_, c_ulong(cb), pcb_written)  # type: ignore
        return pcb_written.contents.value

This works as expected. The code from https://github.com/KasparNagu/PortableDevices uses ctypes.create_string_buffer() and ctypes.cast() to cast it to a ubyte-array, but I think I prefer your solution. We could also contemplate removing cb as a parameter and just go with len(pv). If a user only wants to write part of some buffer, they can simply slice it.

jonschz avatar Jan 23 '23 08:01 jonschz

Thanks a lot!

After posting the sample code snippet, I still gave it some thought.

  • Overriding the RemoteWrite seems out of this issue scope. The raw RemoteWrite might be somewhat inconvenience to use, but there may be projects that have implementations that rely on it.
  • Likewise, setting Read and Write to alias RemoteRead and RemoteWrite would also be out of this issue scope.
  • The RemoteRead is the only method which cannot be used in its raw form, so that truly need to override.
  • Making these interfaces Python friendly is probably necessary, and your suggested code would certainly help. However, to keep the scope of changes small, let's limit the scope of this issue to the RemoteRead only for now.

To test this RemoteRead, what is the suitable way? Your code is using the portabledevice api. But there might be other com libraries that handle files in IStream. I am not familiar with such libraries, so I am hoping you or other community members can suggest a way that is not environment dependent.

Edited: Fixed the part where I meant to RemoteWrite but I had written RemoteRead.

junkmd avatar Jan 23 '23 13:01 junkmd

Hi,

I have put together a unit test for IStream, see here. While I could not easily find a different importable library that uses IStream, CreateStreamOnHGlobal works just fine with the one from portable devices.

Also, my apologies - your suggested code for RemoteRead works just fine, I made a mistake in applying it 🤦‍♂️

Your suggestions sound reasonable, though I suspect that other interfaces which implement Read instead of RemoteRead may run into the same issue. I am not sure which conditions distinguish the use of RemoteRead vs. Read.

As far as I am concerned, statically importing ISequentialStream with a fixed RemoteRead would work for me.

jonschz avatar Jan 24 '23 20:01 jonschz

Your contribution helps us so much.

I found the SO that might help you about Read and RemoteRead.

https://stackoverflow.com/questions/19820999/what-are-the-remoteread-and-remotewrite-members-of-isequentialstream

Your code is a great use case for CreateStreamOnHGlobal in comtypes.

I am trying to figure out how to take this into the production code so that we can continue to operate well. Please wait a while.

Any opinions would be appreciated.

junkmd avatar Jan 24 '23 22:01 junkmd

As you say, the only thing that needs to be statically defined in the scope of this issue is ISequentialStream. In addition to overriding RemoteRead, adding a type-checking-only type hint to RemoteWrite would be sufficient for modern Python code.

  • objidl.py https://github.com/enthought/comtypes/blob/41970e7933eec80952aadac4f5b57b4232a8d79f/comtypes/objidl.py#L1-L59
  • test_istream.py https://github.com/enthought/comtypes/blob/41970e7933eec80952aadac4f5b57b4232a8d79f/comtypes/test/test_istream.py#L1-L39
  • add comtypes.objidl to client._generate._get_known_symbols
  • add test_symbols_in_comtypes_objidl in test_client.py

Is this code okay for you? If you agree, I would like to hear from other maintainers on this code.

Also keep in mind, the drop_py2 has already many changes at this time, so it may likely be after the drop_py2 plan when the PR you would submit is merged.

junkmd avatar Jan 28 '23 03:01 junkmd

The code looks fine to me, thank you! I can also confirm the correct behaviour of the test (passes with the modification in _generate, fails without it).

so it may likely be after the drop_py2 plan when the PR you would submit is merged.

In that case, would it make more sense to create the pull request at a later point in time?

jonschz avatar Jan 28 '23 15:01 jonschz

@vasily-v-ryabov

Do you think these changes are reasonable to resolve this issue?

Even if these changes are acceptable for the project, I think it is out of scope of the drop_py2, such as #475.

How do you think this should go?

junkmd avatar Jan 29 '23 07:01 junkmd

@jonschz

Hi,

The codegenerator was determining whether the item is one of the known symbols based solely on its name.

However, with #529 and #534, the criteria for determining whether a COM interface is a known symbol now include not only the name but also the iid.

With those change, I think that ISequentialStream can be included in the __known_symbols__ of objidl.py without the risk of it being mistaken for another interface with the same name.

If you are still interested in this issue, feel free to PR.

junkmd avatar May 07 '24 10:05 junkmd