khmer icon indicating copy to clipboard operation
khmer copied to clipboard

Calling overloaded 'consume_*' methods from Cython

Open standage opened this issue 6 years ago • 2 comments

At the C++ level all consume_seqfile methods are overloaded to accept either a filename (string) or a parser as input. However, at the wrapper level the parser variants of each function are not exposed (with the exception of consume_seqfile_with_reads_parser).

I've been twiddling with this code tonight, and I cannot get Cython to call the string variant of each function, even though it's properly defined. For example, changing...

cdef FastxParserPtr parser = get_parser[CpFastxReader](_bstring(file_name))
deref(self.c_table).consume_seqfile[CpFastxReader](parser,
                                                   total_reads,
                                                   n_consumed)

...to...

deref(self.c_table).consume_seqfile[CpFastxReader](<const string>_bstring(file_name),
                                                   total_reads,
                                                   n_consumed)

...results in the following error.

khmer/_oxli/graphs.pyx:212:59: Cannot assign type 'const string' to 'shared_ptr[CpReadParser[CpFastxReader]]'

I see that all of the related functions first create parser objects, and then call the consume functions using the parser, rather than directly calling the consume function with the filename string. Is this deliberate? Are there known issues with calling overloaded functions?

Rather than creating a _with_reads_parser variant of each function, I'd suggest we handle both string and read parsers using a single function. If we can't get the overload calls working correctly, we may have to do dynamic type checking to see whether the user provided a string or a parser, and in the case of a string create a parser before invoking the C++ consume method.

standage avatar Sep 06 '17 06:09 standage

This is a known Cython bug -- I'm quite positive there's an active issue for it (don't think I bookmarked it though).

The current workaround is kinda ugly, but it gets the job done. It uses Cython's aliasing; see the Hashgraph PR here for an example: https://github.com/dib-lab/khmer/pull/1755/files#diff-70cfa46a4aff403c6b6633566ff9d2edR144

On the extension class side, we can just type it as an object and call isinstance(FastxReader). The .pxd wrapper will have to remain ugly until the bug is fixed.

camillescott avatar Sep 06 '17 07:09 camillescott

Thanks for the response, I'll fiddle around around a bit more.

standage avatar Sep 06 '17 16:09 standage