asyncssh icon indicating copy to clipboard operation
asyncssh copied to clipboard

readutil() blocked

Open optomind opened this issue 2 years ago • 1 comments

For an regex, self.WAITFOR = r'.[>#]\s$' stdout.readuntil(self.WAITFOR) is being blocked and timed out.

It happens in the version 2.12.0, while it works fine in the previous version 2.11.0.

optomind avatar Sep 01 '22 04:09 optomind

Unfortunately, regex patterns were not supposed to be supported in readuntil(). See #486 where this was reported, causing some separators which matched regex special characters to no longer correctly match. This broke in 2.8.1, due to an re.escape() which was inadvertently removed when some type annotation changes were made. The fact that a regex pattern actually matched in some cases was unintentional, and it actually wouldn't work properly in all cases. The intended behavior is to only allow a single literal string or a sequence of literal strings, which was the case prior to 2.8.1 and is the case again in 2.12.0.

Properly supporting an arbitrary regex would actually be quite difficult, as there's no way to know the maximum number of characters which might need to be read before a match might be found. With a list of string literals to match against, this max can be determined (from the longest literal), and an appropriate amount of data can be made available before attempting a match.

For the specific case you have here, I think you should be able to pass in a list of strings such as ('#', '>', 's') as your separator argument. It's not quite the same as your regex, but I'm not sure why you're starting with a '.' in the current pattern, which can match any character. That shouldn't really make a difference with readuntil(). Also, the '$' is unnecessary there.

ronf avatar Sep 01 '22 05:09 ronf

Hi Ron, in our case, if we use just those strings, we terminate anytime any one of those characters is present in the output which is not uncommon. now regexs don't necessarily prevent this from happening, but explicitly adding the $ for regex has reduced the chance to an extremely low probability(not seen it once in millions of connections so far across different networks). So if not a generic regex, some other option to handle this scenario i.e. basically look for a prompt sign

Thanks again for your gift of this most excellent package, Dinesh

ddutt avatar Feb 15 '23 14:02 ddutt

The readuntil() function is explicitly designed to successfully match on separators even when they cross a read boundary, because these boundaries are unpredictable. I'd prefer not to offer an option to look at read boundaries in readuntil(), as it is inherently unreliable.

That said, if you do want to only match on things at the end of a read() boundary (and I can see why it would help reduce false hits in your case), this is easy to do with the plain read() function. For instance, you could do something like:

    data = await stdout.read(16384)
    if data and data[-1] in '>#':
        # Handle last block of data
    else:
        # Handle intermediate data

You may actually want to break up that if statement further to handle data being empty as its own case, as that would indicate that the channel closed. After that, you can check if the last character of data is '>' or '#'. If you know which of those two to expect, you could also compare only against that one character, which might be slightly more efficient and could further reduce false positives.

ronf avatar Feb 16 '23 02:02 ronf

Thanks for the detailed info Ron.

ddutt avatar Feb 16 '23 13:02 ddutt