cmd2 transcripts do not catch errors

Quack,

I'm using cmd2 2.3.3 on Python 3.9.9 on Debian unstable.

I am trying to test my application using transcripts. Depending on the gravity of the situation it uses pwarning() or perror() to alert the user but these calls are not caught by transcript and looking at the code of _generate_transcript is seems only stdout is captured.

I believe testing application checks are useful and in most test framework you can catch exceptions; would you consider capturing stderr too in the transcript?

Regards. \_o<

Dec 22 '21 13:12 duck-rh

Possible this could be resolved by PR #1208 Also see issue #1207

Feb 24 '22 00:02 crotwell

@tleonhardt @anselor Can you think of a reason why capturing stderr would be problematic?

Feb 24 '22 16:02 kmvanbrunt

@kmvanbrunt The transcripts feature is an area of cmd2 I haven't really touched. At a high level I see no reason why it would be a problem.

Feb 24 '22 16:02 anselor

@kmvanbrunt I would be happy to see a more robust solution, but currently cmd2 transcripts do not actually capture stdout, they only capture the output of self.poutput().

In other words, if you have two commands like:

    def do_print(self, statement):
        """print something"""
        print("printing something")

    def do_poutput(self, statement):
        """poutput something"""
        self.poutput("poutput something")

and run you see this:

(Cmd) print
printing something
(Cmd) poutput
poutput something

but if you create a transcript with:

(Cmd) history -t transcript.txt -- -2:

then the transcript.txt looks like:

(Cmd) print
(Cmd) poutput
poutput something

so the print output is missing.

If you can figure out how to actually capture stdout, that would be great. But if not, maybe capturing self.perror() as in PR #1208 is at least sufficient so transcripts fail if there are exceptions raised?

Feb 28 '22 14:02 crotwell

@crotwell Transcripts capture everything written to self.stdout, which is not limited to self.output() calls.

Transcripts also already capture self.stderr via a cmd2.utils.StdSim and unittest.TextTestRunner. That data is printed to sys.stderr when a test is successful. Upon failure, the unittest class also writes to sys.stderr data describing why the test failed.

See this code here: https://github.com/python-cmd2/cmd2/blob/master/cmd2/cmd2.py#L4998

Your original PR probably has the simplest approach for filtering user-printed messages from other sys.stderr data, but should these cases be considered failed tests if you are expecting the error text?

@duck-rh Can you clarify why you want sys.stderr captured? Are you using transcripts like unit tests to exercise good and bad code paths?

If we start evaluating both streams in a transcript test, what happens when a test case writes to both streams? How will the transcript differentiate what text in a transcript belongs to self.stdout and what text belongs to sys.stderr? I believe the approach by @crotwell addresses this concern by storing user-printed error messages in another class member, but it limits transcripts to treating all error messages as failed tests.

@tleonhardt Thoughts?

Feb 28 '22 15:02 kmvanbrunt

OK, I see the clarification. You said capturing stderr, but sounds like you actually meant capturing self.stderr.

I do not care about stderr being in the transcript, intermingling stdout and stderr is probably hard and error prone. And I guess there could be cases where self.stderr was not an actual failure, so perror is not the right way to go. As I said in issue #1207, what I actually care about is that a transcript test should not "pass" if there is a failure situation in cmd2. In particular, a test of a transcript that contains a command that does not actually exist should not report "ok". If it was easier or less invasive to fail on exceptions or other error handling instead of stderr, that is probably better.

Basically, I went with the simple perror() solution just because onecmd() calls default() which just sends the not a recognized command message directly to perror here, ie no exception raised: https://github.com/python-cmd2/cmd2/blob/5ce3a64e41258b6a694ad45bb1c604be53a1e974/cmd2/cmd2.py#L2871

Currently there is an odd mixture of sometimes "bad things" are exceptions and sometimes they are just perror() prints, making it a lot harder for the transcript tests to know when something failed. Perhaps onecmd and default should not capture the unknown command failure situation, instead raise an exception that can be caught at a higher level, such as by the transcript tests. This may also require refactoring onecmd_plus_hooks to separate running a command from the error handling. For a user typing commands, perror() is the right error handling, but for running a transcript as a test it might not be.

Feb 28 '22 16:02 crotwell

cmd2 does not have a self.stderr member like it does self.stdout and self.stdin which are inherited from Python's cmd library. We capture sys.stderr during transcripts, command redirection/piping, and pyscripts.

The question I have is about the purpose of transcripts. I've not really used them nor have I contributed much to the transcript code, so my thoughts here might be incorrect.

It seems like transcripts are about capturing expected output. When you generate them using history -t, it writes the output of a command line to a transcript file. If I type a command that does not exist, then I expect nothing in self.stdout and a specific error string in sys.stderr. If that happens, then I would call it a success because actual output matched expected output. If anything else happens which does not match the transcript, then I would call it a failure.

Feb 28 '22 17:02 kmvanbrunt

That purpose for transcripts seems correct to me.

But it is unclear to me if a transcript of something that writes to both stdout and stderr would be usable as a test in that how stdout and stderr are intermingled could be non-deterministic. It might be worth trying, but I don't know enough about python internals to know if you might effectively create a race condition between stdout and stderr if you try to combine them, especially in cases where there was a call to the shell.

Feb 28 '22 17:02 crotwell

@kmvanbrunt yes, that's exactly it, I'm running unit tests. I don't see any other way to test the UI part of my application. I create a complete test environment and I want to make sure the flow of a command ends up with the proper result, whether it's a result or an error. I also have scenarios with a suite of commands having a specific outcome. You can have a look at what it looks like here: https://vcs-git.duckcorp.org/duckcorp/ldapwalker (branch test).

Mar 01 '22 04:03 duck-rh

@duck-rh Have you looked at the external test plug-in? https://github.com/python-cmd2/cmd2/tree/master/plugins/ext_test

Mar 01 '22 05:03 anselor

@anselor thanks for pointing this out I must have missed it. I made a quick test and that seem to work fine.

I noticed that plugins are not included in the distribution and I had to copy the module in my sources to make it work. Would it be possible to distribute them?

Mar 01 '22 09:03 duck-rh

@duck-rh It's distributed as a separate wheel you can import for your tests. https://pypi.org/project/cmd2-ext-test/

Mar 01 '22 14:03 anselor

PR #1210 I think addresses part of this, at least my concern about a transcript with a non-existant command returning "ok". Does not take care of capturing stderr when creating a transcript with history, not sure how important that is. Basically I borrowed code from cmd2-ext-test to use pybridge to grab stderr and concat it with stdout for the result. Simple concat might not be the right way to go, but at least something is there.

Anyway, hope this helps...

Mar 01 '22 16:03 crotwell

cmd2 cmd2 copied to clipboard

transcripts do not catch errors

cmd2
cmd2 copied to clipboard