cmd2
cmd2 copied to clipboard
transcripts do not catch errors
Quack,
I'm using cmd2 2.3.3 on Python 3.9.9 on Debian unstable.
I am trying to test my application using transcripts. Depending on the gravity of the situation it uses pwarning()
or perror()
to alert the user but these calls are not caught by transcript and looking at the code of _generate_transcript
is seems only stdout is captured.
I believe testing application checks are useful and in most test framework you can catch exceptions; would you consider capturing stderr too in the transcript?
Regards. \_o<
Possible this could be resolved by PR #1208 Also see issue #1207
@tleonhardt @anselor Can you think of a reason why capturing stderr would be problematic?
@kmvanbrunt The transcripts feature is an area of cmd2 I haven't really touched. At a high level I see no reason why it would be a problem.
@kmvanbrunt I would be happy to see a more robust solution, but currently cmd2 transcripts do not actually capture stdout, they only capture the output of self.poutput()
.
In other words, if you have two commands like:
def do_print(self, statement):
"""print something"""
print("printing something")
def do_poutput(self, statement):
"""poutput something"""
self.poutput("poutput something")
and run you see this:
(Cmd) print
printing something
(Cmd) poutput
poutput something
but if you create a transcript with:
(Cmd) history -t transcript.txt -- -2:
then the transcript.txt looks like:
(Cmd) print
(Cmd) poutput
poutput something
so the print output is missing.
If you can figure out how to actually capture stdout, that would be great. But if not, maybe capturing self.perror()
as in PR #1208 is at least sufficient so transcripts fail if there are exceptions raised?
@crotwell Transcripts capture everything written to self.stdout
, which is not limited to self.output()
calls.
Transcripts also already capture self.stderr
via a cmd2.utils.StdSim
and unittest.TextTestRunner
. That data is printed to sys.stderr
when a test is successful. Upon failure, the unittest
class also writes to sys.stderr
data describing why the test failed.
See this code here: https://github.com/python-cmd2/cmd2/blob/master/cmd2/cmd2.py#L4998
Your original PR probably has the simplest approach for filtering user-printed messages from other sys.stderr
data, but should these cases be considered failed tests if you are expecting the error text?
@duck-rh Can you clarify why you want sys.stderr
captured? Are you using transcripts like unit tests to exercise good and bad code paths?
If we start evaluating both streams in a transcript test, what happens when a test case writes to both streams? How will the transcript differentiate what text in a transcript belongs to self.stdout
and what text belongs to sys.stderr
? I believe the approach by @crotwell addresses this concern by storing user-printed error messages in another class member, but it limits transcripts to treating all error messages as failed tests.
@tleonhardt Thoughts?
OK, I see the clarification. You said capturing stderr
, but sounds like you actually meant capturing self.stderr
.
I do not care about stderr being in the transcript, intermingling stdout and stderr is probably hard and error prone. And I guess there could be cases where self.stderr was not an actual failure, so perror
is not the right way to go. As I said in issue #1207, what I actually care about is that a transcript test should not "pass" if there is a failure situation in cmd2. In particular, a test of a transcript that contains a command that does not actually exist should not report "ok". If it was easier or less invasive to fail on exceptions or other error handling instead of stderr, that is probably better.
Basically, I went with the simple perror()
solution just because onecmd()
calls default()
which just sends the not a recognized command
message directly to perror here, ie no exception raised:
https://github.com/python-cmd2/cmd2/blob/5ce3a64e41258b6a694ad45bb1c604be53a1e974/cmd2/cmd2.py#L2871
Currently there is an odd mixture of sometimes "bad things" are exceptions and sometimes they are just perror()
prints, making it a lot harder for the transcript tests to know when something failed. Perhaps onecmd
and default
should not capture the unknown command failure situation, instead raise an exception that can be caught at a higher level, such as by the transcript tests. This may also require refactoring onecmd_plus_hooks
to separate running a command from the error handling. For a user typing commands, perror()
is the right error handling, but for running a transcript as a test it might not be.
cmd2 does not have a self.stderr
member like it does self.stdout
and self.stdin
which are inherited from Python's cmd library. We capture sys.stderr
during transcripts, command redirection/piping, and pyscripts.
The question I have is about the purpose of transcripts. I've not really used them nor have I contributed much to the transcript code, so my thoughts here might be incorrect.
It seems like transcripts are about capturing expected output. When you generate them using history -t
, it writes the output of a command line to a transcript file. If I type a command that does not exist, then I expect nothing in self.stdout
and a specific error string in sys.stderr
. If that happens, then I would call it a success because actual output matched expected output. If anything else happens which does not match the transcript, then I would call it a failure.
That purpose for transcripts seems correct to me.
But it is unclear to me if a transcript of something that writes to both stdout and stderr would be usable as a test in that how stdout and stderr are intermingled could be non-deterministic. It might be worth trying, but I don't know enough about python internals to know if you might effectively create a race condition between stdout and stderr if you try to combine them, especially in cases where there was a call to the shell.
@kmvanbrunt yes, that's exactly it, I'm running unit tests. I don't see any other way to test the UI part of my application. I create a complete test environment and I want to make sure the flow of a command ends up with the proper result, whether it's a result or an error. I also have scenarios with a suite of commands having a specific outcome. You can have a look at what it looks like here: https://vcs-git.duckcorp.org/duckcorp/ldapwalker (branch test
).
@duck-rh Have you looked at the external test plug-in? https://github.com/python-cmd2/cmd2/tree/master/plugins/ext_test
@anselor thanks for pointing this out I must have missed it. I made a quick test and that seem to work fine.
I noticed that plugins are not included in the distribution and I had to copy the module in my sources to make it work. Would it be possible to distribute them?
@duck-rh It's distributed as a separate wheel you can import for your tests. https://pypi.org/project/cmd2-ext-test/
PR #1210 I think addresses part of this, at least my concern about a transcript with a non-existant command returning "ok". Does not take care of capturing stderr when creating a transcript with history, not sure how important that is. Basically I borrowed code from cmd2-ext-test to use pybridge to grab stderr and concat it with stdout for the result. Simple concat might not be the right way to go, but at least something is there.
Anyway, hope this helps...