common icon indicating copy to clipboard operation
common copied to clipboard

gherkin: Compiling empty scenarios

Open mattwynne opened this issue 6 years ago • 55 comments

Summary

Currently, the Gherkin pickles compiler removes Gherkin scenarios that don't have any steps, so they don't appear as test cases.

I understand why this was done - what's the point in running a test case that isn't going to test anything? - yet I think we're missing a thing here.

When I watch teams do BDD, they very often start out by creating just the headline or title of a scenario, and defer filling out the Given/When/Then steps until later. I'd like it if Cucumber were reporting that scenario's status as undefined. In order to do this, we'd need to pass a pickle out of the compiler.

Expected Behavior

Pickles compiler outputs pickles (with no steps) for scenarios with no steps.

Current Behavior

Pickles compiler does not output a pickle for scenarios with no steps.

Possible Solution

I'd love some pointers here. I've never worked on the pickles compiler, but if someone can show me where to start (and we're agreed on the idea), I'd have a go at this.

Context & Motivation

I'm thinking about how people like product owners engage with Cucumber, and that I'd guess they think of a scenario as having taken life as soon as they name it. It seems wrong to hide those scenarios from Cucumber.

mattwynne avatar Aug 18 '17 11:08 mattwynne

Consider there two feature files:

Feature:
  Background:
    Given foo

  Scenario: empty
Feature:
  Scenario: not empty
    Given foo

The first one should be undefined and the second should not. But if we send all pickles through, how would we distinguish between the two?

aslakhellesoy avatar Aug 18 '17 12:08 aslakhellesoy

Hi @mattwynne

The compiler is pretty simple, compiler.rb is less than 200 LOC.

It doesn't have unit tests, though - only approval tests (the .pickles.ndjson files).

I would start with these files:

  • Input: https://github.com/cucumber/cucumber/blob/master/gherkin/testdata/good/incomplete_scenario.feature
  • Expected output: https://github.com/cucumber/cucumber/blob/master/gherkin/testdata/good/incomplete_scenario.feature (currently empty - that't what we want to change)

Rather than changing the expected output by hand, I'd make the change you think is correct. Start with the Ruby implementation - you can do the other languages later. The fix is maybe to remove these lines:

  • https://github.com/cucumber/cucumber/blob/88ee9254838bd0ff77249665e7eccf1a1c588687/gherkin/ruby/lib/gherkin/pickles/compiler.rb#L27
  • https://github.com/cucumber/cucumber/blob/88ee9254838bd0ff77249665e7eccf1a1c588687/gherkin/ruby/lib/gherkin/pickles/compiler.rb#L48

Now, cd into cucumber/gherkin/ruby and run make. You should get an error because the generated pickles are different from the expected (empty) one.

Make deletes the generated file if the test fails (its presence indicates "test passed" and prevents make from running it next time). Disable that temporarily by commenting out .DELETE_ON_ERROR in the Makefile so that we can inspect the file after a failure.

Run make again and inspect the generated file, then remove it so the test will run next time.

jq acceptance/testdata/good/incomplete_scenario.feature.pickles.ndjson
rm acceptance/testdata/good/incomplete_scenario.feature.pickles.ndjson

Does it look correct? Do you need to make more changes to the code?

When you're happy, save it to the golden master:

mv acceptance/testdata/good/incomplete_scenario.feature.pickles.ndjson ../testdata/good/

After copying the new golden master over here, sync it to all the other gherkin implementations:

cd ../..
source scripts/functions.sh
rsync_files

When you run git status you'll see that the updated golden master has been updated in all the other language implementations too.

This will of course cause the other Gherkin implementations' tests to fail, so now it's time to go and fix those. It's easier than you think - the structure is the same so you can hack on languages you don't know!

I recommend fixing all implementations in the same branch/PR, one commit at a time. If you're stuck in a language, someone else will jump in and do it for you. Do try though.

aslakhellesoy avatar Aug 18 '17 12:08 aslakhellesoy

The first one should be undefined and the second should not. But if we send all pickles through, how would we distinguish between the two?

By producing an empty Pickle when the Scenario or Scenario Outline AST object has not steps (regardless of the existence of a Background with steps)?

brasmusson avatar Aug 19 '17 08:08 brasmusson

Ah yes of course!

aslakhellesoy avatar Aug 19 '17 11:08 aslakhellesoy

The alternative would be for the compiler to just diligently compile the test case as usual, and leave it up to the higher-level bits (e.g. the runner) to figure out that this step is from a background and still give the scenario an undefined status.

Seems a little bit icky to me to have the compiler deciding about things like that.

I'm happy to let this edge-case come out in the wash, TBH.

mattwynne avatar Aug 24 '17 09:08 mattwynne

I agree with @brasmusson - if there are no steps in the scenario, no steps in the pickle regardless of how many steps there are in the background.

I don't see why you want to leave that decision to individual cucumber implementations when it can be done consistently in the Gherkin lib @mattwynne.

Also seems like it would be hard for cucumber to determine whether pickle steps came from background or not.

aslakhellesoy avatar Aug 24 '17 17:08 aslakhellesoy

Currently the compilers treat the edge-case for a scenario with no steps (by not creating a Pickle) the same way regardless of any background content. So changing the outcome of that edge-case to creating an empty Pickle (regardless of any background content), should fit in nicely in the current compiler structure.

brasmusson avatar Aug 24 '17 18:08 brasmusson

I might not get to that until next week, so have a go at it if you want @brasmusson!

aslakhellesoy avatar Aug 24 '17 21:08 aslakhellesoy

I'm late to the party on this one but, for what it's worth, I see outlines test generators and backgrounds as test adjusters. So not pickling an outline without example rows makes sense because there are no concrete tests to wrap up. Similarly, a feature with only a background shouldn't pickle anything because it would be completely arbitrary to say what number of pickles should be made for anything besides 0 because there are still no actual tests to attach to. Once a scenario is added (whether it be direct with Scenario: or indirect with an outline plus example row), however, it can be said that there is a test and thus something to pickle up.

If I understand the point of pickles, they are the 'resolved' form of a test. We don't produce one pickle for an outline, we 'resolve' it and produce one pickle for each example row. We don't pickle up a scenario and then have an extra object for the background steps, we 'resolve' it and produce one pickle with all steps included.

This being the case, I would expect one pickle per Scenario: or non-parameter row in an Examples:. and for those pickles to include all applicable steps, regardless of where they came from.

enkessler avatar Oct 21 '17 02:10 enkessler

That!s a nice explanation @enkessler and I believe this is how it currently behaves.

aslakhellesoy avatar Oct 21 '17 06:10 aslakhellesoy

@aslakhellesoy According to our test data, no.

Previous behavior: no pickles created for a scenario without steps. Current behavior: pickle created for a scenario without steps but steps from a background are left out of the pickle Expected behavior: pickle created for a scenario without steps and steps from a background are included in the pickle.

enkessler avatar Oct 21 '17 13:10 enkessler

I disagree.

Imagine you have this:

Feature: Go shopping

  Background: we have some money
    Given I have £10

  Scenario: spend some of it
    When I buy a paper for £1
    Then I should have £9 left

  Scenario: spend some of it
    When I buy a jumper for £11
    Then I should be denied
    But I should have £10 left

Now we want to add some more scenarios, without fleshing them out, because we need to do some more analysis. We're adding them more as a note to remind ourselves that we have some more work to do.

Feature: Go shopping

  Background: we have some money
    Given I have £10

  Scenario: spend some of it
    When I buy a paper for £1
    Then I should have £9 left

  Scenario: can't afford what we want
    When I buy a jumper for £11
    Then I should be denied
    But I should have £10 left

  # New scenarios

  Scenario: get a discount

  Scenario: got a loyalty card with a free gift

We don't want those new scenarios to run at all, not even with the background. It potentially slows down the build, and it's pointless - there are no steps in the scenarios themselves. There is nothing new to learn from executing these scenarios.

We do however want the report to contain information that we had 4 scenarios, 2 of which are undefined. We can represent that with empty pickles.

If we did what you're suggesting, Cucumber would have to execute the 2 empty scenarios. It can't tell that its sole step came from a background and none from the scenario and decide not to execute it.

aslakhellesoy avatar Oct 21 '17 21:10 aslakhellesoy

Cucumber executing a test that got written down is a feature, not a bug. If you don't want a test to to run because it isn't finished yet, that is what filters are for. Slap a @wip tag on it and call it a day.

# New scenarios

@wip
Scenario: get a discount

@wip
Scenario: got a loyalty card with a free gift

If nothing else, having different behavior for this edge case is inconsistent. As soon as you add a single step in the Scenario: it gets treated normally and is going to start getting executed whether it is 'finished' or not, so the burden of having their test suite be able to accommodate for unfinished tests remains on the test writers. This is the right place for that burden because when a test is complete can only be known by the author. Maybe it is done after one step, maybe it will be done in another ten steps. Maybe it's another case of really bad use of the tool and they really do mean the test to have no additional steps (we've all seen sillier things over on the forums).

Admittedly, the lack of any steps in a test aside from background ones is highly suspicious but if you want a tool warn you about suspicious things, that's what linters are for*. A compiler's job is to take in your source code (Gherkin) and spit out something actionable (Pickles), silly though that action may be.

TLDR: preventing those test from running is someone else's job

*We should really get back around to writing that thing.

enkessler avatar Oct 22 '17 02:10 enkessler

@aslakhellesoy Fun fact: I ran into this use case just the other day.

I had a couple complete scenarios that had common enough setups that a Background had been made. I also had another half dozen Scenarios that I had yet to finish (marked with a @wip, naturally). Being able to resume working on the unfinished tests by just removing the @wip tag and running them in order to see what to do next was nice. A nicer alternative than having to add a fake step to the test in order to get it to run only to then have to replace with a real step a second later.

If I upgrade to Gherkin 5 I will no longer have this nicety. :(

enkessler avatar Dec 21 '17 16:12 enkessler

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] avatar Dec 21 '18 17:12 lock[bot]

So I ran into this with Cucumber JVM.

The general rule is that the state of a scenario is the worst state of its executed steps and passing (the best state) if no steps have been executed yet. By making this undefined two rather annoying edge cases are introduced:

  1. The first time a before hook is invoked the state of the scenario was undefined.
  2. When a scenario is undefined, there should be undefined steps and snippets to print because each undefined step generates a snippet. Unless it is the empty scenario.

Because we're deprecating non-strict mode undefined steps will always result in a failure. So I would suggest making empty scenarios fail. Perhaps with a pending exception/state.

mpkorstanje avatar Feb 16 '20 18:02 mpkorstanje

At the moment, it looks like the gherkin parser is parsing empty scenarios as requested in the initial message of that issue.

cucumber-ruby is reporting the scenario as undefined. cucumber-jvm is reporting the scenario as passed. cucumber-js is reporting something like "2 scenarios (1 passed)" when no hooks are set. It is reporting it passed when hooks are executed.

So, what needs to be done now? Should we try to aligning cucumber-jvm and cucumber-js on cucumber-ruby? Should that be part of that same issue? Maybe we could close it and open new one in cucumber-jvm and cucumber-js?

aurelien-reeves avatar Apr 26 '21 10:04 aurelien-reeves

As discussed with Aslak, we'll open a PR with an empty scenario in the compatibility kit The expected result should be the scenario to be undefined, and no hooks to be executed - like cucumber-ruby

aurelien-reeves avatar Apr 26 '21 14:04 aurelien-reeves

👍 let's make sure to link from this issue to the PR once it's open so people can follow the trail.

mattwynne avatar Apr 26 '21 15:04 mattwynne

Personally I would keep this issue open until the PR is merged but I don't feel strongly about that.

mattwynne avatar Apr 26 '21 15:04 mattwynne

https://github.com/cucumber/cucumber/pull/1498

aurelien-reeves avatar Apr 27 '21 08:04 aurelien-reeves

Reopening this because of discussion in Slack

@davidjgoss:

What's the consensus across implementations about scenarios that contain no steps? This seems to break cucumber-js, not sure if it's worth addressing https://github.com/cucumber/cucumber-js/issues/1668

@mpkorstanje:

Well, Aslak seems to think it's a good idea to have the empty scenario result in an undefined result. I think that empty scenarios should be treated like an ordinary scenario. Execute all the before/after hooks, but not any steps (because there are none). It makes it much easier to keep everything consistent (esp when dealing with events, scenario outcomes, ect). If empty scenarios are undesirable that should be handled by a linter.

@aslakhellesoy:

I don't have a strong opinion about it, just that we implement it consistently.

aslakhellesoy avatar May 13 '21 12:05 aslakhellesoy

I think that empty scenarios should be treated like an ordinary scenario. Execute all the before/after hooks, but not any steps (because there are none). It makes it much easier to keep everything consistent (esp when dealing with events, scenario outcomes, ect). If empty scenarios are undesirable that should be handled by a linter.

I like all of those words. My only question is whether or not steps from a Background count or not.

enkessler avatar May 14 '21 04:05 enkessler

We may let users choose how they would process their empty scenarios with an option:

  • execute all hooks and background steps, and report the scenario with a status depending the execution of those background steps and hooks
  • do not execute anything and report it as undefined skipped|empty|pending|...

I think both options may find their audience.

Also, I was curious to know how other tools consider empty tests. I have checked the behavior of rspec. It shows a "Not yet implemented" and report the test as pending. Hooks are not executed: if an error is raised, implemented tests are reported as "Error", but the not implemented test is still reported as pending.

$ bundle exec rspec spec/cucumber/cli/main_spec.rb:23 -fd
Run options: include {:locations=>{"./spec/cucumber/cli/main_spec.rb"=>[23]}}

Cucumber::Cli::Main
  #execute!
    passed an existing runtime
      configures that runtime (FAILED - 1)
      uses that runtime for running and reporting results (FAILED - 2)
      do something (PENDING: Not yet implemented)

Pending: (Failures listed here are expected and do not affect your suite's status)

  1) Cucumber::Cli::Main#execute! passed an existing runtime do something
     # Not yet implemented
     # ./spec/cucumber/cli/main_spec.rb:50


Failures:

  1) Cucumber::Cli::Main#execute! passed an existing runtime configures that runtime
     Failure/Error: raise "error"
     
     RuntimeError:
       error
     # ./spec/cucumber/cli/main_spec.rb:10:in `block (2 levels) in <module:Cli>'

  2) Cucumber::Cli::Main#execute! passed an existing runtime uses that runtime for running and reporting results
     Failure/Error: raise "error"
     
     RuntimeError:
       error
     # ./spec/cucumber/cli/main_spec.rb:10:in `block (2 levels) in <module:Cli>'

Finished in 0.00136 seconds (files took 0.75502 seconds to load)
3 examples, 2 failures, 1 pending

Failed examples:

rspec ./spec/cucumber/cli/main_spec.rb:30 # Cucumber::Cli::Main#execute! passed an existing runtime configures that runtime
rspec ./spec/cucumber/cli/main_spec.rb:40 # Cucumber::Cli::Main#execute! passed an existing runtime uses that runtime for running and reporting results

Do you know other tools which may be compared to cucumber as a BDD test runner? I think we may comply to what is done with our "competitors" if some kind of consensus has already been found elsewhere.

aurelien-reeves avatar May 17 '21 07:05 aurelien-reeves

While we could let users choose the behaviour, I don't think we should. It introduces complexity without significant benefit.

I think we should choose one behaviour and implement it consistently.

aslakhellesoy avatar May 17 '21 09:05 aslakhellesoy

For what it's worth, considering the empty scenario as a scenario with 0 steps that will execute before/after all and before/after scenario hooks and is passed by default is the most consistent and least complicated solution.

  1. Hooks have access to the current scenario state. This can be elegantly defined as the worst result of all previous steps/hooks or passed if none.
  2. So for consistency, after completion the scenario result can be defined as the worst result of all steps/hooks or passed if there was no result.
  3. Undefined steps result in a suggested step definition snippet event. So for every undefined scenario there is at least one suggested snippet. By defining the scenario result as passed when there are no step (rather then undefined) we ensures that for an undefined step/scenario there is always at least one suggested snippet. This removes an edge case for implementers of plugins.

mpkorstanje avatar May 17 '21 10:05 mpkorstanje

The fact that a scenario is empty is an important information. It may even be an very important information in a BDD workflow, or within a living documentation. Reporting it as passed or failed is loosing that piece of information.

From a testing tool point of view, I know some tools which reports empty tests or tests without expectation, and requires the developer to explicitly specify that no assertion is actually expected. As annoying as it can be for the developer, it make it easier to retrieve that info. And for code maintenance, it also make it explicit that the test is empty or has no expectation on purpose.

aurelien-reeves avatar May 17 '21 11:05 aurelien-reeves

I'm thinking about the principle of least surprise for this one. As a user, given my intent is something like "putting this scenario title here as a placeholder, coming back later to flesh it out" (see early comments), I think I'd be more surprised if Cucumber ran the hooks and background steps.

davidjgoss avatar May 17 '21 12:05 davidjgoss

Following the principle of least surprise everything that happens before a scenario should happen before a scenario with no steps. The latter being a subset of the former.

As an analogue: Scenarios are the the equivalent of tests. Features the equivalent of a class with multiple tests. A test with no statements in the method body is equivalent to an empty scenario.

Feature: Test
  Scenario: nothing

Would be equivalent to:

class Test {

  @BeforeEach
  public void before(){
   
  }

  @AfterEach
  public void before(){
     
  }

  @Test
  public void nothing(){
  
  }
}

This is a valid junit test and will execute all hooks. If you have a decent linter, it will complain about a test without an assertion.

a BDD workflow,

We can't and shouldn't try to facilitate all possible work flows during test execution. That's what linters are for. It also solves the problem of making things optional or not. Linters can usually be configured with rules about things they complain and don't complain about.

mpkorstanje avatar May 17 '21 12:05 mpkorstanje

I don't really see a Scenario as being analogous to a JUnit test. My angle was that it only becomes a runnable test once I actually add some steps and implement the glue for them - before that its just a piece of unfinished documentation. This is maybe not the most prevalent point of view though...

davidjgoss avatar May 17 '21 12:05 davidjgoss