"validate" cannot detect error in R code

Open jnishii opened this issue 3 years ago • 12 comments

I'm trying to use nbgrader with R, but neither the "Validate" button nor the autograder detects errors in answer cells. I tried each of the following commands to throw an error, but the "Validate" button always returns the message "Success! Your notebook passes all the tests" and I always get a full score.

raise NotImplementedError()
fail()
e <- simpleError("test error")
stop(e)
throw('NotImplementedError') 

When I run the cells above, they return an error, so it seems that nbgrader is not detecting the thrown error. I'm currently using nbgrader for Python on the same system without problems. Is there a solution?

Operating system

Ubuntu 16.04.6 LTS

nbgrader --version

nbgrader version 0.6.1

jupyterhub --version (if used with JupyterHub)

0.9.6

jupyter notebook --version

6.0.2

Expected behavior

Detect errors in notebook

Actual behavior

"Validate" button doesn't detect errors.

jnishii avatar Mar 10 '21 04:03 jnishii

Two questions here:

  1. are you expecting the error to stop execution, or
  2. are you getting pass when you expect fail ?

For the latter - I had a similar problem when trying to do stuff with a Stata kernel (see https://github.com/kylebarron/stata_kernel/issues/373 for my embarrassment)

The bottom line is you need a fairly recent version of nbgrader to get the updated logic for catching the output.output_type == "stream" and output.name == "stderr" errors
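
The condition mentioned above can be sketched in Python over the raw notebook JSON. This is only an illustration, not nbgrader's actual code; the helper names `output_signals_error` and `cell_has_error` are hypothetical.

```python
def output_signals_error(output: dict) -> bool:
    """Return True if a single notebook output looks like an error.

    Two cases are treated as errors (an assumption based on the
    condition quoted in this thread):
      * output_type == "error"  (the kernel reported an exception)
      * output_type == "stream" with name == "stderr"
    """
    if output.get("output_type") == "error":
        return True
    return output.get("output_type") == "stream" and output.get("name") == "stderr"


def cell_has_error(cell: dict) -> bool:
    """Return True if any output of a code cell signals an error."""
    if cell.get("cell_type") != "code":
        return False
    return any(output_signals_error(o) for o in cell.get("outputs", []))
```

Note that under this check a failure message written to stdout is not treated as an error, which matters for kernels or test libraries that report failures there.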

perllaghu avatar Mar 10 '21 08:03 perllaghu

Thank you for your comment.

Execution does stop at the error, so that is not my problem. I'm getting a pass from the "Validate" button when I expect a fail.

Here is an example of cell output.

  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "ename": "ERROR",
     "evalue": "Error: error\n",
     "output_type": "error",
     "traceback": [
      "Error: error\nTraceback:\n",
      "1. fail(\"error\")",
      "2. expect(FALSE, message, info = info)",
      "3. exp_signal(exp)",
      "4. withRestarts(if (expectation_broken(exp)) {\n .     stop(exp)\n . } else {\n .     signalCondition(exp)\n . }, continue_test = function(e) NULL)",
      "5. withOneRestart(expr, restarts[[1L]])",
      "6. doWithOneRestart(return(expr), restart)"
     ]
    }
   ],
   "source": [
    "fail(\"error\")"
   ]
  },

Clearly, the kernel reports the error in the code cell. But when I click the "Validate" button in the menu bar, I get "Success! Your notebook passes all the tests."

So it's not quite like your case with the Stata kernel, but it is the same in that I have everything working apart from the grading functionality.

regards,

jnishii avatar Mar 10 '21 10:03 jnishii

Hmmm ok - so the output.output_type is error, which should be picked up by ..... no, wait

We need to check what validate is supposed to do - I think it just runs the notebook, and shows any output errors.

It certainly does that in a python3 notebook.

I'm doing some testing now - but I'll try adding some errors to an R notebook once I'm done, and report back

perllaghu avatar Mar 10 '21 11:03 perllaghu

OK.... here's what I have.

  1. We need to set up the converter to produce the correct failure conditions in the release notebook. In nbgrader_config.py I have:
c.ClearSolutions.code_stub = dict(
    python = '# YOUR CODE HERE\nraise NotImplementedError()',
    R = '# YOUR CODE HERE\nstop("NotImplementedError")'
    )
  2. In the source notebook, my autograded answer looks like this:
squares <- function (n) {
    ### BEGIN SOLUTION
    if (!(is.numeric(n) & n > 0)) {
        stop("ValueError")
        }
    data <- 1:n; n
    return( data*data )
    ### END SOLUTION
}
  3. This produces a release notebook with:
squares <- function (n) {
    # YOUR CODE HERE
    stop("NotImplementedError")
}
  4. Note that this, of itself, scores nothing... you need autograded tests for that:
library(testthat)
stopifnot(squares(1) == c(1))
stopifnot(squares(10) == c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100))
expect_error(squares('0'), "ValueError")

.... and remember

validate runs the student's notebook as is - it doesn't have any of the hidden test or mark-schemes available, and has no idea about points for things - it just runs the whole notebook for an errors / no_errors thing
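
The solution-stripping step described in the list above (a source cell with BEGIN/END SOLUTION markers becoming a stub in the release notebook) can be sketched in Python as follows. This is a simplified illustration, not nbgrader's ClearSolutions implementation; `CODE_STUB` mirrors the config quoted above, and `clear_solution` is a hypothetical helper.

```python
# Stub text per kernel language, mirroring the c.ClearSolutions.code_stub
# setting quoted above.
CODE_STUB = {
    "python": "# YOUR CODE HERE\nraise NotImplementedError()",
    "R": '# YOUR CODE HERE\nstop("NotImplementedError")',
}

BEGIN = "### BEGIN SOLUTION"
END = "### END SOLUTION"


def clear_solution(source: str, language: str) -> str:
    """Replace every BEGIN/END SOLUTION region with the language's stub.

    The stub lines inherit the indentation of the BEGIN marker so the
    result stays readable inside a function body.
    """
    stub_lines = CODE_STUB[language].split("\n")
    result = []
    in_solution = False
    for line in source.split("\n"):
        stripped = line.strip()
        if stripped == BEGIN:
            in_solution = True
            indent = line[: len(line) - len(line.lstrip())]
            result.extend(indent + stub for stub in stub_lines)
        elif stripped == END:
            in_solution = False
        elif not in_solution:
            # Lines outside a solution region are kept unchanged.
            result.append(line)
    return "\n".join(result)
```

Applied to the `squares` source cell shown above with `language="R"`, this yields the same three-line function body as the release notebook: the function header, the two stub lines, and the closing brace.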

perllaghu avatar Mar 16 '21 10:03 perllaghu

Thank you for your report!

I edited nbgrader_config.py to use the stop() function in c.ClearSolutions.code_stub, and confirmed that stop() appears in the answer cell of a released notebook. But my problem is still there. That is:

1. When I select "Kernel / Restart and Run all" from the menu on the released notebook, with the stop() function left in place, execution stops at the answer cell and shows

Error in eval(expr, envir, enclos): NotImplementedError
Traceback:
1. stop("NotImplementedError")

This is a reasonable result but I still have the following problems:

2. If I click "Validate", it shows "Success! Your notebook passes all the tests."
3. Furthermore, if I submit this notebook, the autograder gives it full marks even though an answer cell contains stop("NotImplementedError").

I tried using fail() and throw() instead of stop(), but both gave me the same problem.

As I wrote before, nbgrader works well with Python notebooks on the same system. Does autograding work well for R on your system?

jnishii avatar Mar 17 '21 06:03 jnishii

Yes..... I can do the full autograding, manual grading, and feedback cycle in an R notebook - I've attached the release version of my complete demo notebook. R squares demo assessment (release).zip

Question: what happens when you run through the full notebook yourself? This is what I get: [image: notebook errors]

perllaghu avatar Mar 17 '21 07:03 perllaghu

I put your release file in the release directory of my nbgrader environment, and found that the "Validate" button works correctly for your file but not for my release file.

I haven't found the cause yet, but I'll compare the two files in more detail and report the results.

jnishii avatar Mar 19 '21 10:03 jnishii

Remember that "validate" just runs the notebook.

It knows nothing of hidden tests or mark schemes.

You should be able to run a released notebook in a pure 'jupyter/r-notebook' docker image.

If you can run my notebook and get errors, but not yours..... Start by making a brand new notebook, with the 'stop' statements in it, and run it.

Bypass the overheads of nbgrader... Confirm that you are getting stop functionality in a plain notebook.

(do you want to post a sample [failing] notebook here, to see if I spot anything in it?)
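
One way to build such a brand new test notebook without going through nbgrader is to write the notebook JSON directly. The sketch below uses only the Python standard library; it assumes the R kernel is registered under the kernelspec name "ir" (the IRkernel default), and `make_r_notebook` and the file name are hypothetical.

```python
import json


def make_r_notebook(cell_sources):
    """Build a minimal nbformat 4 notebook dict with one code cell per source.

    Assumption: the R kernel is installed under the kernelspec name "ir".
    """
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
            for source in cell_sources
        ],
        "metadata": {
            "kernelspec": {"name": "ir", "display_name": "R", "language": "R"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(path, notebook):
    """Write the notebook dict to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

For example, `write_notebook("stop_check.ipynb", make_r_notebook(['stop("NotImplementedError")', 'print("not reached")']))` produces a two-cell notebook. Running it with `jupyter nbconvert --to notebook --execute stop_check.ipynb` should then abort at the first cell if the kernel reports stop() as an error.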

perllaghu avatar Mar 19 '21 16:03 perllaghu

I checked the release files again, and found that:

1) The "Validate" button doesn't detect the runtime error from stop("NotImplementedError") if the call appears at the top level of an answer cell, outside any function definition, but it does detect stop() when it is called inside a defined function, as in your release file.

2) The "Validate" button doesn't detect errors raised by the test_that() function, even though the page https://github.com/ttimbers/nbgrader_r_demo reports that test_that() works with nbgrader.

I put a release file here. Even if I erase stop("NotImplementedError") in the second answer cell and write ans=4 (a wrong answer), "Validate" doesn't detect the error in the evaluation cell and outputs "Success! Your notebook passes all the tests."

When I run through the file manually (by hitting Shift-Enter), I get error messages as in this output, but the "Validate" button doesn't detect these errors.

jnishii avatar Mar 22 '21 03:03 jnishii

Could this be the same issue that was fixed in https://github.com/jupyter/nbgrader/pull/1381 ? Could you try from head and see if it still persists?

jhamrick avatar Mar 25 '21 23:03 jhamrick

Thank you for your comment.

I found three reasons why the "Validate" button doesn't work as I expected.

  1. The "Validate" button doesn't detect an error raised by

    stop("NotImplementedError") or raise(...)

in answer cells, because answer cells are not checked for failures in _get_failed_cells() in validator.py.

Now I understand that this is by design in nbgrader.

  2. The "Validate" button doesn't detect an error given by the test_that() function, because the failure is emitted with "output_type": "stream", so this is similar to the case in #1381. Ian pointed this problem out to me earlier, but I was focused on the results of case (1) above and didn't notice it. Sorry, Ian.

One difference from #1381 is that test_that() doesn't produce output.name == "stderr" but output.name == "stdout", so there is no clue for finding the error in the autograde cell.

To resolve this problem, I redefined test_that() as below.

test_that <- function(...){
    stopifnot(testthat::test_that(...))
}

It seems to be working now!

  3. The "Validate" button doesn't detect an error in an autograde cell when the cell's points are set to 0. I think it would be better to detect errors in this case too. Is this the expected behavior of nbgrader?

jnishii avatar Mar 26 '21 05:03 jnishii

Yeah, this latter point does indeed seem like a bug. I would also expect the validator to say the test has failed even if it is worth zero points.

jhamrick avatar Mar 27 '21 19:03 jhamrick
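
The zero-point case discussed above can be illustrated with a simplified model in Python. This is a sketch, not nbgrader's actual validator code: it assumes a grade cell counts as failed only when its score is below its maximum score, and that a cell with an error output scores 0. The function names are hypothetical, and the `grade` and `points` keys are assumed to follow nbgrader's cell metadata.

```python
def has_error_output(cell):
    """True if a code cell has an error output or wrote to stderr."""
    for output in cell.get("outputs", []):
        if output.get("output_type") == "error":
            return True
        if output.get("output_type") == "stream" and output.get("name") == "stderr":
            return True
    return False


def is_grade_cell(cell):
    """True if the cell is marked as an autograded test cell (assumed key)."""
    return bool(cell.get("metadata", {}).get("nbgrader", {}).get("grade", False))


def failed_cells_by_score(cells):
    """Model of a score-based check: failed means score < max_score.

    Answer cells are skipped entirely, and a zero-point test cell can
    never fail because 0 < 0 is False.
    """
    failed = []
    for cell in cells:
        if not is_grade_cell(cell):
            continue
        max_score = float(cell["metadata"]["nbgrader"].get("points", 0))
        score = 0.0 if has_error_output(cell) else max_score
        if score < max_score:
            failed.append(cell)
    return failed


def failed_cells_by_error(cells):
    """Alternative check: any test cell with an error output has failed,
    regardless of how many points it is worth."""
    return [c for c in cells if is_grade_cell(c) and has_error_output(c)]
```

Under this model a test cell worth 0 points that raises an error is missed by `failed_cells_by_score` but caught by `failed_cells_by_error`, while an erroring answer cell is ignored by both.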