
Document common issues users encounter when using papermill

Open MSeal opened this issue 6 years ago • 8 comments

We should add a docs section which outlines common failures, why they happen, and how / where to look to address them. An example is linked below, but a few ideas come to mind:

  • Kernel not found
  • Kernel died (OOM + other reasons)
  • New language not translating parameters
  • Can't access input / output notebook paths
  • Scheme not supported etc...

slack thread

Dennis [Yesterday at 6:11 PM] @here hello everyone, anyone ever experience the following issue when executing a spark notebook from within a Docker container via Papermill: RuntimeError: Kernel died before replying to kernel_info. I found a closed nteract/papermill github issue regarding this problem, but it did not specify a valid solution

3 replies

Dennis [16 hours ago] Sorry nevermind, issue resolved; I realize this has nothing to do with Papermill

betatim [7 hours ago] @Dennis do you think it is worth it to post on the issue you found to say “the problem has nothing to do with papermill, you should investigate X instead” for people from the future who experience this problem?

mseal [< 1 minute ago] I do think we should gather some of these common issues and put them in our docs for papermill, if nothing else because papermill is a touch point for many users and they often don't know how the underlying technologies are interacting.

MSeal avatar May 03 '19 14:05 MSeal

EDIT: Ah, looks like I can't create an account as I don't have one of the following email addresses. If you have an @plot.ly, @formidable.com, @sagemath.com, or @google.com email address, you can create an account.

Hey @MSeal! I'm facing the same issue right now, haven't found a fix yet. ~I'll check out the Slack thread~.

I can add this to the documentation when I find a fix. Should I add it to the Troubleshooting section?

vinayak-mehta avatar Sep 03 '19 11:09 vinayak-mehta

Referring to the RuntimeError: Kernel died before replying to kernel_info.

vinayak-mehta avatar Sep 03 '19 11:09 vinayak-mehta

Yes that would be perfect! Thanks for offering to help :)

MSeal avatar Sep 04 '19 23:09 MSeal

@vinayak-mehta @MSeal Any updates about: RuntimeError: Kernel died before replying to kernel_info.

We are getting this every once in a while and are not sure why.

yogevyuval avatar Jun 24 '20 12:06 yogevyuval

I'm a bit swamped so haven't been able to add QOL issue PRs here, like this one. Happy if someone else wanted to start adding this in a PR :heart:.

@yogevyuval Are you running anything in parallel in this context? There is a race condition that can occur rarely atm: see https://github.com/jupyter/jupyter_client/issues/487 for details on that. You can also get the kernel death if your machine is out of memory when it tries to launch the kernel. Furthermore, if you're not using ipykernel, the kernel itself may be dying for unrelated reasons, in which case you may need to look at the kernel logs to determine the cause.

MSeal avatar Jun 24 '20 21:06 MSeal

@MSeal Yes, we are running different notebooks at the same time with the same kernel. Is that the case you are referring to? Any hack to solve this for now? Should we catch the exception and retry?

I don't think we are running out of memory..

How can I get more logs like you mentioned? That's the only thing I see.

yogevyuval avatar Jun 24 '20 21:06 yogevyuval

Yeah, the current workaround is to catch the exception and retry if it's a Kernel died message. This only happens during initialization, when the kernel is being brought up and loses the race to acquire its assigned port. We're discussing how to fix the race condition in that linked thread.
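A minimal sketch of that catch-and-retry workaround, assuming the failure surfaces as a RuntimeError whose message contains "Kernel died" (as in the error quoted in this thread). The helper name `run_with_kernel_retry` and its parameters are hypothetical, not part of papermill:

```python
import time


def run_with_kernel_retry(execute, max_attempts=3, delay_seconds=1.0):
    """Call execute() and retry only when the kernel died during startup.

    `execute` is any zero-argument callable, for example a lambda wrapping
    papermill.execute_notebook(input_path, output_path).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except RuntimeError as exc:
            # Assumption: the startup race shows up as a RuntimeError with
            # "Kernel died" in its message; anything else is re-raised as-is.
            if "Kernel died" not in str(exc) or attempt == max_attempts:
                raise
            # Back off a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)


# Possible usage (requires papermill to be installed):
# import papermill
# run_with_kernel_retry(
#     lambda: papermill.execute_notebook("input.ipynb", "output.ipynb")
# )
```

Retrying only on the "Kernel died" message keeps genuine notebook errors from being silently re-run.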

MSeal avatar Jun 24 '20 23:06 MSeal

Maybe add a hint about errors caused by default parameter values? If a notebook defines a default and one sets the parameter using the CLI with a misspelled name, the error might not be caught, as the parameter that is not "documented" in the notebook will just be ignored during execution and the default used.

dimensions:int = 10 # e.g. feature dimensions

papermill notebook.ipynb -p dimension 20 will add a new cell with

dimension = 20  # won't be used due to the missing trailing s

Is a "strict" mode planned for notebook execution? (I couldn't find it browsing the docs and issues.)
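Until something like that exists, a check along these lines could be run before execution. This is only a sketch: `check_parameters` is a hypothetical helper, not a papermill feature, and it assumes you already have the list of parameter names the notebook declares (recent papermill versions may be able to provide this via `papermill.inspect_notebook`, but verify that against your installed version).

```python
import difflib


def check_parameters(passed, declared):
    """Raise ValueError if any passed parameter name is not declared.

    passed:   dict of parameter names to values, as given on the CLI or API.
    declared: iterable of parameter names defined in the notebook's
              parameters cell.
    """
    declared = list(declared)
    problems = []
    for name in passed:
        if name in declared:
            continue
        # Suggest the closest declared name to help spot typos.
        close = difflib.get_close_matches(name, declared, n=1)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        problems.append(f"{name!r}{hint}")
    if problems:
        raise ValueError("Unknown parameter(s): " + ", ".join(problems))


# Possible usage with the example from this comment:
# check_parameters({"dimension": 20}, ["dimensions"])
# raises ValueError because "dimension" is not declared in the notebook.
```

Failing before execution means the typo is caught instead of the notebook quietly running with its default value.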

enryH avatar Feb 02 '22 15:02 enryH