positron icon indicating copy to clipboard operation
positron copied to clipboard

%pip install pandas requires a kernel restart before being use - refactor to get_pandas that do the import each time, wrapped in a try-except.

Open ronblum opened this issue 1 year ago • 10 comments

Positron Version:

Positron Version: 2024.06.1 build 27 Code - OSS Version: 1.90.0 Commit: a893e5b282612ccb2200102957ac38d3c14e5196 Date: 2024-06-26T01:22:29.024Z Electron: 29.4.0 Chromium: 122.0.6261.156 Node.js: 20.9.0 V8: 12.2.281.27-electron.0 OS: Linux x64 6.5.0-41-generic

Steps to reproduce the issue:

EDIT: 2024-07-18 by @jthomasmock

Daniel: I think I know what the issue is.

Pandas is discovered when ipykernel is first loaded, if we install pandas within the current session (eg with %pip install pandas) then even though you are able to use it, the internal ipykernel calls to pandas won't be able to use them as pd_ wasn't found when launching the kernel. > A simple solution is to simply restart the kernel.

Here's what happens when ipykernel is launched the following is evaluated, and pd_ becomes None as pandas is not installed. If you install pandas now with %pip install pandas, you can run import pandas and it will work, but pd_ won't be updated until you restart the kernel.

Wasim: I think we should move away from importing third party packages on startup the way we currently do, and instead have functions like get_pandas that do the import each time, wrapped in a try-except.


  1. In the Python Console (cribbed from test plan)
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, size=(1000, 100)))
column_names = [f"Column_{i+1}" for i in range(100)]
df.columns = column_names

The df object should appear under the "Data" section

  1. Click on the checkboard to the right of df to view the data.
    • Sometimes, the data explorer tab doesn't open.
    • No errors are emitted, though, in the output.
    • This doesn't happen consistenty.

What did you expect to happen?

A data explorer tab should open.

Were there any error messages in the output or Developer Tools console?

Not in the output.

ronblum avatar Jun 26 '24 20:06 ronblum

I think I know what the issue is.

Pandas is discovered when ipykernel is first loaded, if we install pandas within the current session (eg with %pip install pandas) then even though you are able to use it, the internal ipykernel calls to pandas won't be able to use them as pd_ wasn't found when launching the kernel. A simple solution is to simply restart the kernel.

Here's what happens when ipykernel is launched the following is evaluated, and pd_ becomes None as pandas is not installed. If you install pandas now with %pip install pandas, you can run import pandas and it will work, but pd_ won't be updated until you restart the kernel.

https://github.com/posit-dev/positron/blob/ff87538651c017bcd442a2e8eb6c4d41a56c8454/extensions/positron-python/python_files/positron/positron_ipykernel/third_party.py#L64

https://github.com/posit-dev/positron/blob/ff87538651c017bcd442a2e8eb6c4d41a56c8454/extensions/positron-python/python_files/positron/positron_ipykernel/third_party.py#L20-L25

This behavior shouldn't be linux specific

dfalbel avatar Jun 26 '24 22:06 dfalbel

I think we should move away from importing third party packages on startup the way we currently do, and instead have functions like get_pandas that do the import each time, wrapped in a try-except.

It’ll mean that users don't have to restart after installing (most of the time; some packages still require a restart), and it’ll make kernel startup a little bit faster since we’ll only import third party packages when they’re needed.

Alternatively, @dfalbel mentioned on Slack:

I think this makes sense! I think it's also fine having to restart the kernel, but we should at least re-surface the error to the UI, with some tips that restarting might fix.

seeM avatar Jun 27 '24 13:06 seeM

Do we feel 100% confident that we have identified the root cause here? I ask because in smoke tests, I am seeing occasional failures where the test double clicks a variables row and no data explorer opens. These tests pre-install all their dependencies.

testlabauto avatar Jun 27 '24 15:06 testlabauto

I don't see any other reasons for that to happen. Can you try in the smoket ests if running %view df returns an error?

dfalbel avatar Jun 27 '24 17:06 dfalbel

Sure. I will try that. I also just put in retries in case double clicking on the variables row is the real problem.

testlabauto avatar Jun 27 '24 18:06 testlabauto

It looks to me like click retries have solved the smoke test problem! Carry on :)

testlabauto avatar Jun 27 '24 20:06 testlabauto

Should we open an issue for Wasim's suggestion to try-expect and/or surface the error per Daniel?

jthomasmock avatar Jul 01 '24 16:07 jthomasmock

@jthomasmock I made #3800 but then closed it. I think we can use this issue to track the %pip install pandas bug.

seeM avatar Jul 02 '24 10:07 seeM

I've relabled this issue and added context to original comment.

jthomasmock avatar Jul 19 '24 00:07 jthomasmock

I ended up approaching this by forwarding the error to the user instead of lazily loading pandas as it's a simpler change and also solves other possible issues during initialization that wouldn't be covered by simply loading pandas lazily. See https://github.com/posit-dev/positron/pull/4122 for more info.

dfalbel avatar Jul 29 '24 22:07 dfalbel

Verified Fixed

Positron Version(s) : 2024.07.0-113
OS Version          : OSX

Test scenario(s)

Error message appears for user when trying open data frame post pandas install without restart:

Image

Link(s) to TestRail test cases run or created: N/A

testlabauto avatar Jul 30 '24 16:07 testlabauto