vscode-jupyter icon indicating copy to clipboard operation
vscode-jupyter copied to clipboard

Adopt new Python extensions non-blocking API for interpreter disocvery

Open karrtikr opened this issue 3 years ago • 28 comments

Three discovery APIs are being added for Jupyter which is meant to replace existing getInterpreters() API: https://github.com/microsoft/vscode-python/pull/17452

type PythonApiForJupyterExtension = {
     ...
    /**
     * IInterpreterService
     */
    readonly refreshPromise: Promise<void> | undefined;
    /**
     * IInterpreterService
     */
    readonly onDidChangeInterpreters: Event<PythonEnvironmentsChangedEvent>;
    /**
     * Equivalent to getInterpreters() in IInterpreterService
     */
    getKnownInterpreters(resource?: Uri): PythonEnvironment[];
    /**
     * Equivalent to getAllInterpreters() in IInterpreterService
     */
    getInterpreters(resource?: Uri): Promise<PythonEnvironment[]>;
    ...

Previous behavior:

Existing getInterpreters() is an asynchronus blocking API, so it waits until all of discovery is completed the first time it is called. Subsequent calls return instantaneously from cache if the same resource is queried again.

New behavior:

  • New getKnownInterpreters() simply returns whatever is currently in cache. It is useful when we do not need all of the interpreters at once, which is usually the case. In the Python extension we are able to replace all instances of getInterpreters() with getKnownInterpreters(). Do note that the kind info returned by this API may change as refresh progresses.
  • onDidChangeInterpreters fires when an environment gets added/removed/updated in cache. Event properties tracks the old and new environments accordingly.
  • Python extension triggers a refresh in background when the extension loads for the first time, which is when refreshPromise is set. It can be used to implement now deprecated getInterpreters() in the following way:
    /** Gets all interpreters **/
    public async getInterpreters(resource?: Uri): Promise<PythonEnvironment[]> {
        await this.refreshPromise; // Wait for the refresh to finish so that the cache contains all environments.
        return getKnownInterpreters(resource); // Return from cache.
    }
  • getInterpreterDetails() is still available as before, and can be used to get complete and reliable information on environments. As it is a blocking call, it's recommend to only use it if complete information is needed, which is generally only the case for selected interpreters.

You can the see the new API in action here in our dynamic quickpick https://github.com/microsoft/vscode-python/pull/17043#issue-975156935.

Note: Any internal interpreter list caching that was recently added should go away once onDidChangeInterpreters events are available.

Changes regarding quickpick API

We also used to expose getSuggestions API for the quickpick, which has the same problem as getInterpreters(). The following non-blocking API is being added to replace it:

getKnownSuggestions(resource: Resource): Promise<IInterpreterQuickPickItem[]>;

It can be used in combination of refreshPromise to get all suggestions. Let me know if you have any questions! 😄

karrtikr avatar Sep 17 '21 23:09 karrtikr

@karrtikr just want to double check, this new API won't break existing users, right? If a user is on python insiders, but not updating the jupyter extension, they won't be broken because the old extension is still using the old API?

rchiodo avatar Sep 27 '21 20:09 rchiodo

Yes, the old APIs are still available, the new APIs are added in addition.

karrtikr avatar Sep 28 '21 18:09 karrtikr

FYI we plan to expose discovery API generally for all extensions and not just Jupyter, and it's a little different from this API. Proposed design: https://gist.github.com/karthiknadig/4b7bc9a9420e9071105007b5cc096f65

I'll leave it up-to you guys on whether you want to wait for the final general API. This issue can be closed in that case.

karrtikr avatar Sep 28 '21 19:09 karrtikr

@karrtikr Was looking at this API today, some questions:

  • old & new
    • When we get the first event old = undefined, and new = A
    • Next lets assume somethign changes, then old = A and new = B
    • How do I know whether the old value from the second event === new value from first event.
      • Are the object refs the same, (looking at the code, they are not)
      • Hence its impossible for me to update the objects at my end, i.e. i cannot update the interpreter information as i don't have the original object (all i'm getting from python extension is some new object that says its the original, but its a whole new refernce).
      • I.e.there's no identity

Here's the code that shows the objets are re-created everytime for every event

            this.changed.fire({
                type: event.type,
                new: event.new ? convertEnvInfo(event.new) : undefined,
                old: event.old ? convertEnvInfo(event.old) : undefined,
                resource: event.searchLocation,
            });

Internally things are ok for Python extension as the original object refs are stored and updated in place

            } else if (seen[event.index] !== undefined) {
                const oldEnv = seen[event.index];
                seen[event.index] = event.update;
                didUpdate.fire({ index: event.index, old: oldEnv, update: event.update });

Solution

  • We should pass the same objects around instead of creating them again & again
  • I think python extension can easily maintain a weakmap of these ojbects (key = python object & mapped object is the one sent to external users) or other ways...(e.g. regular map, but more expensive)

DonJayamanne avatar Oct 28 '21 21:10 DonJayamanne

How do I know whether the old value from the second event === new value from first event.

You could compare the .path property directly. I think the other properties in old won't be used anyways, so maybe we should only send the path as an id instead of the previous environment (denoted by old).

Internally things are ok for Python extension as the original object refs are stored and updated in place

We maybe doing this but we do not rely on this information and do path check directly, see here: https://github.com/microsoft/vscode-python/blob/ec628195910d7b67fba12ece521d60b75a81fb0f/src/client/pythonEnvironments/base/info/env.ts#L245-L247

karrtikr avatar Oct 29 '21 18:10 karrtikr

You could compare the .path

That wouldn't work as some conda environments. If you create conda environments on unix, the python path is the same as the base conda env.( Unless you provide the python argument in the clip).

DonJayamanne avatar Oct 29 '21 19:10 DonJayamanne

Interesting, we're not handling such environments at the moment in the discovery component and completely rely on 'path', but weirdly haven't received any issue reports.

cc @karthiknadig

karrtikr avatar Oct 29 '21 20:10 karrtikr

ndling such environments at the moment in the discovery component and completely rely

Ok, thanks. Weird. I thought we used to detect these. If i'm not mistaken we've been using some internally as well for testing purposes.

Lets leave that for now then, if its not supported then thats a whole separate issue. Hence please ignore the comments, i'll use the path as a key in personal & jupyter extension

DonJayamanne avatar Oct 29 '21 21:10 DonJayamanne

@karrtikr just use conda create env -n <name>

DonJayamanne avatar Oct 29 '21 22:10 DonJayamanne

Cool.

I thought we used to detect these.

We actually didn't, we detected the environment folders, but we only looked for python paths inside the env folder:

    /**
     * Return the interpreter's filename for the given environment.
     */
    public getInterpreterPath(condaEnvironmentPath: string): string {
        // where to find the Python binary within a conda env.
        const relativePath = this.platform.isWindows ? 'python.exe' : path.join('bin', 'python');
        return path.join(condaEnvironmentPath, relativePath);
    }

so such environments with base conda python path were never discovered.

karrtikr avatar Oct 30 '21 00:10 karrtikr

We actually didn't, we detected the environment folders, but we only looked for pytho

Thanks @karrtikr Interesting, @IanMatthewHuff how is it that yo'ure getting Jupyter environments listed in VS Code that don't have an executable? Referring to https://github.com/microsoft/vscode-jupyter/issues/5650

DonJayamanne avatar Nov 01 '21 16:11 DonJayamanne

@DonJayamanne Not fully sure what you are referring to. In #5650 I'm just referring to multiple KernelConnectionMetadatas that all share a path.

IanMatthewHuff avatar Nov 03 '21 16:11 IanMatthewHuff

@karrtikr I'd like to start this discussion again, given that the API has changed Here is what we would like to have

  • We would like to popular the list of interpreters in our kernel picker as and when they are discovered.
    • E.g. if python extension discovers Conda Env A, then we add Conda Env A into the list of kernels, and then add Conda Env B after a few ms or seconds and so on .
    • Basically we'd like to add the envionments as and when they are added
    • We'd like information such as
      • Type of environment
      • Name of environment
      • Python Version of Environment
      • Whether its 32 bit or 64 bit
    • Please advice if this is possible today with the API or not, and what API methods/events we should be using this
  • Blocking/non-blocking
    • In a separate thread you have indicated that calling some of the API could block the extension code, like language servers etc.
    • My suggest would be to block the API to not return anything until all of the necessary components have loaded in the Python extension.
    • This way the jupyter extension or others can use the API anytime they need.
    • E.g. assume we create a jupyter extension thats very very fast tomorrow, and the code that runs to load interpreters executes pretty much as soon as extension loads, by now Python extension might still be loading. Hence if you would like the Python extension to NOT be blocked by an interpreter API call, then I'd suggest you block the API until the necessary components have loaded in python extension. This allows consumers of the extension API to use the API freely, else its difficult to predict and control how the API is called.
    • Also please note, the Jupyter extension needs the list of kernels quickly and we've found that things are slow in the past, hence its likely we'll still continue fetching the interpreters as soon as the jupyter extension loads, so as to improve the over all UX in jupyter extension.
      • But this remains to be seen, based on how fast the new API is
  • From what I can tell the new API has an event onDidEnvironmentsChanged,
    • Lets assume a user has 10 enviroments and the event is triggered for one of the envirionments.
    • Next we call getEnvironmentDetails with the above information & we do this for all 10 events triggered.
    • From what I can tell this is the way the API is meant to be used, I'd like to understand if there are any problems here with this approach.

DonJayamanne avatar Aug 15 '22 20:08 DonJayamanne

if python extension discovers Conda Env A, then we add Conda Env A into the list of kernels, and then add Conda Env B after a few ms or seconds and so on .

Regarding first point, yes it's possible. But events are fired if any information is found even though it's not complete yet:

  • Event1: EnvA was found (not sure if it's conda yet)
  • Event2: Partial version was filled
  • Event3: Type detected as conda
  • Event4: Complete info is found

Are you interested in information in all these stages, or information only when it's complete? I think for the purposes of populating the list, it's fine to have partial information so users could at least begin selecting it. Both of these are possible to fetch, but you have to call the API differently.

karrtikr avatar Aug 15 '22 21:08 karrtikr

opulating the list, it's fine to have partial information so users could at least begin selecting it

Not sure i agree, displaying the wrong information to the user just because we have something would not be the best. I say wrong, because we could end up displaying an environment as a virtual environment, when in fact its a conda environment or the like, Hence in such cases, its possible the user is looking for a virtual env and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)

I think for Python extension it doesn't really matter much what the user chooses, in Jupyter it makes a very big difference, basically its a make or break selection. If you get it wrong, things will not work at all, you'll get error in every step (missing dependencies, failure to start, etc), where as in python ext thats not the case, most of the things will work.

Are you interested in information in all these stages, or information only when it's complete?

Basically we need the information before we add it to the list. Lets take the following scenario into account

Event1: EnvA was found (not sure if it's conda yet)
Event2: Partial version was filled
Event3: Type detected as conda
Event4: Complete info is found

Lets assume we called the api method getEnvironmentDetails after Event2. Lets assume the envionment is detected as a virtual environment, but in fact its a Conda enviroment. Now after we get the resutls of the getEnvironmentDetails, would that point to the stable values? or is it possible that after we get the result for getEnvironmentDetails the events Event3, and Event4 can get fired again for the same environment.

Meaning the only time we can get the accurate information is to wait for everything to compete? Or can we get the accurate information by calling getEnvironmentDetails and if this is possible would this be faster than having to wait for everything to complete

DonJayamanne avatar Aug 16 '22 04:08 DonJayamanne

Two points:

  • Hence in such cases, its possible the user is looking for a virtual env and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)

    Populating the list is different than selecting an interpreter. When user attempts to select an interpreter, that is when getEnvironmentDetails can be called with full details. So before running it we know it's conda, and it won't fall over. My point is for purposes of displaying in the list, partial information can be helpful and full env details can only fetched for the selected kernel.

  • I say wrong, because we could end up displaying an environment as a virtual environment, when in fact its a conda environment or the like,

    Secondly, the partial information displayed won't be wrong, the information only gets more detailed with time. For eg. a conda environment is not a virtual environment, so it'll never be classified as that. Pipenv environments are virtual environments under the hood, so it can be displayed as such until we figure out the better type for the env is pipenv.

Let me know your thoughts, before we move on to how to call the API.

karrtikr avatar Aug 16 '22 18:08 karrtikr

Populating the list is different than selecting an interpreter. When user attempts to select an interpreter, that is when getEnvironmentDetails can be called with full details. So before running it we know it's conda, and it won't fall over. My point is for purposes of displaying in the list, partial information can be helpful and full env details can only fetched for the selected kernel.

I think we're talking about two different things. Lets leave Python Interpreter selection out of this discussion. I'm referring to the kernel picker.

partial information can be helpful and full env details can only fetched for the selected kernel.

I'm referring to the Jupyter extension UI, not Python, I don't think you know how that works as this is not how the Jupyter UI works.

displayed won't be wrong, the information Pipenv environments are virtual environments under the hood, so it can be displayed as such until we figure out the better type for the env is pipenv.so it can be displayed as such until we

In a users views, this information is incorrect hence my point. Yes we can argue about technical details, but as an end user, I'm looking for a Pipenv, not a Virtual Env, hence (Jupyter Extension) cannot necessarily classify these as virtual envs. I.e. we cannot argue such technical details to an end user via a simple UI such as a quick pick.

I think have enough information at this stage, will check the API and get back to you.

DonJayamanne avatar Aug 16 '22 19:08 DonJayamanne

I think we're talking about two different things. Lets leave Python Interpreter selection out of this discussion. I'm referring to the kernel picker.

The following running scenario you mentioned comes after selection (picking the kernel):

"and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)"

I was referring to that saying that won't be a problem.

I think have enough information at this stage, will check the API and get back to you.

Sure, check https://github.com/microsoft/vscode/issues/158246 for latest planned updates.

karrtikr avatar Aug 16 '22 19:08 karrtikr

I was referring to that saying that won't be a problem.

Assume user have a pip env and virtual env, now they select the virtual env assuming its the virtual env they are expecting. However the virtual env is infact a pipenv (as you mentioend pipenvs can be virtual envs). Now user selects this and runs the cell. The packages they installed like ipykernel, numpy, sklearn do not exist in there, and as a result things fall over and they get a lot of errors.

Based on what you said earlier this is possible.

DonJayamanne avatar Aug 16 '22 19:08 DonJayamanne

Any packages we install into it are considering it's Pipenv, and any packages users installed into it outside of VSCode are considering it's Pipenv, so I don't see why packages won't be found. But okay, maybe I still misunderstand so please go ahead.

karrtikr avatar Aug 16 '22 20:08 karrtikr

so I don't see why packages won't be found. B

Because as i mentioned user has two environmetns, pipenv and virtual env. hence packages in virtualenv will not be in pipenv.

DonJayamanne avatar Aug 16 '22 21:08 DonJayamanne

@karrtikr Scenario

  • User has two python environments A and B

Questions

  • How many times can the onDidEnvironmentsChanged get triggered?
    • Is it possible for this to get triggered twice, once for A and once for B and then again for A and B with type = "update".
    • It is my understanding that this is possible, if yes that's fine, as the even arg type clearly indicates the type of change.
  • Assume the discovery runs for 5 minutes (hypothetical, so we can expand the timeline and discuss what happens in the interim)
    • Assume we get the first event type="add" for envirnment A
    • Now, the Jupyter extension calls the API method
      • getEnvironmentDetails("A", useCache: false
        • Would we get the information after the full discovery?
        • I.e. is it possibel for us to get the details saying env type = FOO, and after the full discovery runs, then env type for A changes from FOO to BAR
        • In other words after this method returns, is it possible for the event onDidEnvironmentsChanged to get fired once again for the same environment A (because the env discovery is still in progress)
        • The same question applies to the API methods

DonJayamanne avatar Aug 24 '22 21:08 DonJayamanne

It is my understanding that this is possible, if yes that's fine, as the even arg type clearly indicates the type of change.

Yes that's possible. The add event should only be triggered once, update type can be triggered multiple times.

Would we get the information after the full discovery?

Are you asking if the information which is returned will be the same as the stable information after full discovery? Yes. However it won't wait for discovery to finish, so env discovery can be still in progress after this method returns.

In other words after this method returns, is it possible for the event onDidEnvironmentsChanged to get fired once again for the same environment A (because the env discovery is still in progress)

Yes, update type events can still get fired while discovery is going on. Consider resolving as separate from discovery.

karrtikr avatar Aug 25 '22 02:08 karrtikr

thanks, curious then what is wrong with the jupyter extension calling the getEnvironmentDetails method for every add event triggered.

DonJayamanne avatar Aug 25 '22 02:08 DonJayamanne

If kernel picker cannot work with partial details, there's nothing wrong it. getEnvironmentDetails returns the full details which is exactly what is needed. However it blocks running the interpreter to get details, so it's slower.

If it was possible for picker to display partial version information etc. and later update it, API offers the flexibility to display the info earlier so it can be selected earlier. We can potentially make it so that kind never changes.

I'm not sure on what is the case so I cannot advise.

karrtikr avatar Aug 25 '22 06:08 karrtikr

We can potentially make it so that kind never changes.

not sure what your mean by this are you saying that if i call getEnvironmentDetails once then env type could be Foo and then call it again then it could be Bar. from what i can tell them only way to get the env type is via the above api method.

DonJayamanne avatar Aug 25 '22 09:08 DonJayamanne

are you saying that if i call getEnvironmentDetails once then env type could be Foo and then call it again then it could be Bar.

If called with useCache: false, it returns the final env type and the type never changes.

If called with useCache: true, it returns partial info, it may return a env type that can change later. The Venv to Pipenv case we talked about earlier. If Jupyter is interested in this option, I can potentially look into improving this scenario, so that env type never changes (we always return the final type). Let me know.

karrtikr avatar Aug 25 '22 16:08 karrtikr

Summary of discussions

Discussion:

  • At a minimum Trigger event with stable info for
    • name
    • type
    • id
  • Id will be added and that is unique

Proposals:

  • onDidChangeEnvironment.xxxx: Event; type: add, remove, update
  • No more clear-all event
  • Two versions of getEnvironmentDetails, sync and async
    • getEnvironmentDetails(id:string): Promise<....>
    • getEnvironmentDetailsSync(id:string): {id:string; name:string; type: EnvType[];} & Record<string,string>
  • Might use progress indicator
  • Change ProgressNotificationEvent
    • Remove allPathsDiscovered
  • Change getRefreshPromise to getRefreshPromise(): Promise<void> | undefined; Remove args
  • Change refreshEnvironments to remove the argument. refreshEnvironments()
  • Take fix from Jupyter to monitor environments.txt to detect new conda environments
  • Detect new conda environments when vs code loads
    • Detect PyEnv as well the same way
    • Python extension will detect new environments without Jupyter having to call full refresh
  • New faster API getActivatedEnvironmentVariablesxxx(resource: Resource, id)

DonJayamanne avatar Sep 06 '22 19:09 DonJayamanne

Closing this as completed, the new API has been completely adopted.

DonJayamanne avatar Oct 23 '22 23:10 DonJayamanne

Can we consider https://github.com/microsoft/vscode-python/issues/19989 as unblocked from Jupyter's end, if there's no feedback/concerns?

karrtikr avatar Oct 24 '22 18:10 karrtikr