vscode-jupyter
vscode-jupyter copied to clipboard
Adopt new Python extensions non-blocking API for interpreter disocvery
Three discovery APIs are being added for Jupyter which is meant to replace existing getInterpreters()
API: https://github.com/microsoft/vscode-python/pull/17452
type PythonApiForJupyterExtension = {
...
/**
* IInterpreterService
*/
readonly refreshPromise: Promise<void> | undefined;
/**
* IInterpreterService
*/
readonly onDidChangeInterpreters: Event<PythonEnvironmentsChangedEvent>;
/**
* Equivalent to getInterpreters() in IInterpreterService
*/
getKnownInterpreters(resource?: Uri): PythonEnvironment[];
/**
* Equivalent to getAllInterpreters() in IInterpreterService
*/
getInterpreters(resource?: Uri): Promise<PythonEnvironment[]>;
...
Previous behavior:
Existing getInterpreters()
is an asynchronus blocking API, so it waits until all of discovery is completed the first time it is called. Subsequent calls return instantaneously from cache if the same resource is queried again.
New behavior:
- New
getKnownInterpreters()
simply returns whatever is currently in cache. It is useful when we do not need all of the interpreters at once, which is usually the case. In the Python extension we are able to replace all instances ofgetInterpreters()
withgetKnownInterpreters()
. Do note that the kind info returned by this API may change as refresh progresses. -
onDidChangeInterpreters
fires when an environment gets added/removed/updated in cache. Event properties tracks the old and new environments accordingly. - Python extension triggers a refresh in background when the extension loads for the first time, which is when
refreshPromise
is set. It can be used to implement now deprecatedgetInterpreters()
in the following way:
/** Gets all interpreters **/
public async getInterpreters(resource?: Uri): Promise<PythonEnvironment[]> {
await this.refreshPromise; // Wait for the refresh to finish so that the cache contains all environments.
return getKnownInterpreters(resource); // Return from cache.
}
-
getInterpreterDetails()
is still available as before, and can be used to get complete and reliable information on environments. As it is a blocking call, it's recommend to only use it if complete information is needed, which is generally only the case for selected interpreters.
You can the see the new API in action here in our dynamic quickpick https://github.com/microsoft/vscode-python/pull/17043#issue-975156935.
Note: Any internal interpreter list caching that was recently added should go away once
onDidChangeInterpreters
events are available.
Changes regarding quickpick API
We also used to expose getSuggestions
API for the quickpick, which has the same problem as getInterpreters()
. The following non-blocking API is being added to replace it:
getKnownSuggestions(resource: Resource): Promise<IInterpreterQuickPickItem[]>;
It can be used in combination of refreshPromise
to get all suggestions. Let me know if you have any questions! 😄
@karrtikr just want to double check, this new API won't break existing users, right? If a user is on python insiders, but not updating the jupyter extension, they won't be broken because the old extension is still using the old API?
Yes, the old APIs are still available, the new APIs are added in addition.
FYI we plan to expose discovery API generally for all extensions and not just Jupyter, and it's a little different from this API. Proposed design: https://gist.github.com/karthiknadig/4b7bc9a9420e9071105007b5cc096f65
I'll leave it up-to you guys on whether you want to wait for the final general API. This issue can be closed in that case.
@karrtikr Was looking at this API today, some questions:
- old & new
- When we get the first event old = undefined, and new = A
- Next lets assume somethign changes, then old = A and new = B
- How do I know whether the old value from the second event === new value from first event.
- Are the object refs the same, (looking at the code, they are not)
- Hence its impossible for me to update the objects at my end, i.e. i cannot update the interpreter information as i don't have the original object (all i'm getting from python extension is some new object that says its the original, but its a whole new refernce).
- I.e.there's no identity
Here's the code that shows the objets are re-created everytime for every event
this.changed.fire({
type: event.type,
new: event.new ? convertEnvInfo(event.new) : undefined,
old: event.old ? convertEnvInfo(event.old) : undefined,
resource: event.searchLocation,
});
Internally things are ok for Python extension as the original object refs are stored and updated in place
} else if (seen[event.index] !== undefined) {
const oldEnv = seen[event.index];
seen[event.index] = event.update;
didUpdate.fire({ index: event.index, old: oldEnv, update: event.update });
Solution
- We should pass the same objects around instead of creating them again & again
- I think python extension can easily maintain a weakmap of these ojbects (key = python object & mapped object is the one sent to external users) or other ways...(e.g. regular map, but more expensive)
How do I know whether the old value from the second event === new value from first event.
You could compare the .path
property directly. I think the other properties in old
won't be used anyways, so maybe we should only send the path
as an id instead of the previous environment (denoted by old
).
Internally things are ok for Python extension as the original object refs are stored and updated in place
We maybe doing this but we do not rely on this information and do path check directly, see here: https://github.com/microsoft/vscode-python/blob/ec628195910d7b67fba12ece521d60b75a81fb0f/src/client/pythonEnvironments/base/info/env.ts#L245-L247
You could compare the .path
That wouldn't work as some conda environments. If you create conda environments on unix, the python path is the same as the base conda env.( Unless you provide the python argument in the clip).
Interesting, we're not handling such environments at the moment in the discovery component and completely rely on 'path', but weirdly haven't received any issue reports.
cc @karthiknadig
ndling such environments at the moment in the discovery component and completely rely
Ok, thanks. Weird. I thought we used to detect these. If i'm not mistaken we've been using some internally as well for testing purposes.
Lets leave that for now then, if its not supported then thats a whole separate issue. Hence please ignore the comments, i'll use the path as a key in personal & jupyter extension
@karrtikr just use conda create env -n <name>
Cool.
I thought we used to detect these.
We actually didn't, we detected the environment folders, but we only looked for python paths inside the env folder:
/**
* Return the interpreter's filename for the given environment.
*/
public getInterpreterPath(condaEnvironmentPath: string): string {
// where to find the Python binary within a conda env.
const relativePath = this.platform.isWindows ? 'python.exe' : path.join('bin', 'python');
return path.join(condaEnvironmentPath, relativePath);
}
so such environments with base conda python path were never discovered.
We actually didn't, we detected the environment folders, but we only looked for pytho
Thanks @karrtikr Interesting, @IanMatthewHuff how is it that yo'ure getting Jupyter environments listed in VS Code that don't have an executable? Referring to https://github.com/microsoft/vscode-jupyter/issues/5650
@DonJayamanne Not fully sure what you are referring to. In #5650 I'm just referring to multiple KernelConnectionMetadatas that all share a path.
@karrtikr I'd like to start this discussion again, given that the API has changed Here is what we would like to have
- We would like to popular the list of interpreters in our kernel picker as and when they are discovered.
- E.g. if python extension discovers Conda Env A, then we add
Conda Env A
into the list of kernels, and then addConda Env B
after a few ms or seconds and so on . - Basically we'd like to add the envionments as and when they are added
- We'd like information such as
- Type of environment
- Name of environment
- Python Version of Environment
- Whether its 32 bit or 64 bit
- Please advice if this is possible today with the API or not, and what API methods/events we should be using this
- E.g. if python extension discovers Conda Env A, then we add
- Blocking/non-blocking
- In a separate thread you have indicated that calling some of the API could block the extension code, like language servers etc.
- My suggest would be to block the API to not return anything until all of the necessary components have loaded in the Python extension.
- This way the jupyter extension or others can use the API anytime they need.
- E.g. assume we create a jupyter extension thats very very fast tomorrow, and the code that runs to load interpreters executes pretty much as soon as extension loads, by now Python extension might still be loading. Hence if you would like the Python extension to NOT be blocked by an interpreter API call, then I'd suggest you block the API until the necessary components have loaded in python extension. This allows consumers of the extension API to use the API freely, else its difficult to predict and control how the API is called.
- Also please note, the Jupyter extension needs the list of kernels quickly and we've found that things are slow in the past, hence its likely we'll still continue fetching the interpreters as soon as the jupyter extension loads, so as to improve the over all UX in jupyter extension.
- But this remains to be seen, based on how fast the new API is
- From what I can tell the new API has an event
onDidEnvironmentsChanged
,- Lets assume a user has 10 enviroments and the event is triggered for one of the envirionments.
- Next we call
getEnvironmentDetails
with the above information & we do this for all 10 events triggered. - From what I can tell this is the way the API is meant to be used, I'd like to understand if there are any problems here with this approach.
if python extension discovers Conda Env A, then we add Conda Env A into the list of kernels, and then add Conda Env B after a few ms or seconds and so on .
Regarding first point, yes it's possible. But events are fired if any information is found even though it's not complete yet:
- Event1: EnvA was found (not sure if it's conda yet)
- Event2: Partial version was filled
- Event3: Type detected as conda
- Event4: Complete info is found
Are you interested in information in all these stages, or information only when it's complete? I think for the purposes of populating the list, it's fine to have partial information so users could at least begin selecting it. Both of these are possible to fetch, but you have to call the API differently.
opulating the list, it's fine to have partial information so users could at least begin selecting it
Not sure i agree, displaying the wrong information to the user just because we have something would not be the best. I say wrong, because we could end up displaying an environment as a virtual environment, when in fact its a conda environment or the like, Hence in such cases, its possible the user is looking for a virtual env and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)
I think for Python extension it doesn't really matter much what the user chooses, in Jupyter it makes a very big difference, basically its a make or break selection. If you get it wrong, things will not work at all, you'll get error in every step (missing dependencies, failure to start, etc), where as in python ext thats not the case, most of the things will work.
Are you interested in information in all these stages, or information only when it's complete?
Basically we need the information before we add it to the list. Lets take the following scenario into account
Event1: EnvA was found (not sure if it's conda yet)
Event2: Partial version was filled
Event3: Type detected as conda
Event4: Complete info is found
Lets assume we called the api method getEnvironmentDetails
after Event2
. Lets assume the envionment is detected as a virtual environment, but in fact its a Conda
enviroment. Now after we get the resutls of the getEnvironmentDetails
, would that point to the stable values? or is it possible that after we get the result for getEnvironmentDetails
the events Event3
, and Event4
can get fired again for the same environment.
Meaning the only time we can get the accurate information is to wait for everything to compete?
Or can we get the accurate information by calling getEnvironmentDetails
and if this is possible would this be faster than having to wait for everything to complete
Two points:
-
Hence in such cases, its possible the user is looking for a virtual env and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)
Populating the list is different than selecting an interpreter. When user attempts to select an interpreter, that is when
getEnvironmentDetails
can be called with full details. So before running it we know it's conda, and it won't fall over. My point is for purposes of displaying in the list, partial information can be helpful and full env details can only fetched for the selected kernel. -
I say wrong, because we could end up displaying an environment as a virtual environment, when in fact its a conda environment or the like,
Secondly, the partial information displayed won't be wrong, the information only gets more detailed with time. For eg. a conda environment is not a virtual environment, so it'll never be classified as that. Pipenv environments are virtual environments under the hood, so it can be displayed as such until we figure out the better type for the env is pipenv.
Let me know your thoughts, before we move on to how to call the API.
Populating the list is different than selecting an interpreter. When user attempts to select an interpreter, that is when getEnvironmentDetails can be called with full details. So before running it we know it's conda, and it won't fall over. My point is for purposes of displaying in the list, partial information can be helpful and full env details can only fetched for the selected kernel.
I think we're talking about two different things. Lets leave Python Interpreter selection out of this discussion. I'm referring to the kernel picker.
partial information can be helpful and full env details can only fetched for the selected kernel.
I'm referring to the Jupyter extension UI, not Python, I don't think you know how that works as this is not how the Jupyter UI works.
displayed won't be wrong, the information Pipenv environments are virtual environments under the hood, so it can be displayed as such until we figure out the better type for the env is pipenv.so it can be displayed as such until we
In a users views, this information is incorrect hence my point. Yes we can argue about technical details, but as an end user, I'm looking for a Pipenv, not a Virtual Env, hence (Jupyter Extension) cannot necessarily classify these as virtual envs. I.e. we cannot argue such technical details to an end user via a simple UI such as a quick pick.
I think have enough information at this stage, will check the API and get back to you.
I think we're talking about two different things. Lets leave Python Interpreter selection out of this discussion. I'm referring to the kernel picker.
The following running scenario you mentioned comes after selection (picking the kernel):
"and we display a conda environment and when they attempt to run it, then things fall over completely (because running conda environments require activation, and now execution fails and they get a lot of errors)"
I was referring to that saying that won't be a problem.
I think have enough information at this stage, will check the API and get back to you.
Sure, check https://github.com/microsoft/vscode/issues/158246 for latest planned updates.
I was referring to that saying that won't be a problem.
Assume user have a pip env and virtual env, now they select the virtual env assuming its the virtual env they are expecting.
However the virtual env is infact a pipenv (as you mentioend pipenvs can be virtual envs).
Now user selects this and runs the cell.
The packages they installed like ipykernel, numpy, sklearn do not exist in there, and as a result things fall over
and they get a lot of errors.
Based on what you said earlier this is possible.
Any packages we install into it are considering it's Pipenv, and any packages users installed into it outside of VSCode are considering it's Pipenv, so I don't see why packages won't be found. But okay, maybe I still misunderstand so please go ahead.
so I don't see why packages won't be found. B
Because as i mentioned user has two environmetns, pipenv and virtual env. hence packages in virtualenv will not be in pipenv.
@karrtikr Scenario
- User has two python environments A and B
Questions
- How many times can the onDidEnvironmentsChanged get triggered?
- Is it possible for this to get triggered twice, once for A and once for B and then again for A and B with
type = "update"
. - It is my understanding that this is possible, if yes that's fine, as the even arg
type
clearly indicates the type of change.
- Is it possible for this to get triggered twice, once for A and once for B and then again for A and B with
- Assume the discovery runs for 5 minutes (hypothetical, so we can expand the timeline and discuss what happens in the interim)
- Assume we get the first event
type="add"
for envirnmentA
- Now, the Jupyter extension calls the API method
-
getEnvironmentDetails("A", useCache: false
- Would we get the information after the full discovery?
- I.e. is it possibel for us to get the details saying env type =
FOO
, and after the full discovery runs, then env type for A changes fromFOO
toBAR
- In other words after this method returns, is it possible for the event
onDidEnvironmentsChanged
to get fired once again for the same environmentA
(because the env discovery is still in progress) - The same question applies to the API methods
-
- Assume we get the first event
It is my understanding that this is possible, if yes that's fine, as the even arg type clearly indicates the type of change.
Yes that's possible. The add
event should only be triggered once, update
type can be triggered multiple times.
Would we get the information after the full discovery?
Are you asking if the information which is returned will be the same as the stable information after full discovery? Yes. However it won't wait for discovery to finish, so env discovery can be still in progress after this method returns.
In other words after this method returns, is it possible for the event onDidEnvironmentsChanged to get fired once again for the same environment A (because the env discovery is still in progress)
Yes, update
type events can still get fired while discovery is going on. Consider resolving as separate from discovery.
thanks, curious then what is wrong with the jupyter extension calling the getEnvironmentDetails
method for every add event triggered.
If kernel picker cannot work with partial details, there's nothing wrong it. getEnvironmentDetails
returns the full details which is exactly what is needed. However it blocks running the interpreter to get details, so it's slower.
If it was possible for picker to display partial version information etc. and later update it, API offers the flexibility to display the info earlier so it can be selected earlier. We can potentially make it so that kind never changes.
I'm not sure on what is the case so I cannot advise.
We can potentially make it so that kind never changes.
not sure what your mean by this
are you saying that if i call getEnvironmentDetails
once then env type could be Foo
and then call it again then it could be Bar
.
from what i can tell them only way to get the env type is via the above api method.
are you saying that if i call getEnvironmentDetails once then env type could be Foo and then call it again then it could be Bar.
If called with useCache: false
, it returns the final env type and the type never changes.
If called with useCache: true
, it returns partial info, it may return a env type that can change later. The Venv to Pipenv case we talked about earlier. If Jupyter is interested in this option, I can potentially look into improving this scenario, so that env type never changes (we always return the final type). Let me know.
Summary of discussions
Discussion:
- At a minimum Trigger event with stable info for
- name
- type
- id
- Id will be added and that is unique
Proposals:
- onDidChangeEnvironment.xxxx: Event
; type: add, remove, update - No more
clear-all
event - Two versions of
getEnvironmentDetails
, sync and async- getEnvironmentDetails(id:string): Promise<....>
- getEnvironmentDetailsSync(id:string): {id:string; name:string; type: EnvType[];} & Record<string,string>
- Might use progress indicator
- Change ProgressNotificationEvent
- Remove
allPathsDiscovered
- Remove
- Change
getRefreshPromise
togetRefreshPromise(): Promise<void> | undefined;
Remove args - Change
refreshEnvironments
to remove the argument. refreshEnvironments() - Take fix from Jupyter to monitor environments.txt to detect new conda environments
- Detect new conda environments when vs code loads
- Detect PyEnv as well the same way
- Python extension will detect new environments without Jupyter having to call full refresh
- New faster API
getActivatedEnvironmentVariablesxxx(resource: Resource, id)
Closing this as completed, the new API has been completely adopted.
Can we consider https://github.com/microsoft/vscode-python/issues/19989 as unblocked from Jupyter's end, if there's no feedback/concerns?