jupyter_client complete_reply message spec changes: types, values, and signatures

complete_reply message spec changes: types, values, and signatures

Open willwhitney opened this issue 8 years ago • 66 comments

Motivation

With introspection in the kernels, we can create completions that are even more informative than the best static analysis tools in compiled languages. autocomplete-plus in Atom with Facebook's ObjectiveC tools provides a good bar:

autocomplete-plus ObjC completion

These tools are important because they allow our users to see and understand exactly what is going on without guesswork. Instead of requiring the user write some text and ask for an action to be performed on it, the interface should surface information when it is relevant. The user should never have to take an action blind or search for the information they need.

To aid in thinking about this, I knocked together a (highly experimental) implementation with Hydrogen and IJulia, as those are the codebases I know best. This is what the result looks like:

Note the live values in the right column of the variable completions.

Spec changes

To create the user experience we want, we need to change the spec for the complete_reply message to include information about variable types, function signatures, and even the live values of variables.

Presently the complete_reply message takes this form:

content = {
    # The list of all matches to the completion request, such as
    # ['a.isalnum', 'a.isalpha'] for the above example.
    'matches' : list,

    # The range of text that should be replaced by the above matches when a completion is accepted.
    # typically cursor_end is the same as cursor_pos in the request.
    'cursor_start' : int,
    'cursor_end' : int,

    # Information that frontend plugins might use for extra display information about completions.
    'metadata' : dict,

    # status should be 'ok' unless an exception was raised during the request,
    # in which case it should be 'error', along with the usual error message content
    # in other messages.
    'status' : 'ok'
}

Here is my proposal for what the object containing a suggestion should look like:

# Each suggestion is an object of this form
{
    # This field is an enum which describes "what kind of thing" the completed
    # object is. It is useful to have an enum for this to display icons (as 
    # Atom does) or have different behavior for types. Atom uses the 
    # following types:
    # variable, constant, property, value, method, function, class, type,
    # keyword, tag, snippet, import, require
    # I do not feel strongly about the inclusion of this field.
    "suggestion_type": enum,

    # The raw text to insert. Same as the elements of 'matches' now.
    "text": str,

    # The (language-specific) type of the object, as you would get from 
    # 'typeof()' or a comparable function.
    "type": type,

    # If the object being completed has a useful or meaningful value,
    # include it here as it would be given by 'string(x)' or comparable.
    # Things like functions or modules, whose 'string()' looks like 
    # 'function@0xab328', probably shouldn't include this.
    "value": str,

    # If this is a function-like object which may take arguments, this field 
    # should be populated.
    # This is a list to support languages (like Julia) that have multiple 
    # dispatch or via some implementation several signatures for a single 
    # function.
    "signatures": [
        {
            # If this language has known return types, this field should be 
            # populated with the (language-specific) return type of the 
            # function.
            "return_type": type,

            # This should be populated with an ordered list of the function's
            # argument's names and, if available, their (language-specific)
            # types.
            "arguments": [
                {
                    "name": str,
                    "type": type
                },
                {
                    "name": str,
                    "type": type
                }
            ]
        }
    ]
}

In action (Julia) this implementation looks like this for a function:

{
    "suggestion_type": "function",
    "text": "typed_completions",
    "type": "Function",
    "signatures": [
        {
            # a signature with two args
            "arguments": [
                {
                    "name": "text",
                    "type": "String"
                },
                {
                    "name": "pos",
                    "type": "Integer"
                }
            ]
        },
        {
            # and a signature with none
            "arguments": []
        }
    ]
}

and this for an object:

{
    "suggestion_type": "variable",
    "value": "70",
    "text": "seventy_var",
    "type": "Int64"
}

Placement

In my opinion the most "correct" version of the spec including this rich suggestion object would replace the existing matches list of strings with a list of suggestion objects. This would be a backwards-incompatible change, and as such seems more difficult.

We could put these objects in the metadata field (described as "Information that frontend plugins might use for extra display information about completions") and key them on the strings from the matches field. e.g.

content = {
    # these strings are the keys in the metadata dict
    'matches' : ["sweet_func", "seventy_var"],

    'cursor_start' : 2,
    'cursor_end' : 2,

    'metadata' : {
        # these are the strings from 'matches'
        "sweet_func": #<suggestion object of scheme above>,
        "seventy_var": #<suggestion object of scheme above>
    },

    'status' : 'ok'
}

This would be a backwards-compatible change, and as such seems like probably the more reasonable option.

Feedback

Please comment with thoughts, corrections, improvements, etc.

Jul 14 '15 17:07 willwhitney

Defining new top-level keys is not necessarily considered a backward-incompatible change, only changing or removing existing ones. So if you think it's cleaner to add a suggestions list next to the existing matches, that would still be okay for a 5.1 spec.

The gist:

adding metadata doesn't require revving the message spec at all (stays at 5.0, requirement is that metadata is safely ignored and always optional)
adding new keys requires minor rev (5.1, extra stuff, still backward-compatible)
removing, changing keys requires major revision, and should be very infrequent.

Jul 14 '15 21:07 minrk

@minrk Makes sense. Do you have an idea of what the metadata key was meant for, and if this would be a reasonable use of it?

Jul 14 '15 22:07 willwhitney

I think we added the metadata key for exactly this kind of additional info, because some kernels (Haskell, IIRC) had custom frontend stuff to display extra info about completions, and we wanted that to be possible with the message spec.

Jul 14 '15 22:07 takluyver

@gibiansky: Fernando mentioned you would be a good person to loop in on this, since you've done some interesting things with completion in IHaskell.

Likewise @Rickasaurus @prosconi — if you or anyone else relevant from the work on the IfSharp kernel's Intellisense support has any thoughts on what this spec should look like, I'd love to hear them.

Jul 14 '15 22:07 willwhitney

Having the super auto-complete built-in would be awesome. Another field to possibly add is documentation -- a piece of documentation attached to the function, property, field, etc.

Jul 14 '15 22:07 prosconi

@prosconi @willwhitney Yes, this is highly relevant to IHaskell, as the static type system allows you to provide a lot of hints – not only types, but in fact, potential completions for the expressions you are likely to type given the context (that is, multi-token expressions, not just single names).

However I think the best way to allow this is to avoid generalizing. Instead of defining new keys for things like type, argument, signature, etc, just allow the completion mechanism to provide a list of display_data bundles.

Hard-coding things like type, etc, works for Haskell and Julia and probably many other things, but ultimately if we design something like this it will be designed for the current languages. It won't be flexible enough to accommodate everything. For example, ICryptol is a interface for Cryptol, a DSL for verified cryptographic algorithms; Calico is a kernel that allows multiple languages to be active at once; there is a Redis client through the notebook interface, I think. My point is – it will be very hard to design a completion protocol that works everywhere.

Instead, I argue we should reuse something that's tried and true: display_data. For example, I would argue for something like this:

content = {
    # status should be 'ok' unless an exception was raised during the request,
    # in which case it should be 'error', along with the usual error message content
    # in other messages.
    'status' : 'ok'

    # A list of all results
    'matches' : list,
}

each element of matches would be

match = {
    # Multiple ways to display something depending on frontend
    # The key name should mirror whatever exists in the display_data message
    'display_data': ...

    # The range of text that should be replaced by the above matches when a completion is accepted.
    # typically cursor_end is the same as cursor_pos in the request.
    'cursor_start' : int,
    'cursor_end' : int,

    # Replacement text
    'completion': string,

    # Information that frontend plugins might use for extra display information about completions.
    # Do we even need this anymore? I think not.
    'metadata' : dict,
}

Then each language kernel implementor can decide for themselves what information is important to present.

Jul 14 '15 22:07 gibiansky

@prosconi Love the idea of having documentation, and possibly even a docs_link as well.

Jul 14 '15 22:07 willwhitney

@gibiansky an interesting example of something like this is the Atom autocomplete provider api. They impose no semantic meaning on completions, but instead have them specify their the UI itself with keys like displayText, leftLabel, rightLabel, iconHTML, etc.

I think this is an excellent, highly flexible strategy for supporting many languages and use cases if the UI is fixed. We could modify it to be a bit more UI-agnostic by changing those keys to be more like primaryInfo, secondaryInfo, etc.

My main concern with using something like display_data is that we want kernels to be able to provide completions which, across languages and frontends, 1. look acceptable, and 2. are presented in a consistent manner (as far as possible). Giving kernels too much control over the exact appearance of the completion will break frontends that the developer didn't explicitly build against. I'm imagining kernels providing completions as images and SVG and HTML, then not being able to consistently resize them, or change our background colors without clashes.

If we let kernels define our UI, then our UI will never be standardized, work on building that UI will be duplicated for every kernel, and we will have no route to making our UI better.

I think our primary responsibility is to the bulk of our users working in languages where things like types and function signatures are meaningful. For extreme fringe cases like Cryptol, which will never have many users, this spec still works fine for providing plaintext completions.

Organizing these completions semantically lets each frontend (whether it be the Jupyter notebook, or phosphor-notebook, or Hydrogen, or jupyter console) lay things out in the best way for their medium.

Jul 14 '15 23:07 willwhitney

@willwhitney I like the idea of providing a "structure" with a few slots that can be modified (e.g. the primaryKey, secondaryKey, etc idea).

However, I take issue with your suggestion that these "fringe cases" are unimportant. They are unimportant now. However, they may not be in three years!

For example, I have long thought that Jupyter notebook would make an amazing platform for hardware design. This would be completely different from its intended design, and probably won't happen, because Verilog/VHDL synthesizers, routers, placers, etc, are not very open-source friendly (at the moment). However, I can imagine that in two or three or five years, some FPGA company releases an open-source friendly design suite based entirely off of Jupyter. Then a huge industry could potential end up using Jupyter in something you now call a "fringe" language. My main point is: We should not let the current view of the technology landscape cloud our judgment of what could happen. We should design for a very flexible future.

So, please, please, please, don't add semantic information to the completion targets. I'm incredibly happy to have more flexibility and rich interfaces embedded into the completion – multiple columns, scrollable views, keyboard shortcuts, automatic completion instead of requesting it via Tab, etc, are all great. However, we should avoid mixing the UI-side (completion and the way you interact with it and so on) with the kernel side (what those completions mean).

Finally, I don't really support arbitrary display_data in completion; I don't want to see kernels customize everything a ton. However, I want there to be support for

HTML/CSS snippets for styling the completion (colors and fonts, for example). This can be used for many different things: coloring based on the type of completion, based on whether it's a live value or not, etc
Small images embedded in the completions. This is for the same reason – I imagine images could be used to signify meaning, e.g. different icons for different types of completion.
Multiple columns of completion (one column for the variable name, one for type, one for XYZ which we haven't thought of yet, etc)
Allowing a simpler but equally functional presentation in non-notebook frontends via just plain text

Given that set of requirements, it seems like display_data bundles are the obvious choice. And it would be up to the kernels to not screw up too badly and make unusable completions.

I do however support providing a way to let kernels re-use some implementation. Perhaps some CSS classes could be added that the frontend provides, so that display_data bundles can use them.

That said, I only chose display_data bundles because they seem like a simple solution to the requirements I listed above, and because they are incredibly flexible.

Flexibility is important. Please don't link display to semantic meaning. That really encourages making assumptions about the kernels, and kernels are almost guaranteed to exist that break any assumption you make, because the rest of Jupyter is incredibly flexible.

Jul 14 '15 23:07 gibiansky

I'll re-read this in more detail, but in particular for julia (as Will apparently is testing) , can you complere mylist to sorted(mylist) in one tab press ? any way to get sorted(mylist, reverted-|) where | is a cursor ?

I also had a proof of concept, where ,reversed= was a placeholder, in codemirror you could cycle through, and would be stripped at execution if not replaced. which was relatively convenient to insert function(my, many, named, parameter) easily.

Jul 15 '15 02:07 Carreau

cc @dsblank, you may want to weigh in on this one.

Jul 15 '15 13:07 blink1073

I would prefer the data structure to be as UI and language agnostic as possible, but provide rich enough information that the UI can provide a customized experience.

Jul 15 '15 14:07 blink1073

I took a look at Cryptol, and I think it would fit in nicely with @willwhitney's original proposed syntax. I've written VHDL and Verilog, and I see them working just fine as well. We could always expand the spec to accommodate new languages/UIs. All of these descriptors are optional.
I would add an "icon" field, which can be a URL or a MIME object.

Jul 15 '15 14:07 blink1073

+1 on trying to remain UI and language agnostic.

+1 on including a documentation or docs property.

Also, I would suggest renaming arguments to parameters: http://stackoverflow.com/questions/1788923/parameter-vs-argument

Jul 15 '15 15:07 sccolbert

I take back the icon part, that should be up to the UI.

Jul 15 '15 15:07 blink1073

While we are discussing it, is there a solid use case for cursor_end? It seems to just be a confuser: https://github.com/ipython/ipython/pull/7502#issuecomment-70555161

Jul 15 '15 15:07 blink1073

@blink1073 yes, cursor_start and cursor_end describe the selection of text that should be replaced. I know this comes up in IJulia.

Jul 15 '15 17:07 minrk

@Carreau I've definitely been thinking about things like this. Being able to tab through the parameters to enter values is a great feature in IDEs. This API proposal should support that.

Currently the Julia completion does not perform completions of the form mylist -> sorted(mylist), but that should actually be possible under the current API. @blink1073 this is actually the purpose of having both cursor_start and cursor_end — it lets completions replace parts of the text around the cursor. Perhaps they would be better named replacement_start and replacement_end.

Jul 15 '15 18:07 willwhitney

I think cursor_start and cursor_end are actual cursor position(s) when you ask to complete.

One think we might want to think about is multiple selection too, though completion in multiple selection might need to be avoided.

Jul 15 '15 18:07 Carreau

They exist only in complete_reply, not in complete_request: http://ipython.org/ipython-doc/stable/development/messaging.html#completion

The way Atom handles completion in multiple selection is that it only completes on one of them (the last one added, I think).

Jul 15 '15 18:07 willwhitney

The way Atom handles completion in multiple selection is that it only completes on one of them

Interesting. When you accept a completion, does it only insert it in that location, or in all of the cursor locations?

Jul 15 '15 18:07 takluyver

Only the location that showed the completion, unless they're all the same.

Jul 15 '15 19:07 willwhitney

@willwhitney thanks for getting this discussion started.

One of the things we have learned in the past about making changes to the message spec is that really important insight comes from implementing the spec in a second language. Because of this I think it will be important to implement this in a second language before we formally accept the changes to the message spec (Python or R are probably best as we maintain those).

Jul 17 '15 03:07 ellisonbg

A few quick thoughts:

Using the matches as keys in metadata would prevent the use of metadata for any other purpose than providing suggestion.
The proposal for the field "signatures" would need changing to account for languages where functions may return multiple values (e.g. Go).
I think this proposal may step on the functionality provided by inspect_request messages. Wouldn't it possible to achieve the same functionality with a complete_request followed by a series of inspect_request messages?

Jul 17 '15 17:07 n-riesco

@ellisonbg I'll do a Python implementation as well.

Jul 17 '15 21:07 willwhitney

What a great discussion, thanks for getting this going.

Jul 17 '15 21:07 rgbkrk

@n-riesco

Using the matches as keys in metadata would prevent the use of metadata for any other purpose than providing suggestion.

Good point; maybe they should go in a new key instead of in metadata. I have no strong feelings about this, but that makes sense to me.

The proposal for the field "signatures" would need changing to account for languages where functions may return multiple values (e.g. Go).

Right on. This should instead be return_types: [<ordered list of types>]

I think this proposal may step on the functionality provided by inspect_request messages. Wouldn't it possible to achieve the same functionality with a complete_request followed by a series of inspect_request messages?

It could, almost. Unfortunately inspect_reply returns an unstructured display_data mime-bundle. This makes having a uniform UI impossible. Also it's just a drag to fire off half a dozen additional requests, with their round-trip overhead (especially with remote kernels, like with jupyter-hub or tmpnb) when you can know ahead of time exactly what info you're going to want.

Jul 17 '15 21:07 willwhitney

On 17/07/15 22:13, Will Whitney wrote:

I think this proposal may step on the functionality provided by inspect_request messages. Wouldn't it possible to achieve the same functionality with a complete_request followed by a series of inspect_request messages?
It could, almost. Unfortunately |inspect_reply| returns an unstructured |display_data| mime-bundle. This makes having a uniform UI impossible. Also it's just a drag to fire off half a dozen additional requests, with their round-trip overhead (especially with remote kernels, like with |jupyter-hub| or |tmpnb|) when you can know ahead of time exactly what info you're going to want.

Further thoughts on this point:

I agree: firing additional requests complicates the logic in the front-end unnecessarily.
I still see a case for extending inspect_reply to provide the same functionality.

Here's way to achieve this: complete_reply would be extended with a list of inspect_reply messages:

complete_reply:

content = {
    'matches' : list,
    'inspect_replies' : list,

    'cursor_start' : int,
    'cursor_end' : int,

    'metadata' : dict,

    'status' : 'ok'
}

And the definition of inspect_reply would be extended to provide information about the documentation and type of the inspected expression:

inspect_reply:

content = {
# Inspected value formatted as mime-bundle
    'data' : dict,

    # This object would contain most of the information Will suggested in his initial proposal
    'type' : dict,

    # Documentation formatted as a mime-bundle
    'doc' : dict,

    'metadata' : dict,

    'status' : 'ok'
}

Jul 18 '15 07:07 n-riesco

It's also important to keep the computation of each complete_reply responsive, as some frontends will want to do complete-as-you-type, requesting completions quite frequently. This should be taken into account when considering additional information to be included for each completion (docs, current values, etc.). For instance, I imagine a doc link is likely feasible, but actual docs, or the contents of an inspect_reply are likely not.

Jul 19 '15 02:07 minrk

Would it make sense to add 'detail_level' to complete_request?

Or alternatively, limit complete_reply to provide a list of inspect_reply' that one would get using ainspect_request` with 'detail_level' set to 0?

Jul 19 '15 05:07 n-riesco

jupyter_client jupyter_client copied to clipboard

complete_reply message spec changes: types, values, and signatures

Motivation

Spec changes

Placement

Feedback

jupyter_client
jupyter_client copied to clipboard