edgedb-cli icon indicating copy to clipboard operation
edgedb-cli copied to clipboard

In repl implement variables of non-string types

Open tailhook opened this issue 4 years ago • 25 comments

Currenly, only strings are accepted

(hint: you can cast strings to other types in the edgeql itself)

tailhook avatar Mar 24 '20 09:03 tailhook

(hint: you can cast strings to other types in the edgeql itself)

Maybe this should be the preferred way of how we implement the feature -- add a compiler flag to accept all arguments as text, basically implicitly compiling SELECT <int32>$0 as SELECT <int32><str>$0

1st1 avatar May 20 '20 05:05 1st1

We already have json_parameters, wouldn't that work?

elprans avatar May 20 '20 06:05 elprans

Another question: how do we input strings? Currently there's no requirement for quotes, e.g.

Parameter <str>$foo> blah blah

translates into "blah blah" string.

This is problematic as it's unclear how to enter an array of strings.

IMO, while it's slightly inconvenient, we should require the user to enter EdgeQL literals, so string/json/datetime quotes should be required.

1st1 avatar May 21 '20 04:05 1st1

IMO, they should be proper literals.

elprans avatar May 21 '20 04:05 elprans

IMO, they should be proper literals.

Yep. And then we can syntax highlight the input.

1st1 avatar May 21 '20 04:05 1st1

That sounds like over-engineering: the primary use case for the feature is to quickly past query from the code an fill in the gaps. It's unlikely that there would be any complex texts that can't be typed directly. If they are in 1% of cases, you can just edit the original query and put complex constant there.

That said, we can do something like this, if you think this is a crucial feature:

Parameter <str>$var (use Ctrl+P for quoted string):

tailhook avatar May 21 '20 08:05 tailhook

Maybe this should be the preferred way of how we implement the feature -- add a compiler flag to accept all arguments as text

As I've said in the zoom, we can have much shorter feedback loop if we parse/validate things directly in repl: we can highlight invalid value as you type, and show error and repeat input right after return key. In case you have a query with more than one parameter, it could be a huge time saver.

tailhook avatar May 21 '20 08:05 tailhook

That sounds like over-engineering: the primary use case for the feature is to quickly past query from the code an fill in the gaps.

Yes, but if the query has an <array<str>>$param it becomes unclear how to enter it. The uncertainty goes away if we stick with EdgeQL literals syntax. It's not over-engineering, it's consistent design.

Parameter $var (use Ctrl+P for quoted string):

Not sure I understand the idea. Do you want to add quotes/escapes once Ctrl+P is pressed? I'm not sure we need this unless a lot of users request that.

As I've said in the zoom, we can have much shorter feedback loop if we parse/validate things directly in repl: we can highlight invalid value as you type, and show error and repeat input right after return key. In case you have a query with more than one parameter, it could be a huge time saver.

Yeah, I hear you loud and clear and I agree. There are upsides and some downsides, mainly:

  1. The implementation is way more complex than simply letting EdgeDB server / Postgres to do the encoding. While this leads to potentially better UX it adds extra code for what is a relatively minor feature.

  2. (and this it the main one) When we have extensions API we will allow extensions to implement their own scalars. This is only a question of "when" because we need to add support for GIS. Extensions will be required to provide JSON/str encoders and decoders for their added scalar types. When this happens, it would be painful if at all possible to implement support for those custom scalars in our REPL.

IMO https://github.com/edgedb/edgedb-cli/issues/15#issuecomment-631242953 is the way to go.

1st1 avatar May 21 '20 18:05 1st1

Yes, but if the query has an <array>$param it becomes unclear how to enter it. The uncertainty goes away if we stick with EdgeQL literals syntax. It's not over-engineering, it's consistent design.

And also convenience goes away with it.

I would go with line per item. With Ctrl+D or alike to end the list.

Parameter $var (use Ctrl+P for quoted string):

Not sure I understand the idea. Do you want to add quotes/escapes once Ctrl+P is pressed? I'm not sure we need this unless a lot of users request that.

Yes, I mean "adding quotes and switch to quoted string version", so you can use escape codes if you need to. And yes, I think this features is not needed for 99% use cases, and is rare enough that you can edit you query if you need escape codes.

The implementation is way more complex than simply letting EdgeDB server / Postgres to do the encoding. While this leads to potentially better UX it adds extra code for what is a relatively minor feature.

The whole point of having REPL is UX. We could go with python -c 'import edgedb; edgedb.connect() otherwise.

So making tool harder to use in 99% use cases is counter-intuitive. You can insert values into placeholders yourself if you have enough time. Editing is even better because you don't have to go up in the history for each variable after you made a typo. No quotes, and immediate feedback means less typos, and means you can test queries from the code right away.

When we have extensions API we will allow extensions to implement their own scalars. This is only a question of "when" because we need to add support for GIS. Extensions will be required to provide JSON/str encoders and decoders for their added scalar types. When this happens, it would be painful if at all possible to implement support for those custom scalars in our REPL.

How do we format those types? How do we support those types in language bindings?

tailhook avatar May 22 '20 14:05 tailhook

And also convenience goes away with it.

Not really, I'm fine with typing proper string literals in REPLs. In EdgeDB repl, in Python repl, etc. I really don't understand why entering specifically string arguments should be any different.

I would go with line per item. With Ctrl+D or alike to end the list.

What if we add two dimensional arrays in EdgeDB 2.0?

The whole point of having REPL is UX. We could go with python -c 'import edgedb; edgedb.connect() otherwise.

UX also benefits from consistency. If I use proper string literals in my queries why would I type them differently when they are arguments? I bet some people would input them as literals and then would be confused why their input is double quoted. Do I need to escape some symbols in this special mode? What if I want to type a regexp? Is my input going to be classified as a raw string or not?

So IMO what you're suggesting is actually bad UX.

No quotes, and immediate feedback means less typos, and means you can test queries from the code right away.

You can provide immediate feedback and highlight syntax with quotes too.

How do we format those types? How do we support those types in language bindings?

We'll likely need to add another option to the protocol specifically for our REPL to render non-native scalars as strings and accept them as strings. Working with them in language binding would require a native extension to that binding.


"Special cases aren't special enough to break the rules." The more we are discussing this the more I believe that having a special input mode for string literals for query arguments would be a mistake.

1st1 avatar May 22 '20 19:05 1st1

Personally, I don't find having to type (single) quotes around a literal inconvenient or even noticeable. Especially so, if the editor adds the closing quote automatically like modern IDEs do.

elprans avatar May 22 '20 20:05 elprans

In general, I agree with @1st1 and @elprans. I'm especially worried about escaping characters in strings as muscle memory will be at odds with what the eyes are seeing in the input prompt. One obvious use of REPL is to test some queries before putting them into code. Testing often implies trying at least one or two "tricky" inputs. So stuff like 'I\'m "happy"' and the like. Copy-pasting these values from code will also be annoyingly messy if the syntax is not consistent with actual literals.

And regexps become a nightmare if escaping is ambiguous (but @1st1 already mentioned regexp).

vpetrovykh avatar May 23 '20 02:05 vpetrovykh

No quotes, and immediate feedback means less typos, and means you can test queries from the code right away.

You can provide immediate feedback and highlight syntax with quotes too.

No, if the only thing that edgedb tells me is "it's a json value" (I can't say its int or a date or a string).

What if we add two dimensional arrays in EdgeDB 2.0?

There are certainly things that are specific for a type. For date/time types we should have something like Ctrl+N for now. For some types, e.g. for bytes, we might always require escaped input.

So we can either make the edgeql-style default. Or JSON-style input, or whatever we decide for that specific type.

Testing often implies trying at least one or two "tricky" inputs. So stuff like 'I'm "happy"' and the like.

I don't understand a use case. 90% times there is an ID and all I do is putting different ids, and other 9% is probably some filter which is perhaps enum. What do you test? That variables properly work or that postgres is able to store a quote in a string?

And regexps become a nightmare if escaping is ambiguous (but @1st1 already mentioned regexp).

Well, there aren't enough use cases for regexps in DB. It's probably going to be used often in constraints. I don't think parametrizing DDL is a common use case.

Even when I'm using regexps in queries (which is very very rare, because there are no indexes for regexps), I don't parametrize them (giving the chance for DB to have compiled version of it cached, I'm not sure if that actually works, though). While I admit, there are still reasons of why it can be parameterized by chance...

Anyway, being "raw by default" is easier for regexps not vice versa. So I don't understand the argument.


Two more things:

  1. If we go with "json_parameters" we:
    1. We don't have type info about parameter (so can't provide a shortcut for datetime_current() for example)
    2. The input syntax is JSON not EdgeQL. E.g. 1_000 will not be supported when we support this in EdgeQL.
    3. We can't switch syntax by some Ctrl + something shortcut
  2. What if we have a type that has quite bad representation in JSON? E.g. if we represent cal::relativedelta as a {"days": 12, "seconds": 120} (which looks quite good for JSON use-cases for me), we can't support nicer 12 days 2 minutes syntax that is a way more human-writable.

At the end of the day:

I'm +1 okay with being able to use EdgeQL syntax for constants I'm -1 to have it default for strings and string-like types like dates (anything having to_xxx(s: str) can accept that format, perhaps everything else might default to EdgeQL syntax) I'm -1 on hiding types from repl behind json_parameters And I'm -0 on supporting JSON syntax for inputting variable at all (even if not default)

tailhook avatar May 25 '20 11:05 tailhook

If we go with "json_parameters" we: We don't have type info about parameter (so can't provide a shortcut for datetime_current() for example)

We can have type info about parameter. We first Parse the query once normally -- we now have a descriptor for query input. All types are known. We then Parse the query again with json_paramaters. Now values can be submitted as JSON but you can do some additional validation/syntax highlight.

Alternatively we can possibly even extend our typedesc (it supports type annotations) so that only one Parse is needed, but I'd keep it simple.

The input syntax is JSON not EdgeQL. E.g. 1_000 will not be supported when we support this in EdgeQL.

That's true and a bit unfortunate. But using JSON is our only safe bet to support future EdgeDB third-party extensions (including our own GIS).

We can't switch syntax by some Ctrl + something shortcut

If you follow the approach with parsing the query two times you can still implement that.

What if we have a type that has quite bad representation in JSON? E.g. if we represent cal::relativedelta as a {"days": 12, "seconds": 120} (which looks quite good for JSON use-cases for me), we can't support nicer 12 days 2 minutes syntax that is a way more human-writable.

Well, again, given that we use JSON as encoding protocol and if we parse the query 2 times we can have alternative syntax mode too, it would be parsed and encoded in JSON for some types, e.g.:

db> SELECT <cal::relativedelta>$foo;

Please provide values for query parameters in JSON. 
Press Crtl+P for alternative input methods.

Parameter <cal::relativedelta>$foo (as JSON) > {"days": 12, "seconds": 120}

or if the use presses Ctrl+P:

db> SELECT <cal::relativedelta>$foo;

Please provide values for query parameters as prompted. 
Press Crtl+P for alternative input methods.

Parameter <cal::relativedelta>$foo (as DAYS/SECONDS) > 12/120

But I'm not sure this all is needed -- this feature is very niche. We want the feature to be extendable in the future, but it's not like we have to invest significant resources to make the best possible UX for it.

1st1 avatar May 26 '20 01:05 1st1

What if we have a type that has quite bad representation in JSON?

We have a rule that every scalar data type must have a valid text representation and hence can be transmitted as a JSON string. This applies to all current and future data types, including those from third party extensions. If we choose to parse input on the client side we will have to force users to make their arguments <str>, which somewhat defeats the point of being able to quickly paste and run a query unmodified.

elprans avatar May 26 '20 02:05 elprans

Having thought about this more and Yuy's idea of doing Parse twice we can actually try and implement a hybrid solution: the CLI can handle all known types as EdgeQL constants (including fancy input like 1_000_000), require text input for unsupported types, and run the query with json_parameters if any text inputs were done. This requires the ability to render known types into text on the part of the CLI.

Ultimately, I agree with Yury. Although this functionality us a nice quality of life improvement, it's not mission-cirtical. Let's just make sure we only accept valid EdgeQL literals as input (i.e. no hacks allowing omission of quotes etc), and revisit this when we have the opportunity.

elprans avatar May 26 '20 02:05 elprans

We can have type info about parameter. We first Parse the query once normally -- we now have a descriptor for query input. All types are known. We then Parse the query again with json_paramaters. Now values can be submitted as JSON but you can do some additional validation/syntax highlight.

That sounds like more work, not less. I.e. with current approach we need text -> native type converter (we need native type -> binary for protocol impl anyway). Now with your approach we need text -> native type -> json conversion.

If we're doing more work, I'd say we may have a partial json_parameters mode that only wraps unknown types (either explicitly selectively or implicitly all unknown types). Not sure how hard is that in edgedb. But technically we can do that in client as we already have query rewrite mechanism in Rust (and we know that query is syntactically correct by first parse).

The good thing about the latter approach is that we can skip it until we implement actual extensions (yes the danger is that extension types will not be supported as the input in old version of CLI, but that's acceptable thing in my opinion).

And yes, I think we should implement packers/unpackers for all our own extensions anyway. Since this is an order of magnitude easier comparing to all other stuff in extension, like figuring out semantics, indexes, an so forth.

I.e. we actually spent more time on continuation prompt (and going to spent more, as second PR is not yet landed), which adds miserable UX difference compared to this.

tailhook avatar May 26 '20 09:05 tailhook

I'd say we may have a partial json_parameters mode that only wraps unknown types (either explicitly selectively or implicitly all unknown types). Not sure how hard is that in edgedb.

-1 on implementing "partial json mode" in the protocol. Again, I stand by my suggestion in https://github.com/edgedb/edgedb-cli/issues/15#issuecomment-633766403 to use 2 prepares.

1st1 avatar Aug 28 '20 18:08 1st1

So basically here are three options:

  1. Accept EdgeDB literals
  2. Accept JSON literals
  3. Have custom input for each type

Basically (1), (3) both require custom input for every type so are generally have the same amount of work.

With (2) There might be more work on conversions text -> native type -> json, but that only for each type we want to have nicer editing support (i.e. validation, insert "now" for date, or underscores in numbers), so I'm not sure what the overall difference is.

So what I think, sooner or later we want the following:

  1. Native literals, but not default for simple types like numbers and strings
  2. Copy-pasteable JSON into variable (so extra quoting is harder)
  3. Validation and hinting
  4. In studio we also want datepicker for dates and copy-pasteable strings, so converting for JSON will not help studio.

And yes, for unknown types two Prepare approach + rewriting query (<custom_type>$x becomes <custom_type>(<json>$x)) doesn't sound too bad for me.

Afterthought: for implementing native literals we probably can also use rewriting query approach (and rely on constant extraction for optimization), but I'm not sure this is a great idea.

tailhook avatar Sep 01 '20 09:09 tailhook

Native literals, but not default for simple types like numbers and strings

I still think that strings should require explicit quotes, e.g.

$foo > 'aaa'

Basically (1), (3) both require custom input for every type so are generally have the same amount of work.

Not for every type, but the most common ones. Although I think this is such a minor feature and it's an open question what percentage of users will ever even discover it. I wouldn't even mind if we explicitly prompted users to enter JSON literals and be done with this.

1st1 avatar Sep 01 '20 17:09 1st1

Native literals, but not default for simple types like numbers and strings

I still think that strings should require explicit quotes, e.g.

Like I've already said:

  1. This makes it harder to paste something (i.e. should I open single or double quote before pasting? Escaping?)
  2. This requires client-side parser anyway (see below)

Basically (1), (3) both require custom input for every type so are generally have the same amount of work.

Not for every type, but the most common ones. Although I think this is such a minor feature and it's an open question what percentage of users will ever even discover it. I wouldn't even mind if we explicitly prompted users to enter JSON literals and be done with this.

How do you expect this to work if only "most common ones"? E.g. while datetime literal is just '2019-07-05Т01:01:01', the actual thing we must send is milliseconds since epoch. So there should be not just parser of the string literal but the parser of the actual date, regardless of whether we ask user to include quotes or not.

tailhook avatar Nov 02 '20 13:11 tailhook

So basically here are three options: Accept EdgeDB literals

I think this is the best and most consistent approach. I don't think it would be too much trouble for the CLI (and other frontends) to render numerics and booleans into JSON and assume that everything else is a string literal representing a valid type rendering (no interpretation necessary).

elprans avatar Mar 10 '21 22:03 elprans

Oh, and since we know the type of the argument, we don't have to force users to enter quotes around the value, although it's unclear what to do if they do enter quotes, i.e do we double-quote them or assume the value as-is?

elprans avatar Mar 10 '21 23:03 elprans

unclear what to do if they do enter quotes, i.e do we double-quote them or assume the value as-is?

This is why I don't like the idea of "simplifying" the string literal input. IMO we should require quotes always, by default. We can expose a REPL config setting to disable it though.

1st1 avatar Mar 10 '21 23:03 1st1

This is why I don't like the idea of "simplifying" the string literal input.

Well, the motivation for this feature is user convenience, and not having to type quotes is a nice QOL improvement. I think we can detect if the passed value is already quoted (starts with ' or ") and not quote it. If you really need to send '"foo"' (nested quotes) you can still do it by typing all those quotes literally.

Think about it this way: you don't enter string values into forms in quotes, and, well, this prompt is sort-of a form (and will be a form in GUI).

elprans avatar Mar 11 '21 00:03 elprans