core_d.js icon indicating copy to clipboard operation
core_d.js copied to clipboard

TCP arguments protocol is lacking

Open resolritter opened this issue 4 years ago • 8 comments
trafficstars

The implemented TCP protocol is

$TOKEN $PWD [...args]

The most glaring problem there is that $PWD can have spaces and that is not taken into account.

Further, I saw that the protocol accepts JSON although it's not documented.

https://github.com/mantoni/core_d.js/blob/bfe1f7c5d5211f006e13217e8f436f507a5da450/lib/server.js#L25-L26

Yet another problem there: text is supposed to be included in the JSON payload, but it shouldn't; encoding thousands of lines to JSON is costly and there's no reason for it.

My suggestion is the following:

  • First line should have all the parameters
  • Second line and below is the text

i.e. parseData should be like

function parseData(data) {
  const newlineIndex = data.indexOf("\n")
  const payload = JSON.parse(data.slice(0, newlineIndex))
  const text = data.slice(newlineIndex + 1)
  return {
    payload,
    text,
  }
}

for the format

<payload>
<text>

as an example:

{ "cwd": "foo", "args": ["bar"] }
function sayHello() { console.log("hello") }
sayHello()

Summary of the proposal

  • Deprecate space-delimited protocol
  • Implement new protocol
    • All options should be as JSON object in the first line
    • Second line and below is reserved for text

Even in Bash it's pretty easy to escape the cwd for JSON, which would get rid of the spacing problem. Here's how I'm doing it in a script:

# escape a character of choice with '\'; outputs the result to $out
escape_string() {
  v="$1"
  c="$2"
  len=${#v}
  for ((i=0; i<$len; i++)); do
    if [ "${v:$i:1}" = "$c" ]; then
      v="${v:0:$i}\\${c}${v:$(( $i + 1 ))}"
      len=$(( $len + 1 ))
      i=$(( $i + 1 ))
    fi
  done
  out="$v"
}

cwd="$(dirname "$1")"
# escape all double quotes since they're used as delimiters in JSON
escape_string "$cwd" '"'
encoded_cwd="$out"

msg="{ \"cwd\": \"$encoded_cwd\", \"args\": [\"--stdin\"] }"

Upsides for JSON:

  • You can support any other options in the future without messing with the encoding
  • It's ubiquituous
  • Reasonably human-readable for a single line

resolritter avatar Apr 16 '21 19:04 resolritter

Thank you for raising this. You're right that it's unnecessary and inefficient to encode the text in JSON. Since this would be a breaking change, I'd like to take a step back and think about these alternatives:

1️⃣ We could get away with newline delimited plain text, like this:

<cwd>
<args>
<text>

The question is whether we'd encode args as JSON or leave that to clients using this library and pass the string as is.

2️⃣ We could skip parsing the payload entirely and just pass it on to service.invoke. This library doesn't need to understand the data at all.

What do you think?

mantoni avatar Apr 22 '21 13:04 mantoni

:one: Not encoding args as JSON has the same whitespace problem, unless you want to separate each in a line. Simplest thing I can think of is

<cwd>
<number of args>
<arg1>
<arg2>
...
<text>

e.g. 3 lines of arguments:

/home/user/project
3
--stdin
--lint
--configurationPath=/home/directory 1/directory 2/file
text

or for no arguments

/home/project
0
text

Looks somewhat janky, but you're right that it doesn't require encoding it as JSON, so that's a big plus.

:two: seems like a good idea, although I wonder what's the point of the TCP connection then? Simply skipping file creation in the disk? Yeah it makes sense, have core_d only act as a "forwarder" to the service.

resolritter avatar Apr 23 '21 15:04 resolritter

Option 2️⃣ is becoming my preference. I think of core_d as a convenience utility that allows me to safely start and stop a daemon process for something like eslint_d. It doesn't need to specify a wire protocol really. Transporting the cwd, "some args" and "some body" should be good enough.

Would you be interested in contributing a change for this?

mantoni avatar Apr 23 '21 16:04 mantoni

I am perhaps sensing some confusion here, so let's confirm the understanding first. My understanding is as follows:

Option 1

core_d adjusts the parsing

Protocol

<cwd>
<number of args>
<arg1>
<arg2>
...
<text>

e.g. 3 lines of arguments:

/home/user/project
3
--stdin
--lint
--configurationPath=/home/user/directory 2/file
text

or for no arguments

/home/project
0
text

Service

Stays the same since core_d is still doing the parsing.

function (cwd, args, text, mtime, cb) {
  // ...
}

Option 2

core_d gets rid of the parsing and simply sends the raw data into the service

Protocol

<raw_data>

Service

Dictates the protocol and parses it however it wants

function (raw_data, mtime, cb) {
  const { cwd, args, text, ...other } = parse(raw_data)
  // ...
}

Is that what you're thinking as well?

Also let's remember why JSON was proposed in the first place: it gets rid of whitespace ambiguity for separating the components. Separating them into newlines achieves the same.

Transporting the cwd, "some args" and "some body" should be good enough.

If you want any upfront parsing to remain before sending data to the service, then that's not the "Option 2" I am thinking of.

resolritter avatar Apr 23 '21 23:04 resolritter

I see. My proposal is a solution that is a mix of both. I'd change the protocol to be:

<cwd>
<args>
<body>

The first line is the current working directory, as you stated, we would get rid of the whitespace problem.

The second line would be arguments, can be blank, and it's up to the service to parse it. It could be JSON, but core_d would not try to parse it.

The remaining text is the body, also passed on without looking at it.

Does that make sense? Are am I missing something?

mantoni avatar Apr 24 '21 10:04 mantoni

I understand your idea now. It's similar to the one originally proposed in https://github.com/mantoni/core_d.js/issues/11?_pjax=%23js-repo-pjax-container#issue-860097788, laid out again for comparison:

Protocol

<payload>
<text>

example:

{ "cwd": "foo", "args": ["bar"] }
function sayHello() { console.log("hello") }
sayHello()

I don't get why cwd gets its own line in your idea, though. Why not encode it in the payload line too like shown above? Well, either way is fine, I suppose.


This did give me an idea for an Option 3 which is a simplification of Option 1 from https://github.com/mantoni/core_d.js/issues/11?_pjax=%23js-repo-pjax-container#issuecomment-825984746.

Option 3

First line tells to expect N next argument lines, followed by N argument lines, then the text comes next. It would get rid of the <cwd> line from Option 1 because it can also be put in the arguments.

Protocol

<number of args>
<arg1>
<arg2>
...
<text>

example (note --cwd=)

2
--cwd=/path/to/directory
--lint
console.log("foo")

or for no arguments

0
console.log("foo")

Service

function (argsLines, mtime, cb) {
  const { cwd, lint, ...other } = parse(argsLines)
  // ...
}

I like Option 3 better since not every service is obligated to use a cwd. It's not limiting because "argument lines" can have whatever the service wants - core_d would not look at it, only split the lines before sending it to the service. If the service wants, It could even be:

2
/path/to/directory
--lint --check --fast
console.log("hi")

i.e. it would allow for your idea as well.

I like this Option 3 the most, although I can also accept Option 2 from https://github.com/mantoni/core_d.js/issues/11?_pjax=%23js-repo-pjax-container#issuecomment-825984746. The other ones are not as "clean" as those IMO.

resolritter avatar Apr 25 '21 00:04 resolritter

I like where this collaborative idea development is going 👍

I'd like to keep the cwd an explicit part of the protocol because eslint_d sets it itself.

I like the idea of placing the additional arguments on separate lines. That seems cleaner to me too. However, I'm not a fan of specifying the count of argument lines upfront. Could we separate the arguments from the body with a blank line maybe?

<cwd>
<arg1>
<arg2>

<body>

mantoni avatar Apr 25 '21 06:04 mantoni

Yeah, separating them with a blank line looks better than specifying the line count. I'd agree with that format as well.

resolritter avatar Apr 25 '21 07:04 resolritter