How to differentiate empty or whitespace-only input from erroneous data?

Open milasudril opened this issue 7 years ago • 5 comments

json_error_t status;
m_handle=json_load_callback(loadCallback,&readhandler,0,&status);
if(m_handle==nullptr)
	{
	if(status.position!=0)
		{
		exceptionRaise(ErrorMessage("#0;:#1;: error: #2;."
			,{readhandler.nameGet(),status.line,status.text}));
		}
	m_handle=json_object();
	}

This works as long as the input is truly empty, but it fails for whitespace-only data. Currently, data is passed to libjansson as if it had been filtered by

grep '^[[:space:]]*//@' "$READHANDLER_NAME" | sed 's/\/\/@//'

This works perfectly as long as the data is at the beginning of the file. Otherwise, parse errors are reported at the wrong line. Within the current framework, the cleanest solution is to pass empty lines to libjansson so that its internal line counter keeps incrementing. But with the current logic, an exception will then be thrown ("Expected '[', '{' at EOF"). What is really needed here is an error code system, or can I rely on

if(strcmp("Expected '[', '{' at EOF",status.text))
       {exceptionRaise(...);}
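
For illustration, the idea of passing empty lines so that the parser's line numbers stay aligned with the source file could be sketched as follows. This is only a sketch under assumptions: `extractMarked` is a hypothetical helper, and like the grep pattern above it only recognizes a `//@` marker preceded by nothing but spaces or tabs.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of jansson or Maike): for every input line,
// emit the text after the marker if the line is marked, otherwise emit an
// empty line. The output therefore has exactly as many lines as the input.
inline std::string extractMarked(std::istream& in,
                                 const std::string& marker = "//@")
{
    std::string out;
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading blanks, then test whether the marker starts here.
        const auto first = line.find_first_not_of(" \t");
        if (first != std::string::npos
            && line.compare(first, marker.size(), marker) == 0) {
            out.append(line, first + marker.size(), std::string::npos);
        }
        out.push_back('\n');  // unmarked lines become empty lines
    }
    return out;
}
```

With such a filter, a file without any marked lines yields whitespace-only output, which is exactly the case that needs to be told apart from a real syntax error.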

milasudril avatar Apr 26 '17 08:04 milasudril

sed 's/\/\/@//'

What are you trying to achieve with this sed call? From Maike issue #40 it looks like you might want to extract those portions of text from a file that are on a line of their own except for a marker that consists of a line-comment marker from a usual programming language (like # or //), followed immediately by an @ sign. The portion to be extracted is the part to the right of that marker. A sed invocation for that sort of filter could be:

sed -nre 's~^\s*(//|#)@~~p' -- gm_programs.py

The -n option and p(rint) flag to the s(ubstitution) replace the grep filter. (Update: If you want to keep empty lines, your grep or an equivalent filter will break it. I'll post a better solution in a few minutes.) (Update 2: I'll post it on the maike thread because it's not really about jansson.)

If you only want lines that have a non-whitespace character to the right of the @ sign, you could pipe the above to | grep -Pe '\S' or extend the sed regexp to:

sed -nre 's~^\s*(//|#)@(.*\S)~\2~p' -- gm_programs.py

While the latter is short, it might become buggy on some version of sed if input lines are very long. Also if you add parens to the comment markup match, you'd have to adjust the reference number on the right. Since without the N command (append next line) your input buffer will never start with a newline, we can use an initial newline as a marker for interesting lines, so we can do more s~~~ubstitutions before we print:

sed -nre '
  # mark interesting lines, remove comment markup:
  s~^\s*(//|#)@~\n~
  # trim end-of-line whitespace, and our marker if
  # nothing visible remains in that line:
  s~\s+$~~
  # if the line still is marked, remove our marker and print:
  s~^\n~~p
  ' -- gm_programs.py

or condensed:

sed -nre 's~^\s*(//|#)@~\n~;s~\s+$~~;s~^\n~~p' -- gm_programs.py

I prefer selective printing over deleting of lines because d's control flow behavior changes depending on whether the -n option is present.

mk-pmb avatar Sep 02 '17 14:09 mk-pmb

That said, empty or whitespace-only input is not a valid JSON value, so for a compliant JSON parser, empty and whitespace-only inputs are a proper subset of erroneous data.

mk-pmb avatar Sep 02 '17 14:09 mk-pmb

@mk-pmb I remove the comment mark on each line. Note that the marks do not have to appear at the beginning of a line. My sed was the simplest possible, but it does not cover every case.

So JSON cannot be empty. That is interesting. I would have distinguished empty from error, because my situation is:

  • Empty => Nothing to do
  • Syntax error => Throw an error

As it is now, I can identify empty input by looking at the offset where jansson complains (it must be zero), but this does not hold for whitespace-only data, which is what the filter spits out if there are no marked lines. So this means I have to write a custom parser that separates these cases?

milasudril avatar Sep 02 '17 15:09 milasudril

It would probably be a nice feature of jansson if it could report the special error cases "empty" and "whitespace only". Until it does, you'll probably have to check that yourself.
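
One way to do that check without a custom parser might be to wrap the load callback and remember whether any non-whitespace byte was ever handed to the parser. The sketch below assumes the callback shape used by json_load_callback (buffer, buffer length, user data, with (size_t)-1 signalling a read error); `BlankTracker` and `trackingCallback` are hypothetical names, not part of jansson.

```cpp
#include <cstddef>

// Same shape as jansson's json_load_callback_t.
using LoadCallback = std::size_t (*)(void* buffer, std::size_t buflen, void* data);

// Hypothetical wrapper state: the real callback, its user data, and a flag
// that becomes true once a non-whitespace byte has been delivered.
struct BlankTracker
{
    LoadCallback inner;
    void* inner_data;
    bool saw_content;
};

// Forwards to the wrapped callback and scans each chunk for content.
inline std::size_t trackingCallback(void* buffer, std::size_t buflen, void* data)
{
    auto* self = static_cast<BlankTracker*>(data);
    const std::size_t n = self->inner(buffer, buflen, self->inner_data);
    if (n == static_cast<std::size_t>(-1)) {
        return n;  // read error: pass it through unchanged
    }
    const char* p = static_cast<const char*>(buffer);
    for (std::size_t i = 0; i < n && !self->saw_content; ++i) {
        // JSON whitespace is space, tab, line feed and carriage return.
        const char c = p[i];
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            self->saw_content = true;
        }
    }
    return n;
}
```

The caller would then pass `trackingCallback` and a `BlankTracker` to json_load_callback in place of the original callback and data. If loading fails and `saw_content` is still false, the input was empty or whitespace-only; otherwise it is a genuine syntax error.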

mk-pmb avatar Sep 02 '17 15:09 mk-pmb

It appears that adding that feature would break the ABI. Currently, it looks like

struct json_error_t{
      const char* text;
      const char* source;
      int line;
      int column;
      int position;
 };

This feature request would replace the first field of this struct with an enum containing an integer value for each possible error. And then there would be a function that converts the status code (JSON_NO_ERROR, JSON_EMPTY, ...) to a message. Thus it breaks the ABI :-( . Other solutions include a second set of json_load_* functions which take a json_error2_t.

milasudril avatar Sep 02 '17 16:09 milasudril