wal2json icon indicating copy to clipboard operation
wal2json copied to clipboard

JSON column containing object or array of objects gets quoted as a string - breaks parsing downstream

Open dhenson02 opened this issue 2 years ago • 7 comments

Updating a table with JSON columns will treat the row data as a standard string, adding erroneous double-quotes around the value, instead of returning as an array/object.

Example result (from .columns):

{"name":"account_no_history","value":"[{"acc":"3597","createdby":"Viztek,Pacs","createddt":"2015-09-24T17:14:30.728Z"}]"}

This throws when attempting to parse (tried jq and Node)

dhenson02 avatar Dec 12 '21 03:12 dhenson02

I have the same issue from Rust, using the tokio-postgres crate.

For jsonb maps/objects, it's not so big a deal, because I can just reparse the string using standard JSON parsers.

However, that is not the case for arrays! Arrays are stored in some other format, where a single value (without quotes in it) looks like this:

{example}

whereas others look like this:

{"example \"text\" with quotes in it",123123}

This inconsistency is annoying, and makes custom parsing harder.

Prior to an official resolution of this issue, does anyone know the name of the format used above, so I can find a parser for it? (I'd rather not rely on my own attempt to escape quotes and such properly)

Venryx avatar Jan 30 '22 13:01 Venryx

As I don't know the name of the format, I made a quick attempt at writing a parser for it in Rust.

I am very new to Rust, so the only way I was able to get it to compile was by unrolling the end_current_entry function, and changing the current_entry_str variable from a nice Option<String> to the inferior plain String (inferior because the meaning of an empty string is less clear than the None enum).

Anyway, here it is:

use super::type_aliases::JSONValue;

#[cfg(test)]
mod tests {
    use crate::utils::postgres_parsing::parse_postgres_array_as_strings;

    #[test]
    fn simple() {
        let simple_source = "{example}";
        let simple_result = parse_postgres_array_as_strings(simple_source);
        assert_eq!(simple_result, vec!["example"]);
    }
    #[test]
    fn escaped() {
        let escaped_source = r#"{"example \"text\" with quotes in it",123123}"#;
        let escaped_result = parse_postgres_array_as_strings(escaped_source);
        assert_eq!(escaped_result, vec![r#"example "text" with quotes in it"#, "123123"]);
    }
}

/// See: https://github.com/eulerto/wal2json/issues/221#issuecomment-1025143441
/// View the tests above for examples, and intended functionality.
pub fn parse_postgres_array(array_str: &str) -> JSONValue {
    let result_as_strings: Vec<String> = parse_postgres_array_as_strings(array_str);
    let result_as_json_value_strings = result_as_strings.into_iter().map(|a| serde_json::Value::String(a)).collect();
    let result_within_json_array = serde_json::Value::Array(result_as_json_value_strings);
    result_within_json_array
}
pub fn parse_postgres_array_as_strings(array_str: &str) -> Vec<String> {
    let chars_struct = array_str.chars();
    let chars = chars_struct.collect::<Vec<char>>();
    let mut result_as_strings: Vec<String> = vec![];

    let mut in_quote = false;
    let mut in_entry = false;
    let mut last_char_was_escape_backslash = false;
    //let mut current_entry_str: Option<String> = None;
    let mut current_entry_str: String = String::new(); // empty means none

    /*let mut end_current_entry = || {
        result_as_strings.push(current_entry_str.unwrap());
        current_entry_str = None;
        in_quote = false;
        in_entry = false;
    };*/

    //for (i, ch) in chars.enumerate() {
    //let chars_length = chars.into_iter().count();
    let chars_length = chars.len();
    let mut i = 0;
    for ch in chars {
        match last_char_was_escape_backslash {
            true => {
                last_char_was_escape_backslash = false;
                //current_entry_str.unwrap().push(ch);
                current_entry_str.push(ch);
            }
            false => {
                match ch {
                    '{' if i == 0 => {},
                    '}' if i == chars_length - 1 => {
                        //if current_entry_str.is_some() {
                        if current_entry_str.len() > 0 {
                            //end_current_entry();
                            {
                                /*result_as_strings.push(current_entry_str.unwrap());
                                current_entry_str = None;*/
                                result_as_strings.push(current_entry_str);
                                current_entry_str = String::new();
                                in_quote = false;
                                in_entry = false;
                            }
                        }
                    },
                    '\\' => {
                        last_char_was_escape_backslash = true;
                    },
                    '"' => {
                        in_quote = !in_quote;
                        // if just left a quote
                        if !in_quote {
                            //end_current_entry();
                            {
                                result_as_strings.push(current_entry_str);
                                current_entry_str = String::new();
                                in_quote = false;
                                in_entry = false;
                            }
                        }
                    },
                    // ie. if just left a quote
                    ',' if !in_entry => {},
                    // if hit a separator after a non-quoted entry
                    ',' if in_entry && !in_quote => {
                        //end_current_entry();
                        {
                            result_as_strings.push(current_entry_str);
                            current_entry_str = String::new();
                            in_quote = false;
                            in_entry = false;
                        }
                    },
                    _ => {
                        // if hit start of entry
                        //if current_entry_str.is_none() {
                        if current_entry_str.len() == 0 {
                            //current_entry_str = Some(String::new());
                            current_entry_str = String::new();
                            in_entry = true;
                        }
                        current_entry_str.push(ch);
                    }
                };
            },
        };
        i += 1;
    }
    result_as_strings
}

It seems to work for the basic cases I've tried, but I think it's likely there are cases it fails on.

That's part of why it'd be nice to have wal2json emit valid objects/arrays for jsonb fields -- or at least a valid json string, so callers can parse the data without writing brittle parse functions like the above.

Venryx avatar Jan 30 '22 14:01 Venryx

I tried the following example:

CREATE TABLE xyz (
a integer,
b hstore,
c jsonb,
primary key(a)
);

INSERT INTO xyz (a, b, c) VALUES(1, 'a=>x, b=>y'::hstore, '{"name":"account_no_history","value":"[{"acc":"3597","createdby":"Viztek,Pacs","createddt":"2015-09-24T17:14:30.728Z"}]"}'::jsonb);

This example also covers issue #222 .

In format v1, I get:

{
        "change": [
                {
                        "kind": "insert",
                        "schema": "public",
                        "table": "xyz",
                        "columnnames": ["a", "b", "c"],
                        "columntypes": ["int4", "hstore", "jsonb"],
                        "columnvalues": [1, "\"a\"=>\"x\", \"b\"=>\"y\"", "{\"name\": \"account_no_history\", \"value\": [{\"acc\": \"3597\", \"createdby\": \"Viztek,Pacs\", \"createddt\": \"2015-09-24T17:14:30.728Z\"}]}"]
                }
        ]
}

In format v2, I get:

{"action":"B"}
{"action":"I","schema":"public","table":"xyz","columns":[{"name":"a","type":"integer","value":1},{"name":"b","type":"hstore","value":"\"a\"=>\"x\", \"b\"=>\"y\""},{"name":"c","type":"jsonb","value":"{\"name\": \"account_no_history\", \"value\": [{\"acc\": \"3597\", \"createdby\": \"Viztek,Pacs\", \"createddt\": \"2015-09-24T17:14:30.728Z\"}]}"}]}
{"action":"C"}

What is your version?

eulerto avatar Mar 13 '22 15:03 eulerto

We have the same issue with parsing stringified JSON values which makes is a little bit more complex to handle our format before the export.

Is there maybe also a plan to support native native JSON objects for JSON column types?

mertezz avatar Aug 29 '23 20:08 mertezz

Using the v1 format on postgres 15, the output from a jsonb column is correct (though stringified) for me:

create table test(word text, doc jsonb);
commit;
insert into test values ('abc', '[1, 2, null, "abc"]');
commit;

Results in this output

{"change":[{"kind":"insert","schema":"public","table":"test","columnnames":["word","doc"],"columntypes":["text","jsonb"],"columnvalues":["abc","[1, 2, null, \"abc\"]"]}]}

I can run it through jq just fine (first call gets the new jsonb value as string, 2nd call accesses the jsonb value):

$ cat /tmp/wal2json.sample | jq -r '.change[0].columnvalues[1]' | jq
[
  1,
  2,
  null,
  "abc"
]

I used this dockerfile to test:

FROM postgres:15.5-bullseye
RUN apt-get update && apt-get install -y postgresql-15-wal2json
CMD ["postgres", "-c", "wal_level=logical"]

It'd definitely be nice to get jsonb fields "natively" parsed, but I wonder if this ticket should be a feature request instead of a bug?

Edit: Also tested with insert into test values ('abc', '[1, 2, null, "abc", {"x": ["a", "b", 3.1]}]'); to get an object and strings in the json output and it worked correctly again

fovc9 avatar Jan 15 '24 22:01 fovc9

This issue is talking about JSON columns, not JSONB

dhenson02 avatar Jan 17 '24 17:01 dhenson02

Ah sorry about that. Misread the issue then

fovc9 avatar Jan 17 '24 17:01 fovc9