xmljson

Data mismatch

Open jn0 opened this issue 8 years ago • 5 comments

Look at this example:

    from xml.etree.ElementTree import fromstring
    import xmljson, json
    bf=xmljson.BadgerFish(dict_type=xmljson.OrderedDict)
    q=bf.data(fromstring('<a p="1">x<b r="2">y</b>z</a>'))
    print json.dumps(q,indent=2) # note this item ^ (z)!

Output will be:

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        }
      }
    }

Where is the z value?

Tested with

  • Python 2.7.12 (default, Nov 19 2016, 06:48:10)
  • IPython 2.4.1 -- An enhanced Interactive Python
  • Ubuntu 16.04.1 LTS (4.4.0-57-generic) x86_64

xmljson was installed via pip.

I'd expect something like

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        },
        "$$": "z"
      }
    }
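A likely explanation, sketched with the standard library only: ElementTree does not keep "z" on the a element itself. Text that follows a closing tag is stored as the .tail of that child, so a converter that reads only .text would drop it. Whether this is exactly what happens inside xmljson is an assumption here, not something confirmed in this thread.

```python
from xml.etree.ElementTree import fromstring

root = fromstring('<a p="1">x<b r="2">y</b>z</a>')
b = root[0]

# Text before the first child lives on the parent.
print(repr(root.text))
# Text inside <b> lives on <b>.
print(repr(b.text))
# Text after </b> is attached to <b> as its tail, not to <a>.
print(repr(b.tail))
```

So the "z" is present in the parsed tree; it is only reachable through the child's tail attribute.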

jn0, Jan 20 '17 10:01

Plus, I'd like to preserve XML comments too, say under a "!" property name (with repeated comments keyed the same way as text: !, !!, !!!, etc.):

    {
      "!": "comment 1",
      "some": { "more": "JSON here" },
      "!!": "comment 2"
    }
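A side note on where comments go, as a sketch: it assumes Python 3.8 or later (newer than the 2.7 setup above), where the standard library's TreeBuilder accepts insert_comments=True. With the default parser, comments are discarded before any converter sees them, so preserving them would likely need a parser configured along these lines. The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with two comments around one child element.
xml_text = ('<root><!--comment 1--><some><more>JSON here</more></some>'
            '<!--comment 2--></root>')

# Default parser: comments never reach the tree.
plain = ET.fromstring(xml_text)
plain_tags = [child.tag for child in plain]
print(plain_tags)

# Parser that keeps comments as nodes (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(xml_text, parser=parser)

# Comment nodes are elements whose tag is the ET.Comment factory function.
comments = [child.text for child in kept if child.tag is ET.Comment]
print(comments)
```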

jn0, Jan 20 '17 11:01

@jn0 -- the BadgerFish convention is silent on comments and on text fragments (your "z"). There is a bi-directional extension that uses $1, $2, etc. for text fragments and !1, !2, etc. for comments -- but this is not backward compatible with BadgerFish.

Also, if we did extend this, I'd like it to also work (to the extent possible) for the other conventions we're implementing -- i.e. GData, Yahoo and Parker.

Any thoughts on how you might structure the JSON attributes for these?

sanand0, Jan 20 '17 11:01

@sanand0 not much actually: I'm a newbie here, in XML land :) But $2 and !2 look no worse than $$ and !! (and likewise #2 for CDATA). It seems obvious to me that losing parts of the source isn't acceptable anyway. Maybe just add a bidirectional=False argument to the BadgerFish constructor and act accordingly?

The only open point, I think, is that in non-bidirectional mode with a traditional dict the text parts would have to be collected into a tuple. This will lose the exact positions, but still preserve the values...
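The idea could be sketched roughly as follows. This is only an illustration: to_dict and its bidirectional flag are hypothetical and not part of xmljson, the $1, $2 key layout is a guess at the extension mentioned above, attribute values are left as strings, and repeated child tags are not handled.

```python
from collections import OrderedDict
from xml.etree.ElementTree import fromstring


def to_dict(elem, bidirectional=False):
    """Hypothetical converter sketch; not the xmljson implementation."""
    out = OrderedDict(('@' + k, v) for k, v in elem.attrib.items())
    texts = []

    def add_text(s):
        # Skip missing or whitespace-only fragments.
        if s and s.strip():
            texts.append(s)
            if bidirectional:
                # Numbered keys keep each fragment at its position.
                out['$%d' % len(texts)] = s

    add_text(elem.text)
    for child in elem:
        # Simplification: a repeated tag would overwrite the earlier one.
        out[child.tag] = to_dict(child, bidirectional)
        # Text after the child's closing tag is stored as its tail.
        add_text(child.tail)

    if not bidirectional and texts:
        # Positions are lost, values are kept: one string, or a tuple.
        out['$'] = texts[0] if len(texts) == 1 else tuple(texts)
    return out


root = fromstring('<a p="1">x<b r="2">y</b>z</a>')
flat = to_dict(root)
ordered = to_dict(root, bidirectional=True)
print(flat)
print(ordered)
```

In the default mode the "z" survives inside the tuple under "$"; in the bidirectional mode it appears as "$2" after "b", so the original order could in principle be rebuilt.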

jn0, Jan 20 '17 11:01

Hi @sanand0 !

I have the same problem as @jn0: some data is missing from my JSON output. I think there is a recursion problem in your parser. Could you have a look at that, please?

Best, Alexandra

AlexandraBomane, Jan 26 '17 16:01

This problem also affects the Abdera and Cobra conventions I implemented. The problem itself is marked as a TODO (with a commented-out test) in the tests.

dagwieers, May 08 '17 09:05