Support JSON output files as part of execution

Open teaglebuilt opened this issue 5 years ago • 48 comments

Problem:

I would prefer to use a NoSQL database to store test reports. NoSQL document databases typically work with JSON because they are non-relational, document-oriented stores.

Solution:

Provide robotframework with the ability to use json rather than xml in the output file.

For example:

    # flag --output
    robot --output output.json ./tests

teaglebuilt avatar Dec 21 '19 17:12 teaglebuilt

If it's OK with the community, I would like to work on this. I created a pull request for tracking progress and questions along the way.

teaglebuilt avatar Dec 21 '19 17:12 teaglebuilt

Being able to use output.json instead of output.xml may be useful. Biggest possible benefits I see are:

  • JSON is a lot less verbose than XML, so output.json file would be a lot smaller than output.xml.
  • Because output.json would be smaller, writing it to disk during execution would be faster as well.
  • It is possible that processing output.json would be faster than processing output.xml. This is not certain, though, as our XML processing code is pretty well optimized and the underlying XML modules are pretty fast.

If processing output.json is faster than processing output.xml, then this feature is definitely a good idea. If it isn't, then we need to consider whether the other benefits are big enough. This is something we can only know after prototyping this a bit.

I don't think the possibility of adding data to a NoSQL database or to any other system is a good reason to add anything like this. It is already possible to do that by parsing the information from output.xml using the standard XML modules or the robot.api.ExecutionResult API. Data being stored in JSON instead of XML doesn't really make a big difference.
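As a rough illustration of that point, the sketch below reads a heavily simplified output.xml-like document with the standard `xml.etree.ElementTree` module and dumps it as JSON. The sample XML, the `element_to_dict` and `xml_to_json` helpers, and the resulting JSON layout are assumptions made for illustration only; they do not reflect Robot Framework's real output schema.

```python
import json
import xml.etree.ElementTree as ET

# Simplified stand-in for output.xml (assumption); the real file has many more elements.
SAMPLE_XML = """
<robot generator="Example">
  <suite name="Login">
    <test name="Valid Login">
      <status status="PASS"/>
    </test>
    <test name="Invalid Login">
      <status status="FAIL">Wrong password</status>
    </test>
  </suite>
</robot>
"""


def element_to_dict(element):
    """Recursively convert an XML element into a JSON-serializable dict."""
    node = {"tag": element.tag, "attrs": dict(element.attrib)}
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Parse XML text and return the equivalent JSON text (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    return json.dumps(element_to_dict(root), indent=2)


if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

The resulting JSON text could then be handed to whatever database client is in use; that step is outside the scope of this sketch.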

pekkaklarck avatar Dec 21 '19 18:12 pekkaklarck

There are two design decisions to be made:

  1. How to enable this feature. I don't think any new command line option is needed. We just need to make it possible to say --output output.json and detect the type automatically. Also Rebot needs to be enhanced to allow using rebot output.json.

  2. The output.json format. Because it would be a new external interface, it needs to be designed carefully. No need to think about that too much in the beginning, though. It is more important to test whether there are performance benefits or not. If there are, then we need to design the format more carefully.
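The automatic type detection from the first point could look roughly like the sketch below. The `detect_output_format` function and the `SUPPORTED_FORMATS` mapping are hypothetical names used for illustration; this is not how Robot Framework itself handles `--output`.

```python
import os

# Hypothetical mapping from file extension to output format.
SUPPORTED_FORMATS = {".xml": "xml", ".json": "json"}


def detect_output_format(path, default="xml"):
    """Return the output format implied by the file extension of `path`.

    `str(path)` is used so that path-like objects and custom string
    subclasses are handled the same way as plain strings.
    """
    extension = os.path.splitext(str(path))[1].lower()
    return SUPPORTED_FORMATS.get(extension, default)


if __name__ == "__main__":
    for name in ("output.json", "results/OUTPUT.XML", "output"):
        print(name, "->", detect_output_format(name))
```

Falling back to XML for unknown extensions keeps the current behavior as the default in this sketch.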

pekkaklarck avatar Dec 21 '19 18:12 pekkaklarck

There is also another option to reduce output size: write directly to tar.gz. I think both reading and writing of output.xml would be fairly easy to modify so that the same XML can also be read from and written to output.tar.gz.
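A minimal sketch of that idea, using only the standard `tarfile` module and in-memory buffers. The `write_tar_gz` and `read_tar_gz` helpers and the sample data are assumptions for illustration, not part of Robot Framework.

```python
import io
import tarfile


def write_tar_gz(xml_bytes, name="output.xml"):
    """Pack XML bytes into a gzip-compressed tar archive and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(xml_bytes)
        archive.addfile(info, io.BytesIO(xml_bytes))
    return buffer.getvalue()


def read_tar_gz(archive_bytes, name="output.xml"):
    """Extract the named member from a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return archive.extractfile(name).read()


if __name__ == "__main__":
    # Synthetic, highly repetitive XML, so it compresses very well.
    xml = b"<robot>" + b'<kw name="Log"><status status="PASS"/></kw>' * 1000 + b"</robot>"
    packed = write_tar_gz(xml)
    print(len(xml), "bytes of XML ->", len(packed), "bytes of tar.gz")
    assert read_tar_gz(packed) == xml
```

Real output.xml files are less repetitive than this synthetic input, so the actual compression ratio would need to be measured.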

mkorpela avatar Dec 22 '19 08:12 mkorpela

https://docs.python.org/3/library/tarfile.html (Zip and others also could work)

mkorpela avatar Dec 22 '19 08:12 mkorpela

@mkorpela one reason why I am set on a JSON option is the possibility of using NoSQL databases to easily store Robot Framework results.

teaglebuilt avatar Dec 23 '19 15:12 teaglebuilt

Can you @teaglebuilt explain how this feature would help storing results in a db?

pekkaklarck avatar Dec 23 '19 19:12 pekkaklarck

@pekkaklarck JSON is the data format used by many NoSQL databases.

Well, MongoDB, for example, technically uses the BSON format, which is closely related to JSON. From what I've seen, there are many complications to storing XML in a NoSQL database. These are some examples:

MongoDB, Cassandra, CouchDB

They all require XML to be converted into JSON for storage, unless you want to just save it as a string, which could not be queried.

teaglebuilt avatar Dec 24 '19 21:12 teaglebuilt

@pekkaklarck with the possibility of output.json, we could essentially use NoSQL databases to store and query results. Then look into other possibilities of filtering and searching specific queries relative to the data in the output file

teaglebuilt avatar Dec 24 '19 21:12 teaglebuilt

STATUS UPDATE: working on JSONELEMENTHANDLER, rebot accepting endswith(.json) on output file....

teaglebuilt avatar Dec 29 '19 20:12 teaglebuilt

  1. I still don't understand how Robot producing json output itself helps using any db. It is already now easy to read information from output.xml and write it to a db (or to a json file). If you have a db that accepts Robot's output.json as-is, then you avoid one step between test run and storing info to the db. Is that your motivation?

  2. Great that you have some progress!

  3. Have you already thought about the json format?

pekkaklarck avatar Dec 29 '19 22:12 pekkaklarck

@teaglebuilt could you share the JSON format you're looking to produce? I am also in a similar situation where I want to get the output in JSON format.

Parsing XML and converting that to JSON I found to be very quick and also the XML gets converted into python objects and then I can use it like suite.suites and so on.

So few things come to mind:

  1. Can you share the JSON structure you're looking to output

  2. Also, when I read the JSON output back into python for further processing, will it be converted into python objects?

I was looking at this for JSON structure and JSON to python objects:

  1. https://json-schema.org/understanding-json-schema/
  2. https://pypi.org/project/python-jsonschema-objects/0.0.13/

Also, please let me know if i can help.

sdave2 avatar Jan 07 '20 16:01 sdave2

So I've taken a fork and created a JSON Schema and added a "--json" option to rebot so you can convert your XML results into JSON.

There are a couple of things which I have done which are probably not consistent with robot.

  • In the JSON schema I have named what robot normally labels as "kw" as "keyword" because for me it's easier for other people to understand if it's spelt out fully. I have done this with "doc" to "documentation" also.
  • In situations where it seemed appropriate to use a boolean I used a boolean. This is rather than "yes"/"no" for marking stats as critical / marking stats as "html" or not.
  • I have not fully broken down the objects in the schema. JSON schemas have the idea of inheritance, and quite a few of the objects share some of the same fields, but I thought they were encapsulated in a way which was easy to understand.
  • My python implementation is a little lazy. I haven't written documentation on my functions, and have strayed away from putting individual comments on lines (which I normally do, but was worried that the coding guidelines for robotframework wouldn't allow that). I also haven't written any acceptance tests. I plan on doing this, if there is interest in merging it in.
  • Most of my schema was derived from the XSD. I could be wrong, but based off of my interactions with robot objects, and the XSD, there is a mismatch between the two. I think my JSON schema corrects some of those wrongs, but if there's a difference (which to me seems likely) then the supporting code would need to be changed.
  • I put the generated, generator, and rpa tags in the JSON schema, but I couldn't quite figure out exactly where I could get that data from in the returned objects.

I appreciate that it's been said that this could be done as part of the robot output. I don't think it'd be hard to adapt what I've done to that. Having said that though, I figured that rather than attaching the JSON to the robot implementation it was better as rebot. If you want the format in something else (i.e: Xunit) you simply use rebot to fix it up.

I'm quite new to github, and haven't done any contribution before now. So there maybe some rules which I'm not so aware of, so if I'm doing something stupid please let me know.

Lemonlemmings avatar Feb 03 '20 20:02 Lemonlemmings

Quick comments:

  • I agree keyword is more explicit than kw, but if it is repeated huge amount of times then saving 5 characters per usage can be quite a lot. One idea of the JSON output would be making the output file smaller.
  • Using Boolean values and not "yes" and "no" is definitely a good idea.
  • In the end JSON output should be activated simply by using --output output.json which should then also disable writing the normal output.xml file. It should work both with robot and rebot.
  • This is such a big change and RF 3.2 is so close that this can be included in RF 3.3 (which may actually be RF 4.0) at the earliest.
  • The same schema and a lot of the code could be reused to create a tool that can convert current output.xml files to output.json (and possibly also the other way around). That would be a great way to make sure the schema works well as it could be fine tuned freely. Once it's taken into use by RF itself, then all changes are subject to backwards compatibility consideration.

pekkaklarck avatar Feb 06 '20 10:02 pekkaklarck

Okay, so I have updated my code. Robot will now output a .json file.

  • I have changed the properties back to kw and doc to be consistent with how robot is elsewhere. I've updated the schema to reflect these changes as well as the underlying code.
  • I've updated the JSON writer to not be dependent on requiring the entire output upfront (i.e: Moving away from a writer to the logger format). This allows robot to recognise if the output ends in .json and will output the results in JSON format. I've tested this against a couple of inputs and it looks good on the whole.
  • I'm in the middle of writing a JsonExecutionResultBuilder so that rebot can accept .json files in the schema format and make them available as an ExecutionResult. If I'm correct then this will allow rebot to accept a mix of JSON and XML to merge. Hopefully this will also satisfy your requirement for a tool to be able to convert between the XML and JSON (as rebot will do this).

Should I open a pull request so you can track the changes, as I'm also renaming some functions (so robot can remain agnostic to what output format is chosen).

Lemonlemmings avatar Feb 09 '20 17:02 Lemonlemmings

Another update, rebot is now able to accept the .json format as an input. This allows it to combine with either other XML files, or other JSON files. It can produce either XML output, or JSON output. I've tested it against a couple of examples so far and it seems to work fine.

  • It needs a little tidying still. I know robot prefers classes for the element handlers (which I have not done on my implementation).
  • It needs some unit tests and acceptance tests writing still.
  • I haven't run this on python2 yet. My expectation is that it should work, but I haven't gotten round to installing python2 to check yet. I've noted that python2 support is being dropped anyway for RF 4.0, so this might not be a big deal.

Lemonlemmings avatar Feb 10 '20 00:02 Lemonlemmings

Sounds great! Quick comments from me again:

  • A PR showing the code would be good. Doesn't need to be fully ready.
  • Example of the JSON output would be interesting. Perhaps a gist?
  • Have you done any performance measurements? I'm especially interested in the output.json processing speed.
  • Could you please inform people on the #devel channel in our Slack about this? I'm sure others would be interested. This would require a PR so that they can test it.
  • If changes are relatively small and benefits are big enough, we could possibly still get this into RF 3.2.

pekkaklarck avatar Feb 16 '20 19:02 pekkaklarck

Ping @Muusssi!

pekkaklarck avatar Feb 16 '20 19:02 pekkaklarck

Unfortunately I haven't found the time to progress this forward more.

  • I have opened a pull request: pull request
  • I have made a gist with the output of what I've been testing: JSON gist
  • I am interested in the speeds, but I haven't gotten around to doing this yet. I'll have a look this weekend.
  • I've not used Slack before, I'll look at how to do this at the weekend 😄
  • It would be nice to see if we could get it into RF 3.2

Lemonlemmings avatar Feb 20 '20 21:02 Lemonlemmings

I like the idea of generating output in JSON format. I have a similar situation where I want output in JSON format so that I can upload test results to an external tracking system. May I request that you please share an update on the RF 3.2 release date?

NarendraSingh727 avatar Mar 30 '20 05:03 NarendraSingh727

RF 3.2 will be released very soon. This issue needs to wait for future versions.

pekkaklarck avatar Mar 30 '20 21:03 pekkaklarck

I've been looking at this again today, I've updated it so the unit tests pass, but I'm suffering from encoding issues with strings for acceptance testing.

Lemonlemmings avatar Mar 30 '20 22:03 Lemonlemmings

Sorry it's been so long since my last update. COVID-19 has only made me more busy (as I imagine it has many software engineers). I've come back and had a look at my code and realised it was left in a bit of a state (it would produce various different errors, and wasn't properly forming the models with the element handlers).

I've resolved these issues and I have been running some tests with rebot and found that I can convert freely between XML and JSON using rebot, and that the data transformation between the two is consistent!! (Hooray) I have also ensured that the output from the JSON validates against the schema! (Hooray again).

I have also done some testing with timings, although these have been very limited tests. I was noticing, with a fairly basic robot file, that XML was marginally quicker.

XML: 0.033s JSON: 0.036s

I don't know if this changes with larger robot files. I imagine this might change based off of the kind of hard disk you have available (I have an NVMe SSD) as the JSON implementation will store the results in memory and will write to disk when it's complete, rather than the XML implementation which looks like it writes to disk as it goes. This implementation is unfortunate as it makes the JSON implementation harder to do with the visitor pattern because you have to remember "where you are" between visits.
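One way to "remember where you are" while writing JSON incrementally is to keep a stack with one entry per open element. The sketch below is a simplified illustration under that assumption; the `StreamingJsonWriter` class, its `start`/`end` methods, and the element layout are hypothetical and are not taken from the pull request.

```python
import io
import json


class StreamingJsonWriter:
    """Write nested JSON elements to a stream as start/end events arrive.

    The stack holds one flag per open element that records whether a child
    has already been written, i.e. whether the next child needs a comma.
    """

    def __init__(self, stream):
        self._stream = stream
        self._needs_comma = []

    def start(self, name):
        # Emit a separator if a sibling was already written at this level.
        if self._needs_comma:
            if self._needs_comma[-1]:
                self._stream.write(",")
            self._needs_comma[-1] = True
        self._stream.write('{"name":%s,"children":[' % json.dumps(name))
        self._needs_comma.append(False)

    def end(self, status):
        # Close the children list and write data known only at the end.
        self._needs_comma.pop()
        self._stream.write('],"status":%s}' % json.dumps(status))


if __name__ == "__main__":
    out = io.StringIO()
    writer = StreamingJsonWriter(out)
    writer.start("Suite")
    writer.start("Test 1")
    writer.end("PASS")
    writer.start("Test 2")
    writer.end("FAIL")
    writer.end("FAIL")
    print(out.getvalue())
```

Because each piece is written as soon as it is known, nothing but the stack has to stay in memory, at the cost of putting values such as the status after the children.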

I'm increasingly finding that my implementation of the JSON is stable, but the other surrounding bits (like recognising the output file ends in ".json", and the addition of the "--json") don't quite work how I'd expect them to. So my next task is to ensure that the flows are working how I'd expect them to. Afterwards I'll look to implement the other options that rebot supplies, as it looks like some of the options around flattening are applied when the XML is loaded, which is something I'll have to do when loading the JSON. All in all, seems to be coming along nicely!

I have two questions.

In my pull request, I'm getting some failures that I wouldn't expect to be getting based off of the code I've changed. Are there known issues with the existing set of acceptance tests, or have I done something wrong (such that I'd be getting failures with the help lines)?

I'm getting failures in Python2 acceptance tests because there are a whole load of string classes robot has for dealing with unicode strings in Python2 and this grinds against my implementation. For recognition of the output ending in ".json" I have implemented a line of code like so: output.lower().endswith(".json"). This stops working with the custom string classes because they don't have these functions available. These custom string classes appear to only be in python2 implementations because I only get the error here, not in python3 acceptance tests. Is this something I need to care about because python2 is now deprecated, or should I go to the effort of finding a way around this problem?

Edit:

Third question. Has anyone had the time to look over the JSON schema? I think it's good and reflects what I'd expect robot to look like in JSON, but obviously this is an important part of the implementation.

Lemonlemmings avatar Jun 14 '20 15:06 Lemonlemmings

I've just run an example on my laptop (with a SATA SSD). This is across 20 runs of the same robot script:

JSON = 3.6550284 XML = 3.6984686

My laptop has slower specs, but I think the notable difference is that it has a SATA SSD rather than an NVMe drive which benefits the performance of the JSON implementation.
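For anyone wanting to repeat a comparison in isolation, the sketch below times only the parsing step on synthetic data with the standard `timeit` module. It is not the measurement described above, which timed whole robot runs; the `build_samples` and `time_parsing` helpers and the document shapes are assumptions for illustration.

```python
import json
import timeit
import xml.etree.ElementTree as ET


def build_samples(count=1000):
    """Build roughly equivalent synthetic XML and JSON documents."""
    keywords = [{"name": "Log %d" % i, "status": "PASS"} for i in range(count)]
    json_text = json.dumps({"kw": keywords})
    xml_text = "<suite>%s</suite>" % "".join(
        '<kw name="%s"><status status="%s"/></kw>' % (kw["name"], kw["status"])
        for kw in keywords
    )
    return xml_text, json_text


def time_parsing(count=1000, repeat=20):
    """Return total parse times in seconds and document sizes in characters."""
    xml_text, json_text = build_samples(count)
    return {
        "xml": timeit.timeit(lambda: ET.fromstring(xml_text), number=repeat),
        "json": timeit.timeit(lambda: json.loads(json_text), number=repeat),
        "xml_size": len(xml_text),
        "json_size": len(json_text),
    }


if __name__ == "__main__":
    print(time_parsing())
```

Results will vary by machine and Python version, and real output files may behave differently from this synthetic input.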

This is the example robot file (simple): https://gist.github.com/Lemonlemmings/c6b33ea88e68327cc63afd5c6c793f26

Since it's been a while here is the outputted JSON from that run: https://gist.github.com/Lemonlemmings/cc92e26f3edd02ae40d700352ef63e77

Lemonlemmings avatar Jun 14 '20 15:06 Lemonlemmings

Thank you for adding this feature. I hope it will be released soon, as I have a requirement to convert output.xml to output.json. I have been trying to develop the code for this conversion myself and ran into difficulties, and when I googled for a solution I ended up at this link and was happy to see such a feature is being added. Thanks and congratulations to the team for the good effort. Great work, guys!

Murugan-Ramaswamy avatar Jun 30 '20 12:06 Murugan-Ramaswamy

To increase the parsing speed - maybe try other parsers? https://github.com/ultrajson/ultrajson or https://github.com/ijl/orjson. I think I can check the speed of different parsers on our large output.xml files.
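Trying an alternative parser without making it mandatory could be sketched as below: use orjson when it happens to be installed and fall back to the standard `json` module otherwise. The `load_json` and `parser_name` helpers are hypothetical names; whether a faster parser helps in practice would still need measuring.

```python
import json

try:
    import orjson  # optional third-party parser; may not be installed
except ImportError:
    orjson = None


def load_json(text):
    """Parse JSON text with orjson if available, else with the stdlib json."""
    if orjson is not None:
        return orjson.loads(text)
    return json.loads(text)


def parser_name():
    """Report which parser `load_json` will use."""
    return "orjson" if orjson is not None else "json"


if __name__ == "__main__":
    print(parser_name(), load_json('{"suite": {"name": "Login", "tests": 2}}'))
```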

skhomuti avatar Jul 10 '20 09:07 skhomuti

Related https://en.wikipedia.org/wiki/JSON_streaming

mkorpela avatar Nov 01 '20 17:11 mkorpela

I took a look at the json PR. There are several things to consider.

I would like that also the old issue with output is fixed. This goes back to the time of implementing the current log.html data format. There are two problems in my opinion:

  1. Whole output must be read completely when processing it. This means that there is a memory spike. Pabot can somewhat currently solve this for humongous projects by removing keywords for partial results.
  2. Output is broken if writing the file is stopped in the middle. The data is not directly usable without fixing it.

I propose that the output.json when done would be Line-delimited JSON. In a format that can be also partially processed even when writing has aborted.

Now when I tested the output.json PR everything is written in the same line and has the same tree like structure as the original output.xml. This means that reading problems are not handled.

In the best case scenario writing log.html from output.json would not have a memory spike. Memory spike means that data processing requires RAM that might not be available. Data would just flow from streaming json that is only partially read and transformed to model objects and from that directly to the format that log.html and report.html use.
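A minimal sketch of the line-delimited idea, assuming one JSON object per line and a reader that stops at a truncated final line. The event names and the `write_events` and `read_events` helpers are hypothetical; no actual output.json format is implied.

```python
import io
import json


def write_events(stream, events):
    """Write each event as one JSON document per line."""
    for event in events:
        stream.write(json.dumps(event) + "\n")


def read_events(stream):
    """Yield events one at a time; stop at a line that is not valid JSON.

    Only one line is held in memory at a time, and a file cut off in the
    middle of a write still yields every event written before the cut.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            break


if __name__ == "__main__":
    out = io.StringIO()
    write_events(out, [
        {"event": "start_suite", "name": "Login"},
        {"event": "end_test", "name": "Valid Login", "status": "PASS"},
    ])
    # Simulate a run that was aborted halfway through writing a line.
    aborted = out.getvalue() + '{"event": "end_sui'
    for event in read_events(io.StringIO(aborted)):
        print(event)
```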

Output.xml is also very verbose and replicates the same elements. For example, keyword documentation is logged every time that a keyword is executed. status elements also eat a lot of space, as one can check with this gist https://gist.github.com/mkorpela/0a38a0442e601766568eb7823297d44c

mkorpela avatar Dec 11 '20 15:12 mkorpela

Thanks for your comment! Sorry it's been a while since I've posted anything. COVID has made things quite mad (I'm sure it has for everyone). I've posted up in the Slack, and I'm hoping to get this release into RF4.1.

In response to your points:

I agree loading a large project directly into memory isn't ideal. To save me a lot of time rewriting something from scratch I have found this library. I haven't done an implementation with it yet, but I think this would resolve your problem, as it would only load something in when needed (if you do the loading correctly). The project mentions loading large JSON files very quickly, so that should resolve the issue. One minor thing... that particular project only supports Python3, and does not support Python2.7 (which RF4 will continue support for). With that in mind, I propose that I do an implementation with that library which will allow for two situations:

  1. If Python3 is being used (i.e: That library is installed) then it will use the nicer streaming option
  2. If Python2.7 is being used, and the JSON output is being activated it will simply load the output into memory (unfortunate, but this would go away with RF5 and Python2 support being dropped)

I have (slightly) resolved your problem with the output being broken if execution is stopped midway by using this lovely library. This means that for very large projects this JSON implementation won't consume lots of memory (like the implementation you tested).

Things that are leftover:

  1. I need to finish my implementation for reading from the JSON (see above).
  2. This needs testing in a range of scenarios. I think as a base run it would be good to run the acceptance tests with the JSON output (to see how it compares).
  3. I need to finish off the little trimmings I have added elsewhere. Currently it will recognise if the output file ends in JSON, but the JSON flag itself isn't working how I'd expect it to.
  4. Tests need implementing in the unit and acceptance tests.

Lemonlemmings avatar Feb 23 '21 18:02 Lemonlemmings

Quick comments:

  1. Having this in RF 4.1 would be nice.
  2. I'm fine with this feature being Python 3 only even if the release would still support Python 2.
  3. RF doesn't currently have any mandatory external dependency and I don't want that to change. Some features like YAML variable files only work if you install an external module, and we could use the same approach also with output.json.
  4. Do you already have a JSON structure planned?

pekkaklarck avatar Mar 03 '21 15:03 pekkaklarck