Add a Pipeline Element Test entity for testing elements in stroom

Open at055612 opened this issue 5 years ago • 2 comments

The idea is to create something a bit like soapUI but for testing stroom pipeline elements and their associated content, i.e. for a given type of element and fixed input, the test evaluates various user-supplied assertions against the output. This means stroom would be capable of running regression tests on new releases. It also means users have a way of testing their content (e.g. XSLTs, splitters, etc.) against known input.

You would create a PipelineElementTest entity in the explorer. You would then need to select the type of element to test, e.g. a Data Splitter Parser, and any props for it, e.g. the DS XML content. You would need to set the fixed input data in a text pane, e.g. raw data or XML.
You would either need to set the expected fixed output in another text pane or add one or more asserts.

As @stroomdev10 says, the input and expected output could be defined as their own entities to allow reuse in different tests. Something similar to the Dictionary & Script entities, that would define the char encoding, the data type (xml, json, csv, text, etc.) and have an ACE editor with the appropriate syntax highlighting.

An assert could be something like an xpath match, or a simple regex match. The type of asserts available would depend on the element being tested and its output type, e.g.

- Text assert - simple contains, regex match (with optional match count), matches static complete expected output (e.g. like a Dictionary entity), isEmpty, line count, charEncodingMatch
- XML assert - xpath match (+ all of Text assert)
- Json assert - json-path (or similar jq style json query language) match (+ all of Text assert)
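To make the assert types above more concrete, here is a minimal sketch of a few of them in Python. None of these function names exist in stroom; they are hypothetical and only illustrate the kind of check each assert might perform. The XML assert uses the limited XPath subset in Python's `xml.etree.ElementTree`, which is an assumption made for the sake of a self-contained example, not a statement about what stroom would use.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def assert_contains(output: str, needle: str) -> bool:
    """Text assert: simple contains."""
    return needle in output


def assert_regex(output: str, pattern: str, expected_count: Optional[int] = None) -> bool:
    """Text assert: regex match, optionally requiring an exact match count.

    MULTILINE is used so that ^ and $ anchor on each line of the output.
    """
    count = sum(1 for _ in re.finditer(pattern, output, flags=re.MULTILINE))
    if expected_count is None:
        return count > 0
    return count == expected_count


def assert_is_empty(output: str) -> bool:
    """Text assert: output contains nothing but whitespace."""
    return output.strip() == ""


def assert_line_count(output: str, expected: int) -> bool:
    """Text assert: number of lines in the output."""
    return len(output.splitlines()) == expected


def assert_xpath(output_xml: str, xpath: str, expected_count: Optional[int] = None) -> bool:
    """XML assert: XPath match against a well-formed XML document."""
    root = ET.fromstring(output_xml)
    found = root.findall(xpath)
    if expected_count is None:
        return len(found) > 0
    return len(found) == expected_count


# Hypothetical element output used to exercise the asserts.
sample_xml = (
    "<records>"
    "<record><user>alice</user></record>"
    "<record><user>bob</user></record>"
    "</records>"
)
```

Each assert returns a plain boolean here; a real implementation would probably also want to report what was expected and what was found.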

It may only be possible (or have value) to test certain elements.

There would need to be some mechanism to run each test, e.g. a test suite that collates multiple test entities. Test entities would need an enable/disable toggle. There could be a scheduled job to run all enabled tests. The output of the tests would need to go somewhere, e.g. an error stream.
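A rough sketch of such a suite runner, in Python, might look like the following. `ElementTest` and `run_suite` are hypothetical names, the element under test is represented by a plain function from input text to output text, and the "error stream" is just a list of messages. All of these are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ElementTest:
    """Hypothetical stand-in for a PipelineElementTest entity."""
    name: str
    element: Callable[[str], str]          # stand-in for the element under test
    input_data: str                        # the fixed input
    asserts: List[Callable[[str], bool]]   # checks applied to the output
    enabled: bool = True                   # the enable/disable toggle


def run_suite(tests: List[ElementTest]) -> List[str]:
    """Run every enabled test and return the collected failure messages."""
    errors: List[str] = []
    for test in tests:
        if not test.enabled:
            continue  # disabled tests are skipped entirely
        try:
            output = test.element(test.input_data)
        except Exception as exc:
            errors.append(f"{test.name}: element raised {exc!r}")
            continue
        for index, check in enumerate(test.asserts):
            if not check(output):
                errors.append(f"{test.name}: assert {index} failed")
    return errors


example_tests = [
    ElementTest("passes", str.upper, "abc", [lambda out: out == "ABC"]),
    ElementTest("fails", str.upper, "abc", [lambda out: out == "abc"]),
    ElementTest("disabled", str.upper, "abc", [lambda out: False], enabled=False),
]

for message in run_suite(example_tests):
    print(message)
```

A scheduled job would then only need to call something like `run_suite` on all enabled tests and route the returned messages to wherever test output is meant to go.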

Thought is needed on whether the test would run on a single record basis (e.g. xml fragments) or a collection of records, i.e. a small stream. The former raises issues around doing xpaths on xml fragments and whether root elements need to be added to wrap the fragment.
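The fragment problem can be illustrated with a short Python sketch. A sequence of sibling fragments is not a well-formed document, so a parser rejects it until a wrapping root is added. The function names and the `records` root tag are hypothetical, and this simple string wrapping assumes the fragments carry no XML declaration and no undeclared namespace prefixes.

```python
import xml.etree.ElementTree as ET


def wrap_fragments(fragment_text: str, root_tag: str = "records") -> str:
    """Wrap one or more sibling XML fragments in a synthetic root element."""
    return f"<{root_tag}>{fragment_text}</{root_tag}>"


def count_matches(fragment_text: str, xpath: str) -> int:
    """Evaluate an XPath (ElementTree subset) against wrapped fragments."""
    root = ET.fromstring(wrap_fragments(fragment_text))
    return len(root.findall(xpath))


fragments = (
    "<record><user>alice</user></record>\n"
    "<record><user>bob</user></record>"
)

# Parsing the bare fragments fails because there is more than one top-level element.
try:
    ET.fromstring(fragments)
    parsed_unwrapped = True
except ET.ParseError:
    parsed_unwrapped = False

print(parsed_unwrapped)
print(count_matches(fragments, "./record"))
```

One consequence of wrapping is that any xpath written by the user has to account for the synthetic root, which is part of what would need deciding.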

An evolution of this would be a means to test a whole pipeline end-to-end, but due to the forking in pipelines this would be more complex and may add little value if the elements are tested in isolation.

Another evolution would be to add scripting/templating (e.g. jinja2/mustache) to the input and expected output text to allow for running the test with different input variables.

at055612 avatar Jul 06 '20 15:07 at055612

So we'd need a TestDataElement and an AssertElement? It would be good if these elements referenced proper entities in the DB (Dictionaries?) rather than text elements, then we could have a library of useful stuff, and make them importable/exportable.

I also like the idea of asserts being used for something like a QA step outside of testing.

stroomdev10 avatar Jul 07 '20 09:07 stroomdev10

To elaborate further on the templating suggestion, the TestDataElement could contain something like:

user, age
{% for user in users %}
{{ user }},{{ range(1, 99) | random }}
{% endfor %}

We could associate dictionaries with the TestDataElement to provide lists of data, e.g. in this example a list of users, that can be added into the jinja context to be available when the template is rendered. We could also potentially add additional custom jinja filters if the built in ones are insufficient. This would provide an easy way of generating data for testing without having to maintain a static set of test data records. It could be argued that you could achieve the same with XSLT but I think this would be more intuitive.
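As a rough idea of the data this would generate, the sketch below produces the same shape of output in plain Python rather than through jinja, so that it has no third-party dependency. It is a stand-in, not a jinja render. `render_users_csv` is a hypothetical name, and the `users` list plays the role of a dictionary associated with the TestDataElement.

```python
import random
from typing import List, Optional


def render_users_csv(users: List[str], rng: Optional[random.Random] = None) -> str:
    """Emit a header plus one 'user,age' row per user with a random age.

    The age is picked from range(1, 99), i.e. 1 to 98 inclusive, matching
    what the 'range(1, 99) | random' expression in the template would pick from.
    """
    rng = rng or random.Random()
    lines = ["user, age"]
    for user in users:
        lines.append(f"{user},{rng.choice(range(1, 99))}")
    return "\n".join(lines) + "\n"


# Hypothetical dictionary contents supplying the list of users.
users = ["alice", "bob", "carol"]

# Passing a seeded generator makes the output repeatable between runs.
print(render_users_csv(users, random.Random(0)), end="")
```

Note that a jinja render of the template with default settings would also leave blank lines where the block tags sit, unless whitespace trimming is enabled. Because the ages are random, asserts against generated data would need to check structure (e.g. a regex per row) rather than exact values, unless the random source can be seeded.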

at055612 avatar Sep 29 '20 08:09 at055612

See stroom-data-generator. Contains elements of above, including templates for generating data.

There are scripts that automatically create and send data into Stroom where they are processed by a pipeline designed to exercise many XSLT features - e.g. all variants of stroom:lookup including bitmaps. The output of this data is indexed within Stroom and there are scripts that then use Stroom's API to search those indexes to obtain the processed data and then compare the result against previously obtained versions (held in the repo itself).

These scripts therefore test a high proportion of the non-interactive functionality of Stroom and could be used to automatically test releases driven by a CD pipeline.
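The comparison step described above, checking fresh results against a previously obtained version, can be sketched in Python as follows. This is not taken from stroom-data-generator; `diff_against_baseline` is a hypothetical name and the use of a unified diff is an assumption about what a useful failure report might look like.

```python
import difflib
from typing import List


def diff_against_baseline(actual: str, baseline: str) -> List[str]:
    """Return a unified diff of actual vs baseline; empty list means no change."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            actual.splitlines(),
            fromfile="baseline",
            tofile="actual",
            lineterm="",
        )
    )


baseline_output = "alice,42\nbob,7"
new_output = "alice,42\nbob,8"

for line in diff_against_baseline(new_output, baseline_output):
    print(line)
```

An empty diff would count as a pass, and a non-empty one could be written to whatever failure output the CD pipeline collects.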

gcdev373 avatar Jan 26 '23 08:01 gcdev373