Data Splitter: Required Improvements
• Restructuring the code base of the ace function library and the data splitter so that their JAR files can be used in a standalone environment (e.g. from a shell command line with the Saxon JAR, or in an XSLT debugger such as OxygenXML).
• Repackaging the data splitter so that it can be used outside Stroom (e.g. from the command line).
• Allowing splitters and transforms to be written in extension scripting languages (e.g. Jython, Groovy).
• Allowing plugins to define additional stroom: functions, splitters and transforms.
• Adding a new ace function, say stroom:end-of-current-stream(), that an XSLT transform could call to determine whether the current raw input event is the last one in the current stream file’s segment. Such a function would make a transform that assembles and processes multi-line raw events more robust by giving it an opportunity to process and flush the current event before the segment ends. The current way of assembling multi-line events (e.g. auditd logs) in the data splitter is unreliable and inflexible given the data splitter’s limitations.
• Adding a robust CSV parsing function to the data splitter or the XSLT extension function library that correctly handles quoted fields, embedded (doubled) quotes and similar cases, e.g.:
Aaa,"BBBB","CCC""embedded-quote""CC", DDD
• Adding a mechanism that allows a data splitter or XSLT transform to split or divert events to other pipelines or pipeline branches. This would allow mixed or uncategorized events to be handled efficiently. (Open question: would allowing the data splitter and transform to change the output event type dynamically provide this functionality?)
• Enhancing the data splitter to allow named regular expressions, groups, data sequences, etc. to be defined once and reused elsewhere, reducing duplication.
• Enhancing the data splitter to support include files.
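To illustrate the stroom:end-of-current-stream() item above, here is a minimal Java sketch, assuming the records of one stream segment arrive as an iterator of lines. The classes LookAheadReader and AuditdAssembler and the method endOfCurrentStream() are hypothetical stand-ins, not existing Stroom APIs. Reading one record ahead is what lets the reader know that the current record is the last one, so the assembler can flush the final multi-line event instead of leaving it pending.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader that looks one record ahead so it can report end-of-stream. */
class LookAheadReader {
    private final Iterator<String> source;
    private String current;
    private String next;
    private boolean hasNext;

    LookAheadReader(Iterator<String> source) {
        this.source = source;
        fetch();
    }

    // Pre-reads the following record (if any) so end-of-stream is known in advance.
    private void fetch() {
        hasNext = source.hasNext();
        next = hasNext ? source.next() : null;
    }

    /** Moves to the next record; returns false when the segment is exhausted. */
    boolean advance() {
        if (!hasNext) return false;
        current = next;
        fetch();
        return true;
    }

    String current() { return current; }

    /** Stand-in for the proposed stroom:end-of-current-stream(). */
    boolean endOfCurrentStream() { return !hasNext; }
}

/** Groups consecutive auditd lines that share the same msg=audit(...) event id. */
class AuditdAssembler {
    private static final Pattern EVENT_ID = Pattern.compile("msg=audit\\(([^)]+)\\)");

    static List<List<String>> assemble(Iterator<String> lines) {
        List<List<String>> events = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        String pendingId = null;
        LookAheadReader reader = new LookAheadReader(lines);
        while (reader.advance()) {
            String line = reader.current();
            Matcher m = EVENT_ID.matcher(line);
            String id = m.find() ? m.group(1) : null;
            // A different event id means the previously assembled event is complete.
            if (!pending.isEmpty() && !Objects.equals(id, pendingId)) {
                events.add(pending);
                pending = new ArrayList<>();
            }
            pending.add(line);
            pendingId = id;
            // Without this check the last event in the segment would never be flushed.
            if (reader.endOfCurrentStream()) {
                events.add(pending);
                pending = new ArrayList<>();
            }
        }
        return events;
    }
}
```

Grouping on the auditd msg=audit(timestamp:serial) identifier is an assumption about how the lines of one event would be correlated; a real transform might need a different key.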
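For the CSV parsing item, the following is a minimal sketch of a single-line parser that follows the common RFC 4180 convention: a field may be enclosed in double quotes, and a doubled quote inside a quoted field stands for one literal quote. CsvLineParser is a hypothetical name. The sketch does not handle line breaks inside quoted fields, and it keeps unquoted whitespace (such as the space before DDD in the example) as part of the field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-line CSV parser handling quoted fields and doubled quotes. */
class CsvLineParser {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote -> one literal quote
                        i++;               // skip the second quote
                    } else {
                        inQuotes = false;  // closing quote of the field
                    }
                } else {
                    field.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // field separator
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field: " + line);
        }
        fields.add(field.toString());      // last field (possibly empty)
        return fields;
    }
}
```

Applied to the example line, this yields four fields: Aaa, BBBB, CCC"embedded-quote"CC and " DDD". Rejecting an unterminated quote, rather than silently consuming the rest of the line, is a design choice that surfaces malformed input early.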
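The event-diversion item could take many forms. One possible sketch, assuming each event is a string and each pipeline branch is identified by name, evaluates ordered predicate rules and sends anything unmatched to a default branch for uncategorized events. EventRouter and its methods are hypothetical and do not describe an existing Stroom mechanism.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical router that assigns each event to a named pipeline branch. */
class EventRouter {
    // Insertion order is preserved, so the first matching rule wins.
    private final Map<String, Predicate<String>> rules = new LinkedHashMap<>();
    private final String defaultBranch;

    EventRouter(String defaultBranch) {
        this.defaultBranch = defaultBranch;
    }

    /** Adds a rule sending events that match the predicate to the given branch. */
    EventRouter route(String branch, Predicate<String> matches) {
        rules.put(branch, matches);
        return this;
    }

    /** Returns the branch for one event, or the default branch if no rule matches. */
    String branchFor(String event) {
        for (Map.Entry<String, Predicate<String>> rule : rules.entrySet()) {
            if (rule.getValue().test(event)) {
                return rule.getKey();
            }
        }
        return defaultBranch;
    }

    /** Splits a batch of events into per-branch lists. */
    Map<String, List<String>> split(List<String> events) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String event : events) {
            out.computeIfAbsent(branchFor(event), k -> new ArrayList<>()).add(event);
        }
        return out;
    }
}
```

Keeping the rules ordered makes routing deterministic when an event matches more than one rule, and the default branch ensures that no event is silently dropped.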
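For the item on named, reusable definitions, here is a sketch of how reusable regular expressions might be composed, assuming a hypothetical {{name}} reference syntax that is expanded before compiling. NamedRegexRegistry and the placeholder syntax are illustrative assumptions, not existing data splitter features. Each reference is wrapped in a non-capturing group so that alternations inside a definition do not leak into the surrounding pattern, and circular references are rejected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical registry of named regex fragments that may reference each other. */
class NamedRegexRegistry {
    // Matches a reference such as {{octet}} and captures the name.
    private static final Pattern REF = Pattern.compile("\\{\\{(\\w+)\\}\\}");
    private final Map<String, String> definitions = new HashMap<>();

    NamedRegexRegistry define(String name, String regex) {
        definitions.put(name, regex);
        return this;
    }

    /** Expands all references in the named definition and compiles the result. */
    Pattern compile(String name) {
        return Pattern.compile(expand(name, new HashSet<>()));
    }

    private String expand(String name, Set<String> inProgress) {
        String regex = definitions.get(name);
        if (regex == null) {
            throw new IllegalArgumentException("Undefined regex: " + name);
        }
        // A name already being expanded further up the call chain means a cycle.
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Circular reference: " + name);
        }
        Matcher m = REF.matcher(regex);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String expanded = "(?:" + expand(m.group(1), inProgress) + ")";
            // quoteReplacement stops '\' and '$' in the regex being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(sb);
        inProgress.remove(name); // allow the same name to be reused by siblings
        return sb.toString();
    }
}
```

Defining the octet pattern once and referencing it four times in an IPv4 definition shows the kind of duplication this item aims to remove.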