
Added schema.

Open SharafMohamed opened this issue 2 years ago • 0 comments

References

Description

  • Previously, general-purpose heuristics were applied to determine variables within logs. This change adds an initial implementation that lets the user specify a user-defined schema for determining variables.
  • If desired, heuristics can still be applied by omitting a schema path when running compression.
  • The schema used during compression is stored within each generated archive and is used during search. If heuristics were used during compression, no schema is stored, and heuristics are used during search.
  • Details are specified in components/core/README.md and components/core/README-schema.md.
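
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation in this change: the function names, the `schema.txt` file name, and the archive layout are all hypothetical, and the real schema format is the one documented in components/core/README-schema.md.

```python
from pathlib import Path
from typing import Optional

# Hypothetical name for the schema copy kept inside an archive.
SCHEMA_FILE_NAME = "schema.txt"


def compress(archive_dir: Path, schema_path: Optional[Path] = None) -> str:
    """Create an archive directory and return the variable-detection mode used.

    If no schema path is given, heuristics are used and nothing is stored.
    Otherwise the schema is copied into the archive so search can reuse it.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    if schema_path is None:
        return "heuristics"
    (archive_dir / SCHEMA_FILE_NAME).write_text(schema_path.read_text())
    return "schema"


def search_mode(archive_dir: Path) -> str:
    """Pick the mode for search based on what the archive contains."""
    if (archive_dir / SCHEMA_FILE_NAME).exists():
        return "schema"
    return "heuristics"
```

The point of the sketch is that search never needs to be told which mode was used: the presence or absence of a stored schema in the archive decides it.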

Validation performed

  • Compression and decompression work as expected. All command line options work correctly. Memory usage is unchanged whether heuristics or a schema is used. Compression speed is unchanged when using heuristics, and is ~20% slower when using a schema than when using heuristics.
  • Search results remain the same for all queries performed in the paper. Schema search speed is ~25% slower for most cases relative to heuristic search speed (due to increased time in refreshing the dictionaries). Schema search does not disambiguate queries with a wildcard at the start and end (e.g., *text*), so these queries take 500% longer than when heuristics are used.

SharafMohamed, Aug 22 '22 22:08