Add dynamic programming approach for schema search.

Open SharafMohamed opened this issue 8 months ago • 0 comments

Description

  1. Use a dynamic programming approach to efficiently search archives compressed using Log Surgeon's schema.
  • The dynamic programming approach determines all search query substrings that can be variables and uses these variables to build all possible logtypes that can match the search query. Logtype + variable combinations form the subqueries used to search the archive.
  • Add a class for storing each subquery.
  • Move heuristic-only logic into the heuristic case.
  2. Handle the archive writing case, in the event that we ever decide to swap from a timestamped to a non-timestamped log event.
  3. StringReader class fixes.
  • Correctly reset m_pos to 0 when closing.
  • Fix naming and initialization of member variables.
  4. Remove unused code.
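
To illustrate the idea in item 1, here is a minimal sketch of how a dynamic programming table over query prefixes could enumerate logtype + variable combinations. It is not the code in this change: the `SCHEMA` regexes, the `DELIMITERS` set, the `<name>` placeholder syntax, and the function names are all hypothetical, wildcards are ignored, and Python is used only for brevity. It assumes a substring can be a variable only if it is bounded by delimiters (or the ends of the query) and fully matches a schema pattern.

```python
import re

# Hypothetical schema: variable type name -> pattern. Not Log Surgeon's real API.
SCHEMA = {
    "int": re.compile(r"\d+"),
    "hex": re.compile(r"[0-9a-f]+"),
}
DELIMITERS = " "  # assumed delimiter set for this sketch


def is_delimited(query, i, j):
    """True if query[i:j] is bounded by delimiters or the ends of the query."""
    left_ok = i == 0 or query[i - 1] in DELIMITERS
    right_ok = j == len(query) or query[j] in DELIMITERS
    return left_ok and right_ok


def build_subqueries(query):
    """Return every (logtype, variables) interpretation of the query."""
    n = len(query)
    # table[j] holds all interpretations of the prefix query[:j].
    table = [set() for _ in range(n + 1)]
    table[0].add(("", ()))
    for j in range(1, n + 1):
        # Case 1: the character ending at j is static logtype text.
        for logtype, variables in table[j - 1]:
            table[j].add((logtype + query[j - 1], variables))
        # Case 2: some substring query[i:j] is a variable of a schema type.
        for i in range(j):
            if not is_delimited(query, i, j):
                continue
            sub = query[i:j]
            for name, pattern in SCHEMA.items():
                if pattern.fullmatch(sub):
                    for logtype, variables in table[i]:
                        table[j].add((logtype + f"<{name}>", variables + (sub,)))
    return table[n]


if __name__ == "__main__":
    for logtype, variables in sorted(build_subqueries("took 12 ms")):
        print(repr(logtype), variables)
```

Each entry of the final table cell is one candidate subquery: a logtype string plus the variable values it would need to match. Because every prefix is interpreted once and reused, the work is shared across interpretations instead of being repeated for each combination.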

Validation performed

  1. CLP test: test logs generate decoded archives that match the expected ground truth.
  2. CLT test: Correctly compresses and searches the 258GB Hadoop dataset using the CLP paper queries.

Summary by CodeRabbit

  • New Features

    • Introduced enhanced query interpretation and wildcard expression handling capabilities.
    • Added functionality for processing and validating wildcard expressions in queries.
    • Implemented a method to check if variable sequences match subqueries.
  • Bug Fixes

    • Improved handling of lexers and query processing logic to streamline operations and reduce complexity.
  • Tests

    • Expanded test coverage for the Grep class, including validation of wildcard expressions and interpretations.
  • Documentation

    • Updated documentation for new classes and methods related to query interpretation and wildcard expressions.

SharafMohamed, Jun 17 '24 15:06