Add dynamic programming approach for schema search.
Description
- Use a dynamic programming approach to efficiently search archives compressed using Log Surgeon's schema.
- The dynamic programming approach determines every substring of the search query that can be a variable, then uses these variables to build all possible logtypes that can match the query. Each logtype and variable combination forms a subquery used to search the archive.
- Add class for storing each subquery.
- Move heuristic-only logic into the heuristic case.
- Handle the archive-writing case where, in the future, we swap from a timestamped to a non-timestamped log event.
- StringReader class fixes.
  - Correctly reset m_pos to 0 when closing.
- Fix naming and initialization of member variables.
- Remove unused code.
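
The dynamic programming step described above can be illustrated with a small sketch. This is not the code in this PR: the schema, the delimiter set, the `<type>` placeholder syntax, and the names `Subquery`, `variable_types`, and `generate_subqueries` are all assumptions made for illustration, and wildcard handling is omitted entirely. It only shows the core idea: for each prefix of the query, reuse the interpretations of shorter prefixes and extend them with either static text or a variable.

```python
import re
from dataclasses import dataclass

# Hypothetical schema: variable type -> pattern the whole token must match.
SCHEMA = {"int": re.compile(r"\d+"), "float": re.compile(r"\d+\.\d+")}
# Hypothetical delimiter set; a variable must be bounded by these or the string ends.
DELIMITERS = " =:,"


@dataclass(frozen=True)
class Subquery:
    logtype: str      # static text with <type> placeholders for variables
    variables: tuple  # ((type, value), ...) in order of appearance


def variable_types(query, begin, end):
    """Return the schema types that query[begin:end] could be, if delimited."""
    left_ok = begin == 0 or query[begin - 1] in DELIMITERS
    right_ok = end == len(query) or query[end] in DELIMITERS
    if not (left_ok and right_ok):
        return []
    token = query[begin:end]
    return [name for name, pattern in SCHEMA.items() if pattern.fullmatch(token)]


def generate_subqueries(query):
    """Build every logtype + variable combination that could match the query."""
    # interpretations[end] holds every Subquery that reads query[:end].
    interpretations = [set() for _ in range(len(query) + 1)]
    interpretations[0].add(Subquery("", ()))
    for end in range(1, len(query) + 1):
        # Case 1: the last character is static text.
        for prefix in interpretations[end - 1]:
            interpretations[end].add(
                Subquery(prefix.logtype + query[end - 1], prefix.variables)
            )
        # Case 2: query[begin:end] is a variable appended to a shorter prefix.
        for begin in range(end):
            for var_type in variable_types(query, begin, end):
                for prefix in interpretations[begin]:
                    interpretations[end].add(
                        Subquery(
                            prefix.logtype + f"<{var_type}>",
                            prefix.variables + ((var_type, query[begin:end]),),
                        )
                    )
    return interpretations[len(query)]


if __name__ == "__main__":
    for subquery in sorted(generate_subqueries("took 12 ms"), key=lambda s: s.logtype):
        print(subquery)
```

Under these assumptions, the query `took 12 ms` yields two subqueries: one with the logtype `took 12 ms` and no variables, and one with the logtype `took <int> ms` and the variable `12`. Because each prefix's interpretations are computed once and reused, the work is shared across all candidate variable positions instead of being repeated per combination.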
Validation performed
- CLP test: the test logs produce decoded archives that match the expected ground truth.
- CLT test: correctly compresses and searches the 258 GB Hadoop dataset using the CLP paper queries.
Summary by CodeRabbit
- New Features
  - Introduced enhanced query interpretation and wildcard expression handling capabilities.
  - Added functionality for processing and validating wildcard expressions in queries.
  - Implemented a method to check if variable sequences match subqueries.
- Bug Fixes
  - Improved handling of lexers and query processing logic to streamline operations and reduce complexity.
- Tests
  - Expanded test coverage for the `Grep` class, including validation of wildcard expressions and interpretations.
- Documentation
  - Updated documentation for new classes and methods related to query interpretation and wildcard expressions.