archive-query-log
Elasticsearch storage backend
Restructure the crawling and parsing to store structured data in Elasticsearch indices instead of on the file system. Also store WARCs in S3 instead of as raw files. The new storage backends should be flexible enough to allow re-parsing parts of the dataset without having to delete anything. The second key requirement is the ability to scale up massively by interacting only with standard ES/S3 APIs instead of having to mount a shared file system on all nodes.
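To illustrate the idea, here is a minimal sketch (not the code in this PR) of how a capture could be modeled so that re-parsing overwrites a document in place rather than requiring deletes. All names here (`WarcLocation`, `CaptureDocument`, `parser_version`, `needs_reparse`, `index_action`, and the field layout) are hypothetical and chosen only for illustration. The sketch assumes each document gets a deterministic ID derived from its URL and timestamp, and that the raw WARC record stays in S3 and is referenced by bucket, key, and byte range.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from uuid import NAMESPACE_URL, uuid5


@dataclass(frozen=True)
class WarcLocation:
    """Hypothetical pointer to one WARC record stored in S3."""
    bucket: str
    key: str
    offset: int  # byte offset of the record within the object
    length: int  # byte length of the record


@dataclass(frozen=True)
class CaptureDocument:
    """Hypothetical structured document for one archived capture."""
    url: str
    timestamp: str
    warc: WarcLocation
    parser_version: Optional[str] = None  # None means "not parsed yet"
    parsed_query: Optional[str] = None

    @property
    def doc_id(self) -> str:
        # Deterministic ID: indexing the same capture again replaces
        # the existing document instead of creating a duplicate.
        return str(uuid5(NAMESPACE_URL, f"{self.timestamp}:{self.url}"))


def needs_reparse(doc: CaptureDocument, current_version: str) -> bool:
    """A document is stale if it was parsed by a different parser version."""
    return doc.parser_version != current_version


def index_action(index: str, doc: CaptureDocument) -> dict:
    """Build an action dict in the shape used by Elasticsearch bulk helpers."""
    return {
        "_op_type": "index",
        "_index": index,
        "_id": doc.doc_id,
        "_source": asdict(doc),
    }
```

Because the ID is stable and the raw bytes remain in S3, a worker on any node could select stale documents, fetch the referenced byte range, re-run the parser, and index the result again under the same ID, without a shared file system.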
Codecov Report
Attention: Patch coverage is 51.77305% with 68 lines in your changes missing coverage. Please review.
Project coverage is 56.36%. Comparing base (668de7e) to head (fbd3c6f). Report is 46 commits behind head on main.
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #25      +/-    ##
===========================================
- Coverage   89.68%   56.36%   -33.32%
===========================================
  Files          61       16       -45
  Lines        2724      864     -1860
===========================================
- Hits         2443      487     -1956
- Misses        281      377       +96
Closes #9
Fixes #6