
Log indexing

Open chriso opened this issue 2 years ago • 1 comments

chriso avatar Jun 29 '23 21:06 chriso

Some notes I gathered relevant to this feature while working on #232

  • The only relevant fields to index are (*Record).Time, (*Record).FunctionID and (*Record).FunctionCall. I don't see any merit in indexing FunctionCall: will someone query logs by the value of a syscall argument?
  • The actual log data, (*Record).FunctionCall, is not structured and uses a custom encoding that differs between features. Indexing should be context-aware (features should be responsible for indexing their own data).
  • (*Record).Offset is not stored in the segment. It is set dynamically while reading. This limits how much you can skip when querying batches: if you create an inverted index that finds the batches with relevant logs, you may still be forced to read the full batch and filter the relevant logs in memory.
  • Record is coupled to syscalls.
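
To illustrate the third point, here is a minimal sketch of a batch-level inverted index keyed by function ID. The `record` type, `buildIndex` and `query` are hypothetical names made up for this example and are not the timecraft API. The index can only say which batches to open; every candidate batch is still read in full and filtered in memory.

```go
package main

import "fmt"

// record is a hypothetical stand-in for the log record type; only the
// fields needed for this example are modelled.
type record struct {
	FunctionID int
	Payload    string
}

// buildIndex maps each function ID to the positions of the batches that
// contain at least one record for it.
func buildIndex(batches [][]record) map[int][]int {
	index := make(map[int][]int)
	for pos, batch := range batches {
		seen := make(map[int]bool)
		for _, r := range batch {
			if !seen[r.FunctionID] {
				seen[r.FunctionID] = true
				index[r.FunctionID] = append(index[r.FunctionID], pos)
			}
		}
	}
	return index
}

// query skips batches that the index rules out, but has to scan every
// record of each candidate batch because there is no per-record offset
// to jump to.
func query(batches [][]record, index map[int][]int, id int) []record {
	var out []record
	for _, pos := range index[id] {
		for _, r := range batches[pos] {
			if r.FunctionID == id {
				out = append(out, r)
			}
		}
	}
	return out
}

func main() {
	batches := [][]record{
		{{1, "a"}, {7, "b"}},
		{{2, "c"}},
		{{7, "d"}, {7, "e"}, {3, "f"}},
	}
	index := buildIndex(batches)
	fmt.Println(index[7])                         // batches 0 and 2 are candidates
	fmt.Println(len(query(batches, index, 7)))    // matching records after filtering
}
```

Batch 1 is never opened for function ID 7, but batches 0 and 2 are scanned record by record even though only some of their records match.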

As the current API stands, indexing timestamps ((*Record).Time) probably makes the most sense; it would allow commands to accept --start-ts and --end-ts. When reading logs, we can skip batches that have no records in the requested time range.

Note: these notes may be incorrect; they come from my limited time hacking on something different.

gernest avatar Oct 10 '23 11:10 gernest