
Investigate logging refactor

Open · dfaller opened this issue 6 years ago · 5 comments

Investigate refactoring the current logging system:

  1. Should work as part of a "whole cluster" logging system.
  2. It must be possible to isolate logs for system components/tasks from those of algorithm tasks.
  3. Needs to allow us to attach specific metadata fields per Mesos task.
  4. Needs to provide a good way for the Scale UI to display portions of the log and support paging in and out.

dfaller · Jan 02 '18 20:01

  1. The solution needs to support hot reload of logging levels for Scale system components.
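
To make that requirement concrete, here is a minimal sketch of what hot reload could look like in a Python component; the logger names and the mechanism that delivers the new level (DB setting, API call, config watch) are assumptions, not a design decision.

```python
import logging

# Hypothetical helper: re-apply a log level to a running Scale system
# component's loggers without a restart. The logger names below are
# placeholders; how the new level is delivered is left open.
def apply_log_level(level_name, logger_names=('scale', 'scheduler', 'messaging')):
    level = getattr(logging, level_name.upper(), logging.INFO)
    for name in logger_names:
        logging.getLogger(name).setLevel(level)
```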

Fizz11 · Jan 09 '18 17:01

It sounds like we can use the offset within the file stream to maintain message ordering with sub-millisecond precision using Filebeat alone. We do need to ensure that log rotation at the Mesos sandbox level doesn't cause issues here.

This could easily be tested by hooking Filebeat up to a task that sprays incrementing numbers as fast as possible to force a size-based file rotation. We could then perform our ordered log retrieval from Elasticsearch to ensure order was maintained.
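
A rough sketch of that test, assuming Filebeat 6.x field names (`source`, `offset`, `message`) and the elasticsearch-py client; the index name and sandbox path are placeholders:

```python
from elasticsearch import Elasticsearch

def spray_numbers(path, count=1000000):
    """Write incrementing integers as fast as possible to force size-based rotation."""
    with open(path, 'w') as out:
        for i in range(count):
            out.write('%d\n' % i)

def verify_order(es, index='filebeat-*', source_path='/path/to/sandbox/stdout'):
    """Pull the harvested lines back, sorted by file offset, and check they are monotonic."""
    # Note: rotated files may be harvested under a different source path, so
    # this filter may need to be loosened to cover the rotation case.
    result = es.search(index=index, body={
        'query': {'term': {'source': source_path}},
        'sort': [{'offset': {'order': 'asc'}}],
        'size': 10000,
    })
    values = [int(hit['_source']['message']) for hit in result['hits']['hits']]
    assert values == sorted(values), 'ordering was lost across rotation'

if __name__ == '__main__':
    verify_order(Elasticsearch(['http://localhost:9200']))
```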

gisjedi · Jan 18 '18 17:01

  1. We should investigate whether Filebeat supports injection of arbitrary metadata via key-value pairs stored in a sidecar file. If we could just write a file into the sandbox that provides additional static metadata appended to each Filebeat message, that would be much preferred over munging the task IDs to extract the desired metadata.

gisjedi · Jan 18 '18 17:01

Unfortunately, there does not appear to be any way to add contextual metadata directly to the Filebeat stream beyond the basic values (host, source, etc.). Based on my current understanding, my recommendation is to retain Logstash within our architecture for the purpose of streaming message manipulation. We MUST at least be able to tie the Mesos Task ID to the message, which we can extract from the source field.
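
Illustrative only, the sort of extraction the Logstash filter would perform on the `source` field; the sandbox path layout follows the usual Mesos agent work directory structure, and the assumption that the executor directory name carries the Task ID is not settled:

```python
import re

# Assumed sandbox path shape:
#   .../executors/<task_id>/runs/<container_id>/stdout|stderr
SOURCE_PATTERN = re.compile(
    r'/executors/(?P<task_id>[^/]+)/runs/[^/]+/(?P<stream>stdout|stderr)'
)

def extract_task_metadata(source):
    """Pull the Mesos Task ID and output stream out of a sandbox file path."""
    match = SOURCE_PATTERN.search(source)
    return match.groupdict() if match else {}
```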

gisjedi · Jan 22 '18 17:01

Phase 1:

  1. Document Filebeat prerequisites for Scale
  2. Update Logstash to a highly available deployment in bootstrap
  3. Update Scale Task IDs to include the metadata currently in the Syslog tag
  4. Add metadata extracted from the Task ID via Logstash to messages sent to Elasticsearch
  5. Plan the Elasticsearch upgrade from 2.x to 6.x
  6. Update UI to support tailing logs directly from Elasticsearch

Phase 2:

  1. Deploy Filebeat automatically as part of Scale deployment.

Metadata:

  • Job Execution
  • Job Type
  • System / Algorithm Task
  • STDERR / STDOUT

gisjedi · Jan 22 '18 20:01