
Implement observability features

Open masci opened this issue 2 years ago • 2 comments

**Is your feature request related to a problem? Please describe.**
Code instrumentation and machine-readable logs are instrumental in making monitoring and alerting easier when Haystack is part of a production environment.

**Describe the solution you'd like**
Logs should have a standard JSON structure and allow sensitive data to be filtered out.

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/7026
- [ ] https://github.com/deepset-ai/haystack/issues/7027
- [ ] https://github.com/deepset-ai/haystack/issues/7028

masci · Jul 14 '22 09:07

JSON log structures are not really common in Python frameworks; most of them just use Python's standard logging library (e.g. transformers, PyTorch, the Elasticsearch Python client). structlog does not seem to be the right tool for a framework like Haystack: applications using Haystack would be required to use structlog too, not to mention that it would be yet another dependency to manage. Under the hood, structlog uses the extras of standard logging to pass structured information. So passing potentially machine-relevant structured data in the extras of log records enables any application to use it. Applications that want it can simply configure structlog to use that information as needed. If they don't want to use it, so be it: nothing needs to be done.
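To make the idea concrete, here is a minimal sketch using only the standard library. The logger name, the `node` and `duration_ms` fields, and the `JsonFormatter` class are made up for illustration; none of this is existing Haystack code. The library side only passes `extra`, and the application decides whether to render it as JSON.

```python
import io
import json
import logging

# Attributes every LogRecord carries by default; anything else came in via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object, including fields passed via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over the structured fields that the caller attached with `extra`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


# Library side: plain stdlib logging, structured data goes into `extra`.
logger = logging.getLogger("haystack.example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

# Application side: opt in to JSON output by attaching a handler and formatter.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.info("node finished", extra={"node": "Retriever", "duration_ms": 12.5})
print(stream.getvalue().strip())
```

An application that does not attach such a formatter just sees the normal log message, so nothing changes for it.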

To get an idea of what kind of structured information would be worth logging, let's take the eval feature (pipeline.execute_eval_run()), run it (and also provoke errors), dump the logs to MLflow, and see whether they can be improved:

  • is there some information missing to show?
  • is there some information that should be filtered out, but is currently hard to achieve?
  • is there some information missing to automatically react?

tstadel · Jul 25 '22 13:07

To answer the questions above:

  • is there some information missing to show?
      • some basic latency/performance measures or node-entering events can improve transparency and
  • is there some information that should be filtered out, but is currently hard to achieve?
      • some information that is only related to engineering and devops does not make sense in some use cases (hosted Haystack service). It should be easy to filter out.
  • is there some information missing to automatically react?
      • nothing that popped up
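As a rough sketch of the filtering point, a standard `logging.Filter` could drop records based on a tag passed via `extra`. The `persona` field name, its values, and the rule that untagged records always pass are assumptions made for this example only.

```python
import logging


class PersonaFilter(logging.Filter):
    """Let a record pass only if its (hypothetical) `persona` tag is allowed."""

    def __init__(self, allowed_personas):
        super().__init__()
        self.allowed_personas = set(allowed_personas)

    def filter(self, record: logging.LogRecord) -> bool:
        # Records without a tag are treated as relevant to everyone (assumption).
        persona = getattr(record, "persona", None)
        return persona is None or persona in self.allowed_personas


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect of the filter is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("haystack.example.persona")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

# A hosted service could attach the filter to hide engineering/devops records.
handler = ListHandler()
handler.addFilter(PersonaFilter({"user"}))
logger.addHandler(handler)

logger.info("GPU memory at 80%", extra={"persona": "devops"})
logger.info("answer generated", extra={"persona": "user"})
logger.info("pipeline started")

print(handler.messages)
```

Because the filter sits on the handler, the library itself does not need to know who consumes the logs.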

So we can sum it up in the following work packages:

  • [ ] Introduce basic latency/performance events for node runs
  • [ ] Add target persona meta information to logs
  • [ ] Write a guide/documentation on using machine-readable logs
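For the first work package, one possible shape of a latency event is sketched below. The `log_node_run` helper, the event and field names, and the `persona` value are all hypothetical; the real design is still to be decided.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("haystack.example.latency")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False


@contextmanager
def log_node_run(node_name: str):
    """Time the wrapped block and emit one structured log event when it ends."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "node run finished",
            extra={
                "event": "node_run",       # hypothetical event name
                "node": node_name,
                "duration_ms": duration_ms,
                "persona": "devops",       # hypothetical target persona tag
            },
        )


class RecordCollector(logging.Handler):
    """Keeps the raw records so the structured fields can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


collector = RecordCollector()
logger.addHandler(collector)

with log_node_run("Retriever"):
    sum(range(1000))  # stand-in for the actual node work

event = collector.records[0]
print(event.node, event.event, event.duration_ms >= 0)
```

The `finally` block means the event is also emitted when the node raises, which would help when provoking errors during the eval run.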

tstadel · Aug 09 '22 08:08