
Improvements for Kibana's reporting of job logs

Open KatyEllis opened this issue 4 years ago • 5 comments

Impact of the new feature: Data collected from jobs as input to Kibana (is this done via ES?)

I understand that only 1% of logs from successful Production jobs are kept in /eos/cms/store/logs/prod/recent/PRODUCTION, and I realise we do not have space to keep more than this. However, it would be good to make these logs easy to find, since they are mixed in with the logs of failed jobs (all of which are kept). IIRC, only certain job types (e.g. Processing and Production) are stored, even for failing jobs.

Being able to find logs for at least some successful jobs would be helpful when, for example, I am studying low job efficiencies. In Kibana I can filter by site, job type, and job success, and observe which jobs have low CPU efficiency, but I am unlikely to be lucky enough to select one whose log has been stored.

Describe the solution you'd like

  1. I would like a flag in Kibana indicating whether a job's log has been stored. A value of 1 would indicate that the job has a log stored in /eos/; a value of 0 would mean the log is not stored. I could then filter on jobs with stored logs (see the sketch after this list).
  2. The location of the job log, as a path or link in Kibana, would also be helpful and make the task more efficient.
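
For illustration, a minimal Python sketch of the two fields this would add to each per-job record indexed in Kibana; the field names `log_archived` and `log_location` are assumptions for illustration, not existing WMCore/MONIT fields:

```python
# Hypothetical additions to the per-job monitoring record indexed in Kibana.
# Field names and values are illustrative only; they do not exist in WMCore today.
job_record = {
    "site": "T1_UK_RAL",
    "job_type": "Production",
    "exit_code": 0,
    # 1 if the log tarball was archived under /eos/cms/store/logs/..., else 0
    "log_archived": 1,
    # full EOS path (or URL) of the stored log; empty when log_archived == 0
    "log_location": "/eos/cms/store/logs/prod/recent/PRODUCTION/<workflow>/<job>.tar.gz",
}
```

Filtering in Kibana would then be a query like `log_archived: 1` combined with the existing site and job-type filters.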

Describe alternatives you've considered: Manual search, but this is not very productive!


KatyEllis avatar Jun 08 '20 13:06 KatyEllis

Thanks Katy for creating this issue.

Perhaps another approach here would be to condor_chirp the filename uploaded to CERN EOS, or set it to 0/None/empty in case a file hasn't actually been uploaded.

@khurtado Kenyi, IIRC, there is a limit on the amount of chirping we can do, right? Otherwise, we could perhaps provide the whole path from within the runtime.
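
Concretely, a minimal sketch of what the runtime could do; the attribute name `Chirp_WMCore_LogArchiveURL` is hypothetical, and the chirped value would have to respect the chirp size limit:

```python
import subprocess

def chirp_log_location(eos_path):
    """Report the archived log location back to the job ClassAd via condor_chirp.

    The attribute name is hypothetical; string values must be passed as quoted
    ClassAd literals, and the payload must stay within the chirp size limit.
    """
    value = '"%s"' % (eos_path or "")  # empty string when nothing was uploaded
    subprocess.check_call(
        ["condor_chirp", "set_job_attr", "Chirp_WMCore_LogArchiveURL", value]
    )
```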

amaltaro avatar Jun 12 '20 12:06 amaltaro

@amaltaro Yes, the current limit is 5120 bytes

khurtado avatar Jun 12 '20 20:06 khurtado

I think a proper solution requires the following steps (a concrete sketch follows at the end of this comment):

  • define a schema for the Kibana records
  • identify the place in WMCore which collects the log info, and enhance it with a new JSON document satisfying the schema above
  • inject the data into CMS MONIT, similar to WMArchive, so that it ends up in ES/OpenSearch/Kibana
    • this can be a separate process at the end of any job, which parses the necessary job logs and performs the data injection into MONIT

This approach has several advantages:

  1. The data will be structured and therefore easy to search, since we will know the schema up-front
  2. The data will be automatically fed to ES and HDFS, allowing different data retention policies
  3. The data will contain a pointer to the actual log file (if we continue keeping logs on /eos)
  4. The data will be easily searchable, either via ES/OpenSearch QL or via Spark on HDFS
  5. The pre-defined schema will contain all the fields end-users want, since it will be driven by end-user use case(s)
  6. The successful and failed job logs (data) can be fed into different ES/OpenSearch streams in MONIT; their indices and data volumes will therefore differ and may have different data retention policies
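
To make this concrete, here is a minimal Python sketch of steps 1 and 3. The schema fields and the endpoint are illustrative placeholders; the real injection would go through the CMS MONIT infrastructure (e.g. via AMQ, as WMArchive does):

```python
import json

import requests  # assumed available in the injection environment

# Hypothetical schema for the per-job log record; the real field names would
# be agreed up-front so the ES/OpenSearch index can be typed consistently.
SCHEMA = {
    "workflow": str,
    "job_id": str,
    "job_type": str,
    "site": str,
    "exit_code": int,
    "log_archived": bool,
    "log_location": str,
}

def build_record(**kwargs):
    """Validate keyword arguments against SCHEMA and return the JSON document."""
    record = {}
    for field, ftype in SCHEMA.items():
        if field not in kwargs:
            raise KeyError("missing field: %s" % field)
        if not isinstance(kwargs[field], ftype):
            raise TypeError("field %s must be %s" % (field, ftype.__name__))
        record[field] = kwargs[field]
    return record

def inject(record, endpoint="https://monit.example.invalid/job-logs"):
    """Send one record to MONIT.

    The endpoint is a placeholder: real injection would use the CMS MONIT
    infrastructure (e.g. StompAMQ, as WMArchive does), not plain HTTP.
    """
    resp = requests.post(endpoint, data=json.dumps([record]),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()
```

The up-front validation is the point of advantage 1: a record that does not satisfy the schema fails at injection time rather than producing an untyped, unsearchable document in ES/OpenSearch.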

vkuznet avatar Apr 17 '23 17:04 vkuznet

Not sure where you guys are on this; I can just comment that this is another "wanted feature" from our users.

leggerf avatar Dec 05 '23 09:12 leggerf

Yes, this would still be useful. I have more or less given up looking at Production job logs because they are difficult to find. (CRAB job logs are linked from the OpenSearch field, and this would be the ideal way to access Production logs too).

KatyEllis avatar Dec 05 '23 13:12 KatyEllis