beats icon indicating copy to clipboard operation
beats copied to clipboard

support for Timestamp in file outputter path

Open yspotts opened this issue 1 year ago • 22 comments

Proposed commit message

Add support for timestamp substitution variable in the file outputter path configuration. The format for the timestamp variable is is described in https://www.elastic.co/guide/en/beats/libbeat/8.12/config-file-format-type.html#_format_string_sprintf and is the same as for the elasticsearch index specification: https://www.elastic.co/guide/en/beats/filebeat/master/elasticsearch-output.html#index-option-es.

Full set of supported variables: https://github.com/elastic/beats/blob/e7e6dacfca2055d14fba269253dade2d197906d8/libbeat/common/dtfmt/doc.go#L18-L54

An example configuration:

path: 'fileoutput-%{+yyyy.MM.dd}'

This is accomplished by changing the config type of path to be fmtstr.TimestampFormatString which will Unpack the variable using the Unpack method specific for that type which handles substitution for timestamps. The current time in UTC is used for the actual substitution.

Checklist

  • [x] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

In one of the fields used by the file outputter, add a time +FORMAT variable to be substituted.

Related issues

  • Closes https://github.com/elastic/beats/issues/37994

Use cases

We will be using Packetbeat for the COGS network metering project. The use case and necessity for this feature is as follows:

If the Packetbeat instance crashes for some reason and restarts, due to the way packetbeat names files, this could likely cause duplicate filenames. The Archive CSI Driver would then upload the new file which would replace the file with the same name in s3, thus causing data loss of COGS metering information.

In order to avoid this, we have identified an easy “fix” to packetbeat: introduce a unique subdirectory path in the output path configuration. This can be accomplished with the current timestamp.

yspotts avatar Feb 14 '24 20:02 yspotts

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @yspotts? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

mergify[bot] avatar Feb 14 '24 20:02 mergify[bot]

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-03-05T12:47:53.538+0000

  • Duration: 131 min 10 sec

Test stats :test_tube:

Test Results
Failed 0
Passed 29155
Skipped 2046
Total 31201

:green_heart: Flaky test report

Tests succeeded.

:robot: GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

elasticmachine avatar Feb 14 '24 22:02 elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine avatar Feb 15 '24 08:02 elasticmachine

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

elasticmachine avatar Feb 15 '24 08:02 elasticmachine

@belimawr added unit tests and update documentation

yspotts avatar Feb 20 '24 15:02 yspotts

@belimawr thanks for approving! Any ideas about the failing tests? They seem completely unrelated to my changes. Should we try to rerun them?

yspotts avatar Feb 21 '24 15:02 yspotts

@andrewkroh I went in the direction you suggested and allowed the date time to be specified in the path. Let me know what you think. I tested it out locally and it works as designed

yspotts avatar Feb 21 '24 22:02 yspotts

Thanks so much @andrewkroh .

I have updated both. However, it looks like there is a failing test of some sort which seems unrelated to these changes. I tried to track it down, but its beyond my level of knowledge...

yspotts avatar Feb 22 '24 14:02 yspotts

However, it looks like there is a failing test of some sort which seems unrelated to these changes.

I think it may be related. It looks like many tests are failing because they can't read the output file.

self = <test_0032_dns.Test testMethod=test_tcp_axfr>
output_file = 'output/packetbeat-20240222.ndjson', types = None
required_fields = None

    def read_output(self,
                    output_file=None,
                    types=None,
                    required_fields=None):
    
        if output_file is None:
            output_file = "output/packetbeat-"+self.today+".ndjson"
        print(output_file)
    
        jsons = []
>       with open(os.path.join(self.working_dir, output_file), "r", encoding='utf_8') as f:
E       FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\jenkins\\workspace\\PR-38029-8-1b92d8ef-9f9a-4158-993b-3ccde1a49601\\src\\github.com\\elastic\\beats\\packetbeat\\build\\system-tests\\run\\test_0032_dns.Test.test_tcp_axfr244\\output/packetbeat-20240222.ndjson'

tests\system\packetbeat.py:137: FileNotFoundError

andrewkroh avatar Feb 22 '24 16:02 andrewkroh

hmm. It looks like all the test are failing on windows? Or is that just where the tests run? Is there any way I could repro or try it out locally?

And which log/artifact are you looking at to see that snippet? I cant seem to find it anywhere...

yspotts avatar Feb 22 '24 16:02 yspotts

It is interesting to me that the code:

output_file = "output/packetbeat-"+self.today+".ndjson"

Seems to assume a unix path separator. Not sure if that would cause problems in windows, but it seems curious that the failures all seem to be windows

yspotts avatar Feb 22 '24 16:02 yspotts

It seems like the fmtstr code is doing something unexpected with the backslashes.

{
  "@timestamp": "2024-02-22T03:27:42.475Z",
  "ecs.version": "1.6.0",
  "log.level": "info",
  "log.logger": "file",
  "log.origin": {
    "file.line": 104,
    "file.name": "fileout/file.go",
    "function": "github.com/elastic/beats/v7/libbeat/outputs/fileout.(*fileOutput).init"
  },
  "message": "Initialized file output. path=C:UsersjenkinsworkspacePR-38029-8-20a152a8-baa5-4ccc-9622-ac59a245009csrcgithub.comelasticbeatsx-packmetricbeatbuildsystem-testsruntest_airflow.Test.test_server507\\output\\metricbeat max_size_bytes=1024000 max_backups=7 permissions=-rw-------",
  "service.name": "metricbeat"
}
# metricbeat.yml
output.file:
  path: C:\Users\jenkins\workspace\PR-38029-8-20a152a8-baa5-4ccc-9622-ac59a245009c\src\github.com\elastic\beats\x-pack\metricbeat\build\system-tests\run\test_airflow.Test.test_server507/output

From: https://storage.googleapis.com/beats-ci-temp/Beats/beats/PR-38029-8/test-build-artifacts-x-pack-metricbeat-windows-2022-windows-2022-tgz

andrewkroh avatar Feb 22 '24 17:02 andrewkroh

@andrewkroh good find. Looks like the fmtstr code treats backslash as an escape character: https://github.com/elastic/beats/blob/f6343865bf1d7f772443df02fba0370d993e75ec/libbeat/common/fmtstr/formatstring.go#L377

So the path separator in this case gets swallowed. Uggh. Well I see three ways forward I guess:

1] Write our own Unpacker which splits apart the parts of the path and then puts them back together when Run is called 2] Settle for using the fmtstr ONLY on the filename which I think would accomplish our goals since the filenames would be unique. This done have a downside however: the filenames would look quite awkward since the outputter also includes the current date into the filename and that is hard coded in elastic-agent-lib. It seems kinda awkward to provide a way to include some time format in the filename whilst we ALSO include a hard coded date format to the end of the filename 3] Add a new configuration option for the date format of the suffix of the outputted filename (instead of the hard coded one). I am really hesitant about this one since it means having to modify elastic-agent-lib and Im concerned about unintended consequences. Also - I am conerned about how invasive this would need to be to get the config option down to the code that rotates

More I think about it, the more I favor option 1

Can you see any alternatives?

yspotts avatar Feb 22 '24 19:02 yspotts

Here's my idea 🤷 :

https://github.com/yspotts/beats/compare/config-rnd-path...andrewkroh:beats:config-rnd-path?expand=1

andrewkroh avatar Feb 22 '24 20:02 andrewkroh

Here's my idea 🤷 :

https://github.com/yspotts/beats/compare/config-rnd-path...andrewkroh:beats:config-rnd-path?expand=1

Nice! Let me try and adopt that - lets see if all the tests pass

Thanks so much for all your help!!

yspotts avatar Feb 22 '24 20:02 yspotts

Here's my idea 🤷 :

https://github.com/yspotts/beats/compare/config-rnd-path...andrewkroh:beats:config-rnd-path?expand=1

well, that wont work for the following example:

c:\path\%{+yyyy-MM-dd-mm} because it will interpret it as escaping the %. At that point, there is no way to tell if the percent is intended to be escaped or not.

I did a variation of option 3: https://github.com/elastic/beats/pull/38029/files#diff-7b5b4e797d2a07c4776c672a714e5fc98f390ddc4fa855b8d6dfc2f876f473daR1

If that works, and seems ok enough, I can add unit tests there

yspotts avatar Feb 22 '24 21:02 yspotts

:green_heart: Build Succeeded

History

  • :broken_heart: Build #1671 failed 812200a16c866e0105af146201939b57132fe2a9
  • :green_heart: Build #1601 succeeded 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #1600 succeeded 44a74cefffe6007f76997c53cbd4a764e9c2b7ec
  • :green_heart: Build #1596 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :green_heart: Build #1595 succeeded 78c672ecb617d5fad4c47732c8513d22aef3a84d
  • :broken_heart: Build #1586 failed 16e7d16d77503a39b46c39741b430f0098f787dd

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

:broken_heart: Build Failed

Failed CI Steps

History

  • :green_heart: Build #1679 succeeded 812200a16c866e0105af146201939b57132fe2a9
  • :broken_heart: Build #1609 failed 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #1604 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :broken_heart: Build #1594 failed 16e7d16d77503a39b46c39741b430f0098f787dd
  • :broken_heart: Build #1565 failed 214805f9313306a2e19dfc170b07dfa84dea20a9

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

:green_heart: Build Succeeded

History

  • :green_heart: Build #835 succeeded 812200a16c866e0105af146201939b57132fe2a9
  • :green_heart: Build #765 succeeded 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #760 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :broken_heart: Build #750 failed 16e7d16d77503a39b46c39741b430f0098f787dd
  • :broken_heart: Build #721 failed 214805f9313306a2e19dfc170b07dfa84dea20a9
  • :broken_heart: Build #720 failed 6da08e0c05bef7e8d7ad2db59bb9ae75e0e4934b

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

:green_heart: Build Succeeded

History

  • :green_heart: Build #1968 succeeded 812200a16c866e0105af146201939b57132fe2a9
  • :green_heart: Build #1898 succeeded 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #1893 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :broken_heart: Build #1883 failed 16e7d16d77503a39b46c39741b430f0098f787dd
  • :broken_heart: Build #1854 failed 214805f9313306a2e19dfc170b07dfa84dea20a9
  • :broken_heart: Build #1851 failed 338428bc30d2f42bf1f031bd44a0b23e87d2cfe1

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

:green_heart: Build Succeeded

History

  • :broken_heart: Build #2886 failed 812200a16c866e0105af146201939b57132fe2a9
  • :green_heart: Build #2816 succeeded 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #2811 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :broken_heart: Build #2801 failed 16e7d16d77503a39b46c39741b430f0098f787dd
  • :broken_heart: Build #2771 failed 214805f9313306a2e19dfc170b07dfa84dea20a9

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

:green_heart: Build Succeeded

History

  • :green_heart: Build #3307 succeeded 812200a16c866e0105af146201939b57132fe2a9
  • :green_heart: Build #3237 succeeded 27a66b01a6564e9ee6603e492c0be62f7099731a
  • :green_heart: Build #3232 succeeded a33ed078ac7edf4fed38e2ab8bf10570c33a6d85
  • :broken_heart: Build #3222 failed 16e7d16d77503a39b46c39741b430f0098f787dd
  • :broken_heart: Build #3193 failed 214805f9313306a2e19dfc170b07dfa84dea20a9
  • :broken_heart: Build #3192 failed 6da08e0c05bef7e8d7ad2db59bb9ae75e0e4934b

cc @yspotts

elasticmachine avatar Mar 05 '24 13:03 elasticmachine

@yspotts it seems that we are good to go with this one. Could you take care of merging it please?

pierrehilbert avatar Mar 14 '24 14:03 pierrehilbert