fluentd
fluentd copied to clipboard
in_tail: Expand glob capability for square brackets and one character matcher
Which issue(s) this PR fixes: This PR will fix none of the issue but this was raised in community slack channel.
What this PR does / why we need it:
This feature was requested on the community slack's #fluentd channel. For now, we didn't support for square brackets pattern and one character matcher in #expand_paths.
This shouldn't be fit for the following case.
The logs are named as follows:
.
├── 01.log
├── 02.log
├── 03.log
├── 12.log
├── 13.log
├── 20.log
Then, a user wants to tail 02.log, 03.log, 12.log, and 13.log, and excluding to tail 01.log and 20.log with the following configuration:
<source>
@type tail
path "[0-1][2-3].log"
<parse>
@type none
</parse>
tag log.test
read_from_head
use_extended_glob # To use this expansion, users need to enable this feature explicitly.
</source>
However, the current Fluentd doesn't handle the "[0-1][2-3].log" pattern as glob. It just handles as normal file name. This shouldn't be effective for this use case.
It would be nice to handle this pattern on the in_tail plugin.
Docs Changes: https://github.com/fluent/fluentd-docs-gitbook/pull/488
Release Note: Same as title.
From https://fluent-all.slack.com/archives/C0CTT63EE/p1707473140708389
Thanks! It's a very useful enhancement!
I'm concerned about backward compatibility.
If there is already a setting where the path option contains the characters, the behavior will change, correct?
For example, if using the following configuration, is it necessary to change the setting?
<source>
@type tail
path "/path/to/foo[bar]/test.log"
tag test
<parse>
@type none
</parse>
</source>
I want to consider this point carefully.
For example, if using the following configuration, is it necessary to change the setting?
You are right. macOS, and also Windows permit to use square brackets in filename. So, we need to maintain backward compatibility for it.
Once I implemented this feature, I didn't notice the backward compatibility issue and also macOS and Windows permit to use square brackets in their filesystems. So, we need to provide a switch for enabling this feature.
Shouldn't we consider exclude_paths as well?
Excellent! You are brilliant person. Yes, I'd love to handle that.
Wouldn't it be more convenient and clearer to make a new option that just changes the condition to enable
Dir.glob?* For example: the new option `enable_glob`.
I'd prefer to use enum approach. Because switching with true or false shouldn't be representing all of the conditions that we considered. Also, enum approach is more effective to prevent specifying malformed values.
So, my recommendation is:
with_wildcards: This is default setting for that option we'll provide
extended: This should be more concrete name for that option. Because we didn't provide full set of the glob patterns on Dir.glob. In this case, I'd prefer to use "extended" instead of "always".
However, I didn't fully follow your suggestion what "no" means here. Perhaps, you would be interested to collect * contained files with in_tail?
Hey @daipom and @ashie, could you all take a look the applied changes again? This PR is ready to review.
extended: This should be more concrete name for that option. Because we didn't provide full set of the glob patterns on Dir.glob.Is this about not supporting curly braces? In my opinion, as I commented below, it would not be a problem to provide the full set.
Yes. I will not support curly braces here. Because this is intended to extend a capability to handle square brackets in glob patterns only. This should be suitable for the first step to consider how extend the glob capabilities.
We don't need to care about curly braces here. We should make it available to all who want to use it because we can avoid the collision by setting
path_delimiter. We can explain it in the documentation or check it when we do@path.split(@path_delimiter)and give a warning message, if it is needed.In addition, even if
path_delimiteris the default value,, there seems to be no concern about causing the glob to fail because the path split is done first. Do you have concerns about this?
No, I experienced the feature collisions to support curly braces. This is because curly braces are usually used with comma , like as {a,b}. So, it is not able to extend to handle curly braces easily. This should be handled as a future task or put it in ice box for now.
However, I didn't fully follow your suggestion what "no" means here. Perhaps, you would be interested to collect
*contained files with in_tail?I understand your concern. This option is not necessarily needed. If you think you don't need it, let's remove it!
This may rarely actually help. I just thought it would be natural to have an option to disable the feature completely. We can add the option if we need it later.
Sure. Let's remove on that choice from the enable_glob parameter.
@cosmo0920
Is this about not supporting curly braces? In my opinion, as I commented below, it would not be a problem to provide the full set.
Yes. I will not support curly braces here. Because this is intended to extend a capability to handle square brackets in glob patterns only. This should be suitable for the first step to consider how extend the glob capabilities.
I see.
I am concerned that once an option is added, it is difficult to change it later to keep compatibility.
If it's not a problem to provide the full set, I'd like always.
In addition, even if
path_delimiteris the default value,, there seems to be no concern about causing the glob to fail because the path split is done first. Do you have concerns about this?No, I experienced the feature collisions to support curly braces. This is because curly braces are usually used with comma
,like as{a,b}. So, it is not able to extend to handle curly braces easily. This should be handled as a future task or put it in ice box for now.
Hmm, but it can be avoided by changing path_delimiter.
Users who want to use curly braces can use it by changing path_delimiter.
I don't see the benefit of prohibiting it.
Rather, the point is whether it would negatively impact users who don't use curly braces. (users who want to use [] only)
We must avoid it.
And, I'm not concerned about that.
Supporting curly braces seems to have no negative impact on such users.
In summary, from the following three points, it seems to me that there is no need to narrow the conditions now.
- Once an option is added, it is difficult to change it later to keep compatibility.
- Users who want to use curly braces can use it by changing
path_delimiter. - Supporting curly braces seems to have no negative impact on users who want to use
[]only.
What do you think?
- Once an option is added, it is difficult to change it later to keep compatibility.
- Users who want to use curly braces can use it by changing
path_delimiter.- Supporting curly braces seems to have no negative impact on users who want to use
[]only.
- Got it. I'll change the choice name later.
- I don't think so. The most of the functionalities should work well out of the box without any additional settings. config parameters are working as independently as much as possible. Your suggestion is definitely opposite my concerns and recommendations.
- Same as the above. But curly braces shall introduce the functional glitches that I already pointed out.
Additionally, if we were supporting the curly braces in in_tail, users should confuse how to specify path_delimiter character. The current default character is ,. This should introduce the complexity which user must choose the delimiter that won't collide to divide the path for monitoring.
I wish I could add this feature in this PR but adding a capability to handle curly braces introduces such complexity.
If we had a cycle to support user forums and community, it would be nice to add.
However, our supporting resource is limited for OSS version, right? So, I wouldn't prefer to add this feature.
Note: Just to be clear, let me explain why I don't think so in this case.
Including curly braces in the condition itself is not harmful.
- It only affects those who want to use curly braces.
Simpler specifications are better for users.
- It is less confusing for users to say, "You can use all of them, so use whatever you like, but be careful when using curly braces", rather than having users understand that some characters are exceptions to the specification,
I understand the opinions on your side. Yeah, I'm always confusing how should I do/implement this newly implementing feature? Then, I assumed myself as a newbie who is not knowing well how to use/work this software. With this assumption, I often realized that the option/direction which should be confusing easily is not good direction. If I were implementing a feature which is silently dependent another feature, I felt sick to use that feature or simply gave up to use. So, my responded concerns are based on the above assumptions.
Thanks for your suggestions. They are really helpful and well considered. You rock! :+1:
@ashie @kenhys Do you guys have any concerns?
@ashie @kenhys Do you guys have any concerns?
Still reading the discussion.
For the discussion about braces, basically I agree with @cosmo0920. I don't want to introduce implicit dependencies between parameters which may confuse end users.
On the other hand, the enum label :always doesn't seem suitable if we drop brace support.
I tried to consider more suitable name or other configuration way, but I haven't yet find the best one.
How about this one as a a compromise?
- Support brace and keep the label name
:always - Check
:enable_glob,pathandpath_delimiterconfigurations at start up then show error or warning whenpathincludes{}andpath_delimiteris default value,(and:enable_globis:always).
enhancement concept is reasonable, but I felt some concern.
wildcardaccepts*in path, but it also expands other patterns. wildcard may be misreading in some extent even though it is same behavior as before.- It maybe tricky
enable_globaccepts enum instead of boolean. I want to name more appropriate parameter name.glob_policyor something else.
@ashie
How about this one as a a compromise?
* Support brace and keep the label name `:always` * Check `:enable_glob`, `path` and `path_delimiter` configurations at start up then show error or warning when `path` includes `{}` and `path_delimiter` is default value `,` (and `:enable_glob` is `:always`).
I agree.
@kenhys
enhancement concept is reasonable, but I felt some concern.
* `wildcard` accepts `*` in path, but it also expands other patterns. wildcard may be misreading in some extent even though it is same behavior as before.
I understand your concern. This is a difficult point in this enhancement.
This option manages only the condition to enable the glob feature. It does not limit the glob feature itself.
This would be an option name issue.
I proposed enable_glob because I thought this was the condition to enable the blog feature.
If you have a more descriptive name, let's change it.
Got it. glob_policy may better to change from enable_glob. This is because enable_glob implies true or false.
- Check
:enable_glob,pathandpath_delimiterconfigurations at start up then show error or warning whenpathincludes{}andpath_delimiteris default value,(and:enable_globis:always).
I didn't want to add this feature. Because this feature should consume more cycles. I didn't have more cycles to handle it. I'd prefer to operate as a minimum effort.
- Check
:enable_glob,pathandpath_delimiterconfigurations at start up then show error or warning whenpathincludes{}andpath_delimiteris default value,(and:enable_globis:always).I didn't want to add this feature. Because this feature should consume more cycles. I didn't have more cycles to handle it. I'd prefer to operate as a minimum.
Do we need to care about the cycles?
It can be done in configure, not in expand_paths.
https://github.com/fluent/fluentd/blob/eac830f6043e13cfa9fc2410f00363c7b0c66fc1/lib/fluent/plugin/in_tail.rb#L149
Do we need to care about the cycles?
Hmm, I mean cycles as the time to be able to spent for something.
Do we need to care about the cycles? It can be done in
configure, not inexpand_paths.
I found that this approach could be effective for the most cases but I didn't tested them for corner cases yet. Mostly, always option should be OK.
Also, I feel that always option should be a bit of difficult to use. This needs the knowledge of the internal behavior of in_tail and it's a bit of tricky. So, still, I wanted to be continuing to provide the option of extended. This option should be renamed from the previous always option.
wildcardaccepts*in path, but it also expands other patterns. wildcard may be misreading in some extent even though it is same behavior as before.
I'd also considered which name should be better but I didn't find any suitable and relatively shorter word. So, I just use backward_compatible for keeping backward compatiblities.
implementing always option should be done.
I understand the points! The current proposal is to make the following specifications, right?
glob_policy always: We can use all glob features, but@path_delimitermust be changed.glob_policy extended: We can use glob features except curly braces. If using curly braces, it will be ConfigError.glob_policy backward_compatible: Glob features are enabled ifpathincludes*. (The current spec).
Great!! It looks good to me.
I have one concern about the name backward_compatible.
wildcardaccepts*in path, but it also expands other patterns. wildcard may be misreading in some extent even though it is same behavior as before.I'd also considered which name should be better but I didn't find any suitable and relatively shorter word. So, I just use
backward_compatiblefor keeping backward compatiblities.
Wouldn't users be more confused when they see this default value named backward_compatible?
If the priority is to make it easy for users to understand, it would be better to be able to assume the feature from the name. If we think about it, the setting name could be something like this, for example,
glob_policy enable_with_wildcard
What do you think about this? @cosmo0920 @kenhys
glob_policy enable_with_wildcardWhat do you think about this? @cosmo0920 @kenhys
I think that only_wildcard would be better to use but I wanted to emphasize the default value is "backward compatible".
glob_policy enable_with_wildcardWhat do you think about this? @cosmo0920 @kenhys
I think that
only_wildcardwould be better to use
Why? It looks to me to be misleading as @kenhys is concerned:
wildcard accepts * in path, but it also expands other patterns.
but I wanted to emphasize the default value is "backward compatible".
Why? It may not confuse existing users, but wouldn't it confuse new users?
glob_policy enable_with_wildcardWhat do you think about this? @cosmo0920 @kenhys
I think that
only_wildcardwould be better to useWhy? It looks to me to be misleading as @kenhys is concerned:
wildcard accepts * in path, but it also expands other patterns.
As @kenhys said that this was intended to represent for extending regular expressions not only wildcard but for other expressions. But the current circumstances request to this option for only handling wildcard.
but I wanted to emphasize the default value is "backward compatible".
Why? It may not confuse existing users, but wouldn't it confuse new users?
No, it would confuse new users. Because the default value should be representing backward compatible and the name should be noun. Not including verbs. Maybe perfect sentence would be effective but sometimes it makes redundant for the users.
But, only using why? is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?
I see. I would also like to hear @kenhys's opinion.
But, only using
why?is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?
I apologize. I didn't understand the nuance. :cry: Thanks for letting me know. Could you tell me how you normally ask for a reason?
But, only using
why?is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?I apologize. I didn't understand the nuance. 😢 Thanks for letting me know. Could you tell me how you normally ask for a reason?
How come bra bra? or How come? should be better instead of using why? Also, asking reason in text is sometimes difficult to represent in English via direct translation from Japanese. As might you noticed that I often use would/should. Because these auxiliary verb should work for making a room of choices for listeners. Making choices and take a distance should represent politeness in English.
But, only using
why?is just like representing that a parent is anger for their kids. It's too direct to represent negative emotions. So, it wouldn't be suitable behavior on the community. Did you follow what I mean?I apologize. I didn't understand the nuance. 😢 Thanks for letting me know. Could you tell me how you normally ask for a reason?
And no worries. I'm not a native English speaker, neither. But I've been in an "immersion" environment in these years.