flow-development-collection
flow-development-collection copied to clipboard
FEATURE: Arrays::getValueByPath with escaped dot
It is now possible to escape the dot in the path of Arrays::getValueByPath. With this you get array keys, which have a dot.
Example:
$foo = ['bar.baz' => 'value'];
$path = 'bar\.baz';
What I did It is now possible to escape the dot in the path of Arrays::getValueByPath.
How I did it I have adapted the method Arrays::getValueByPath
How to verify it I wrote a test for it.
Checklist
- [x] Code follows the PSR-2 coding style
- [x] Tests have been created, run and adjusted as needed
- [x] The PR is created against the lowest maintained branch
Technically that's a feature, so should go into next release. But then again, escaping a dot might be an expected enough thing that we could relabel it as "BUGFIX"...
Anyway, one thing I'm unsure about:
What if the \
is an expected part of the path, i.e.
$foo = ['bar\\' => ['baz' => 'value']];
I know, looks weird, but assume the key comes from a path traversal and stores directory names on a windows machine... (whole lot of edgy cases). I think in that case this could be breaky, but more importantly, it would need an "escape escape" hatch, which this doesn't cover yet.
@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think. I am not a regex expert, if someone has a better solution I would be happy.
Why would it be expected? I don't even know why it should be possible in the first place?
This is an application specific syntax expression (aka path syntax) why does it need escaping support?
With this change it is possible to use the command 'configuration:show' also for e.g. NodeTypes.
./flow configuration:show --type NodeTypes --path Vendor\\Site:My\\NodeType
./flow configuration:show --type NodeTypes --path "Vendor.Site:My.NodeType"
should do fine?
Note: No it doesn't, got it now!
It would also be nice to do something like this:
./flow configuration:show --type NodeTypes --path Vendor\\.Site:My\\.NodeType.superTypes.Vendor\\.Site:My\\.OtherNodeType
Shouldn't we then rather think about a nice syntax extension for the scope?
Eg. [Vendor.Site:My.NodeType].properties
Obviously we need to select the separate well! The brackets look nice but might collide with something ofc.
@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think. I am not a regex expert, if someone has a better solution I would be happy.
Was just fiddling around with the regex myself and was now stuck at '#(?<!\\\\)((?:\\\\\\\\)*)\\.#'
, which works for any uneven amount of escape characters but needs to use PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE
to keep the even number of backslashes in the split. You'd then need to array_reduce
array elements that only consist of backslashes into the previous. Ugh... getting wild quickly. Your solution only works for the one special case, but I guess that's good enough?
Edit: For reference, here's my fiddle with the ugly-ass "generic" solution: https://3v4l.org/pFoER
As alternative here the version with the brackets: https://github.com/neos/flow-development-collection/compare/master...gerdemann:getvaluebypath-with-escaped-dot-2
The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex. And while the increase in processing time might be minimal the amount it gets used will probably make it tangible.
The path given to utility can already be an array (probably should be split into two functions but separate problem) but I guess I woudl rather like to see the command doing the work and we accept a new argument "pathSegments" or something that is then an array of keys
Yes, performance is an argument. I would have, under certain circumstances, the functionality not only in the command. What do you think about a second method "getValueByPathString" or something in the Arrays utility?
The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex.
Yep, that's also why I even slightly prefer the "imperfect" solution without the array_reduce.
Guess that's something we can't easily debate or refactor away. Moving the splitting outside the function will strip it of it's core functionality, plus you have to handle all those places where the function is used. I agree though, that the method should be split into two - one that takes a pathSegments
array and one that takes a path string and calls the former.
So it comes down to a general decision/trade-off of "allowing dots in array keys" vs. code complexity+performance
IDK, for me the nodetypes are a bit special, do we have any other place where we need this functionality? it wa sbuild for simple paths like settings and that's what it's great for.
Then it shouldn't be in a general arrays utility, should it? I use it often with other arrays as well.
I mean, it fullfills a specific task and that it does fast, but yeah we should certainly talk about that.
So, how do we proceed? @kitsunet
Not quite sure what happened to this PR, but it somehow got rebased (?) and a whole lot of other commits pulled in :/ Would like to go over this again tomorrow and see if we can make it mergeable then @kitsunet or rather postpone to 6.3
@albe I only see one commit, is there anything I can do to help?
D'oh - has this resolved itself? :D Thanks!
This bot is running rouge...
Sorry this couldn't land in 6.2 yet - let's get this into major soon so it lands in 6.3!
Everyone - if we get this sorted out before monday… it could maybe land in 6.3. I skimmed the PR comments, but would love to get a summary of the status from @albe or @kitsunet, as you seem to have thought about this quite a lot. Thanks!
Sorry I couldn't look into this again yet. I think we still have a decision to make here.
Here's the latest summary of discussion:
The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex.
Yep, that's also why I even slightly prefer the "imperfect" solution without the array_reduce.
Guess that's something we can't easily debate or refactor away. Moving the splitting outside the function will strip it of it's core functionality, plus you have to handle all those places where the function is used. I agree though, that the method should be split into two - one that takes a pathSegments array and one that takes a path string and calls the former.
So it comes down to a general decision/trade-off of "allowing dots in array keys" vs. code complexity+performance
IDK, for me the nodetypes are a bit special, do we have any other place where we need this functionality? it wa sbuild for simple paths like settings and that's what it's great for.
Then it shouldn't be in a general arrays utility, should it? I use it often with other arrays as well.
So, tl;dr: Should we split this into two methods, in order to keep the simplicity and performance of the original and if so, where to put the new method/move the old?
I was once also trying to access an array with dots in paths via this method and i assumed this would work:
$path = 'something."this.is.one.thing".more.stuff'
but it didnt work (because explode...)
Escaping the dots with backslash is something i dindt think of...
btw i think this regex might look better: (?:\\.)(*SKIP)(*FAIL)|\.
(the magic is that it selects all Backslashes and its following char and excludes these matches from being used, for what was not matched it checks for simple dots.)
The above regex can also be modified to select all dots except in quotes as shown here: https://github.com/neos/neos-development-collection/blob/be26b253c8bce69695047fa7799e6c2d4f0cf02f/Neos.Fusion/Classes/Core/Parser.php#L102
i needed this feature for a cli when i wanted to get an ast path of fusion for debugging ;) like would be cool if this would work: "root.'Neos.Fusion:Case'.stuff"
btw i think this regex might look better:
(?:\\.)(*SKIP)(*FAIL)|\.
(the magic is that it selects all Backslashes and its following char and excludes these matches from being used, for what was not matched it checks for simple dots.)
🤯 sorry I only stumbled upon this again now, but this regex seems to be a nice solution: https://3v4l.org/cIWOA#v7.4.28
How about to use stripslashes
to unescape the matches? https://3v4l.org/E7mRP#v7.4.28
btw i dont know if this backslash escape syntax is intuitive.
id like this string quoting syntax better: foo.bar."escaped.dots.likeinfusionpaths".baz
How about to use stripslashes to unescape the matches? https://3v4l.org/E7mRP#v7.4.28
👍
id like this string quoting syntax better:
foo.bar."escaped.dots.likeinfusionpaths".baz
The drawback with that would be that you'd need to depend on the escape character of the surrounding syntax if this path is within quotes, e.g. path="foo.bar.\"escaped.dots.likeinfusionpaths\".baz"
I think backslash as escape character is well known in PHP, so I do think it's a valid thing. I'm open to good arguments for other escape methods.
I still feel this might result in a possibly drastic performance change, just due to the sheer amount of usage this method gets and it now doing a lot of useless things for an edge case. I would rather implement an alternative way for specific use cases even if that means NodeTypes won't work everywhere/with everything.
I still feel this might result in a possibly drastic performance change
So a benchmark for the added strpos
call would clear this up?