flow-development-collection icon indicating copy to clipboard operation
flow-development-collection copied to clipboard

FEATURE: Arrays::getValueByPath with escaped dot

Open gerdemann opened this issue 5 years ago • 37 comments

It is now possible to escape the dot in the path of Arrays::getValueByPath. With this you get array keys, which have a dot.

Example:

$foo = ['bar.baz' => 'value'];
$path = 'bar\.baz';

What I did It is now possible to escape the dot in the path of Arrays::getValueByPath.

How I did it I have adapted the method Arrays::getValueByPath

How to verify it I wrote a test for it.

Checklist

  • [x] Code follows the PSR-2 coding style
  • [x] Tests have been created, run and adjusted as needed
  • [x] The PR is created against the lowest maintained branch

gerdemann avatar Jan 31 '20 07:01 gerdemann

Technically that's a feature, so should go into next release. But then again, escaping a dot might be an expected enough thing that we could relabel it as "BUGFIX"...

Anyway, one thing I'm unsure about: What if the \ is an expected part of the path, i.e.

$foo = ['bar\\' => ['baz' => 'value']];

I know, looks weird, but assume the key comes from a path traversal and stores directory names on a windows machine... (whole lot of edgy cases). I think in that case this could be breaky, but more importantly, it would need an "escape escape" hatch, which this doesn't cover yet.

albe avatar Jan 31 '20 08:01 albe

@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think. I am not a regex expert, if someone has a better solution I would be happy.

gerdemann avatar Jan 31 '20 09:01 gerdemann

Why would it be expected? I don't even know why it should be possible in the first place?

This is an application specific syntax expression (aka path syntax) why does it need escaping support?

kitsunet avatar Jan 31 '20 09:01 kitsunet

With this change it is possible to use the command 'configuration:show' also for e.g. NodeTypes.

./flow configuration:show --type NodeTypes --path Vendor\\Site:My\\NodeType

gerdemann avatar Jan 31 '20 09:01 gerdemann

./flow configuration:show --type NodeTypes --path "Vendor.Site:My.NodeType"

should do fine?

kitsunet avatar Jan 31 '20 09:01 kitsunet

Note: No it doesn't, got it now!

kitsunet avatar Jan 31 '20 09:01 kitsunet

It would also be nice to do something like this:

./flow configuration:show --type NodeTypes --path Vendor\\.Site:My\\.NodeType.superTypes.Vendor\\.Site:My\\.OtherNodeType

gerdemann avatar Jan 31 '20 09:01 gerdemann

Shouldn't we then rather think about a nice syntax extension for the scope?

Eg. [Vendor.Site:My.NodeType].properties

Obviously we need to select the separate well! The brackets look nice but might collide with something ofc.

kitsunet avatar Jan 31 '20 09:01 kitsunet

@albe I have adapted the regular expression so that your example can be escaped. This makes it a breaking change, I think. I am not a regex expert, if someone has a better solution I would be happy.

Was just fiddling around with the regex myself and was now stuck at '#(?<!\\\\)((?:\\\\\\\\)*)\\.#', which works for any uneven amount of escape characters but needs to use PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE to keep the even number of backslashes in the split. You'd then need to array_reduce array elements that only consist of backslashes into the previous. Ugh... getting wild quickly. Your solution only works for the one special case, but I guess that's good enough?

Edit: For reference, here's my fiddle with the ugly-ass "generic" solution: https://3v4l.org/pFoER

albe avatar Jan 31 '20 10:01 albe

As alternative here the version with the brackets: https://github.com/neos/flow-development-collection/compare/master...gerdemann:getvaluebypath-with-escaped-dot-2

gerdemann avatar Jan 31 '20 10:01 gerdemann

The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex. And while the increase in processing time might be minimal the amount it gets used will probably make it tangible.

kitsunet avatar Jan 31 '20 12:01 kitsunet

The path given to utility can already be an array (probably should be split into two functions but separate problem) but I guess I woudl rather like to see the command doing the work and we accept a new argument "pathSegments" or something that is then an array of keys

kitsunet avatar Jan 31 '20 12:01 kitsunet

Yes, performance is an argument. I would have, under certain circumstances, the functionality not only in the command. What do you think about a second method "getValueByPathString" or something in the Arrays utility?

gerdemann avatar Jan 31 '20 12:01 gerdemann

The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex.

Yep, that's also why I even slightly prefer the "imperfect" solution without the array_reduce.

Guess that's something we can't easily debate or refactor away. Moving the splitting outside the function will strip it of it's core functionality, plus you have to handle all those places where the function is used. I agree though, that the method should be split into two - one that takes a pathSegments array and one that takes a path string and calls the former.

So it comes down to a general decision/trade-off of "allowing dots in array keys" vs. code complexity+performance

albe avatar Jan 31 '20 12:01 albe

IDK, for me the nodetypes are a bit special, do we have any other place where we need this functionality? it wa sbuild for simple paths like settings and that's what it's great for.

kitsunet avatar Jan 31 '20 12:01 kitsunet

Then it shouldn't be in a general arrays utility, should it? I use it often with other arrays as well.

gerdemann avatar Jan 31 '20 12:01 gerdemann

I mean, it fullfills a specific task and that it does fast, but yeah we should certainly talk about that.

kitsunet avatar Jan 31 '20 12:01 kitsunet

So, how do we proceed? @kitsunet

albe avatar Feb 27 '20 15:02 albe

Not quite sure what happened to this PR, but it somehow got rebased (?) and a whole lot of other commits pulled in :/ Would like to go over this again tomorrow and see if we can make it mergeable then @kitsunet or rather postpone to 6.3

albe avatar Apr 01 '20 23:04 albe

@albe I only see one commit, is there anything I can do to help?

gerdemann avatar Apr 02 '20 08:04 gerdemann

D'oh - has this resolved itself? :D Thanks!

This bot is running rouge...

albe avatar Apr 02 '20 15:04 albe

Sorry this couldn't land in 6.2 yet - let's get this into major soon so it lands in 6.3!

albe avatar Apr 28 '20 10:04 albe

Everyone - if we get this sorted out before monday… it could maybe land in 6.3. I skimmed the PR comments, but would love to get a summary of the status from @albe or @kitsunet, as you seem to have thought about this quite a lot. Thanks!

kdambekalns avatar Aug 14 '20 13:08 kdambekalns

Sorry I couldn't look into this again yet. I think we still have a decision to make here.

Here's the latest summary of discussion:

The biggest gripe I have with both variants right now is that it turns a super simple and fast function that is used all over the place into something substantially more complex.

Yep, that's also why I even slightly prefer the "imperfect" solution without the array_reduce.

Guess that's something we can't easily debate or refactor away. Moving the splitting outside the function will strip it of it's core functionality, plus you have to handle all those places where the function is used. I agree though, that the method should be split into two - one that takes a pathSegments array and one that takes a path string and calls the former.

So it comes down to a general decision/trade-off of "allowing dots in array keys" vs. code complexity+performance

IDK, for me the nodetypes are a bit special, do we have any other place where we need this functionality? it wa sbuild for simple paths like settings and that's what it's great for.

Then it shouldn't be in a general arrays utility, should it? I use it often with other arrays as well.

So, tl;dr: Should we split this into two methods, in order to keep the simplicity and performance of the original and if so, where to put the new method/move the old?

albe avatar Sep 13 '20 10:09 albe

I was once also trying to access an array with dots in paths via this method and i assumed this would work: $path = 'something."this.is.one.thing".more.stuff' but it didnt work (because explode...) Escaping the dots with backslash is something i dindt think of...

btw i think this regex might look better: (?:\\.)(*SKIP)(*FAIL)|\. (the magic is that it selects all Backslashes and its following char and excludes these matches from being used, for what was not matched it checks for simple dots.) The above regex can also be modified to select all dots except in quotes as shown here: https://github.com/neos/neos-development-collection/blob/be26b253c8bce69695047fa7799e6c2d4f0cf02f/Neos.Fusion/Classes/Core/Parser.php#L102

i needed this feature for a cli when i wanted to get an ast path of fusion for debugging ;) like would be cool if this would work: "root.'Neos.Fusion:Case'.stuff"

mhsdesign avatar Nov 25 '21 20:11 mhsdesign

btw i think this regex might look better: (?:\\.)(*SKIP)(*FAIL)|\. (the magic is that it selects all Backslashes and its following char and excludes these matches from being used, for what was not matched it checks for simple dots.)

🤯 sorry I only stumbled upon this again now, but this regex seems to be a nice solution: https://3v4l.org/cIWOA#v7.4.28

albe avatar Feb 26 '22 21:02 albe

How about to use stripslashes to unescape the matches? https://3v4l.org/E7mRP#v7.4.28

btw i dont know if this backslash escape syntax is intuitive.

id like this string quoting syntax better: foo.bar."escaped.dots.likeinfusionpaths".baz

mhsdesign avatar Feb 27 '22 08:02 mhsdesign

How about to use stripslashes to unescape the matches? https://3v4l.org/E7mRP#v7.4.28

👍

id like this string quoting syntax better: foo.bar."escaped.dots.likeinfusionpaths".baz

The drawback with that would be that you'd need to depend on the escape character of the surrounding syntax if this path is within quotes, e.g. path="foo.bar.\"escaped.dots.likeinfusionpaths\".baz"

I think backslash as escape character is well known in PHP, so I do think it's a valid thing. I'm open to good arguments for other escape methods.

albe avatar Feb 27 '22 19:02 albe

I still feel this might result in a possibly drastic performance change, just due to the sheer amount of usage this method gets and it now doing a lot of useless things for an edge case. I would rather implement an alternative way for specific use cases even if that means NodeTypes won't work everywhere/with everything.

kitsunet avatar May 30 '22 08:05 kitsunet

I still feel this might result in a possibly drastic performance change

So a benchmark for the added strpos call would clear this up?

kdambekalns avatar May 30 '22 08:05 kdambekalns