Traverse glom

Open kurtbrose opened this issue 7 years ago • 19 comments

the job of a Traverse is to walk its target recursively and return an iterator over all of the bits (as in depth-first or breadth-first traversal) -- this could perhaps share some bits with TargetRegistry

this is very useful when combined with Check and Assign for a kind of pattern-matching strategy:

# not sure if Traverse even needs an argument or if it should just implicitly walk current target
# maybe the argument should specify what it iterates over:  just items, items + paths, etc
glom(target, (Traverse(T), (Check(T.val, validate=lambda t: t < 0), Assign('val', 0))))
# ensure T.val >= 0

if an un-traverse glom were possible, that would be even more powerful; but in the absence of that, being able to do something to the items being traversed is still useful

the ultimate goal of this kind of approach is a useful meta-glom -- you can imagine transformations like "set all defaults to a unique marker object that stores the path" to debug why an output is coming as None

the ultimate, ultimate goal being useful glom-macros (glomacro?) and glom-compilation (glompilation?)

kurtbrose avatar Jul 17 '18 05:07 kurtbrose

This idea has evolved a bit -- call it PathEnumerate now, and its job is to dissect a target out into a list of (T, object) pairs.

e.g.

glom([ {'hello': 'world'}, {'goodbye': 'world'}], PathEnumerate())

would result in

[ (T[0], {'hello': 'world'}),
  (T[0]['hello'], 'world'),
  (T[1], {'goodbye': 'world'}),
  (T[1]['goodbye'], 'world')
]
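For illustration, here is a minimal pure-Python sketch of the enumeration described above. It is not glom's API: `path_enumerate` is a hypothetical helper, and paths are plain tuples of keys/indexes standing in for the T expressions, so (0, 'hello') plays the role of T[0]['hello'].

```python
def path_enumerate(target, prefix=()):
    """Return (path, value) pairs for every nested item, depth-first, parents first."""
    if isinstance(target, dict):
        children = target.items()
    elif isinstance(target, (list, tuple)):
        children = enumerate(target)
    else:
        return []  # leaf value (str, int, ...): nothing below it
    pairs = []
    for key, value in children:
        path = prefix + (key,)  # hypothetical stand-in for a T expression
        pairs.append((path, value))
        pairs.extend(path_enumerate(value, path))
    return pairs


pairs = path_enumerate([{'hello': 'world'}, {'goodbye': 'world'}])
for path, value in pairs:
    print(path, value)
```

Whether the root itself should appear in the output is left open here; this sketch omits it, matching the example result above.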

again, the goal is to make glom-specs that mutate glom-specs possible by allowing reasonable specs that operate on an arbitrarily nested structure

kurtbrose avatar Dec 11 '18 07:12 kurtbrose

I needed this feature for my use case of GDPR. Anywhere I find the key email I gotta remove it -- and it could be moving around and hiding!

paths = catalog(target)
# filter paths with regex like '.*email'
for path in paths:
    glom.assign(target, path, None)
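A rough stand-alone sketch of that workflow, assuming nothing beyond the standard library. `catalog` and `assign_path` here are hypothetical helpers written for this example (they are not glom functions), and the sample data is made up.

```python
import re


def catalog(target, prefix=()):
    """Yield the path (a tuple of keys/indexes) of every nested item."""
    if isinstance(target, dict):
        children = list(target.items())
    elif isinstance(target, list):
        children = list(enumerate(target))
    else:
        return  # leaf value: no paths below it
    for key, value in children:
        path = prefix + (key,)
        yield path
        yield from catalog(value, path)


def assign_path(target, path, value):
    """Set the item at `path` to `value`, mutating `target` in place."""
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = value


target = {
    'user': {
        'email': 'a@example.com',
        'contacts': [{'work_email': 'b@example.com', 'name': 'B'}],
    }
}

# Collect the matching paths first, then mutate, so the walk never sees half-edited data.
paths = [
    p for p in catalog(target)
    if isinstance(p[-1], str) and re.fullmatch(r'.*email', p[-1])
]
for path in paths:
    assign_path(target, path, None)
```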

roryhr avatar Mar 02 '19 09:03 roryhr

Hey @roryhr! This feature is still coming to glom, but in the meantime you can do what Kurt and I do and use an earlier design, called remap: http://sedimental.org/remap.html#drop-empty-values

It's a bit trickier to use, but it's perfect for cases like yours (similar to the one linked above). Hope this helps!

mahmoud avatar Mar 04 '19 08:03 mahmoud

https://www.w3schools.com/xml/xpath_syntax.asp

traverse should also be able to do an XPath like syntax to filter output (or, if not traverse, something that can be used with traverse very easily)

if the output of traverse is [(path, element)], then the output could be filtered with Match(path) -- however, wildcard is a bit trickier

in XPath, . is "current node" and * is "any number of nodes" -- I'd propose switching these to * and ... for glom, since I think these are more familiar to glom's audience from file system globbing and use of ... in python [] syntax

kurtbrose avatar Nov 11 '19 17:11 kurtbrose

another thing that XPath syntax makes a great deal of is "attributes" vs "path"

here's a good acid test for capability I think: [image]

one way this could be expressed is

('0.bookstore.book', And(('price', M > 35), 'title'))

a bit more of a mouthful than

/bookstore/book[price>35.00]/title
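For reference, this is what that query is expected to return, written as plain Python over a dict-shaped bookstore. The sample data below is invented for the example, and the list comprehension is only a baseline to compare spec syntaxes against, not a proposed glom spelling.

```python
# Made-up sample data shaped like the XPath bookstore example.
target = {
    'bookstore': {
        'book': [
            {'title': 'Cheap Book', 'price': 29.99},
            {'title': 'Pricey Book', 'price': 39.95},
        ]
    }
}

# Equivalent of /bookstore/book[price>35.00]/title
titles = [
    book['title']
    for book in target['bookstore']['book']
    if book['price'] > 35
]
```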

kurtbrose avatar Nov 11 '19 17:11 kurtbrose

come to think of it... maybe what we want here is a kind of multi-fetch rather than a pure traverse

what if path supported a '*' syntax which switched it from returning a single result to an iterable of results?

outside of XML land, every node doesn't implicitly have multiple children that you can only refer to by type...

what if this

Path('bookstore.books.*')

was a short-hand for an iterable of results

('bookstore.books.*', [And(('price', M > 35), 'title') | SKIP])

maybe something like that?

kurtbrose avatar Nov 11 '19 17:11 kurtbrose

then, '...' path segment would trigger a recursive walk

Path('a...b')  # return all 'b's at any level from 'a'

one challenge here is that now the path is unknown if e.g. you want to emit that; we could cover that by making S[Path] contain the actual path

then, the "plain" Traverse above would translate to Path('...')

kurtbrose avatar Nov 11 '19 17:11 kurtbrose

I guess "get all paths and values" would be ['...', Fill( (S[Path], T) )]

kurtbrose avatar Nov 11 '19 17:11 kurtbrose

another helper that would be super useful in case of e.g. the GDPR email thing would be Replace() -- assuming we get the invariant on S[Path] right, this would be equivalent to

Assign(Path(S[PARENT][T]) + S[Path], newval) 

or something like that -- on parent of current target, replace current target with new value

I guess the problem with leaning on S[Path] here is that it makes the resulting spec extremely context sensitive

maybe if there was a way to back out instead?

Path('...email..')

this would express, find any paths that go through an attribute named "email", then "back up" one level to the parent

(Path('...email..'), Assign('email', newval))

this would be, go to everywhere with email, then replace with newval

...if we allowed a mechanism for embedding regex...

(Path('...{.*email}..'), Assign(S[Path][-1], newval))
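To make the "back up to the parent, then replace" idea concrete, here is a small pure-Python sketch. It assumes a walk that yields (parent, key) pairs, so the replacement happens on the parent rather than on the matched value. `walk_parents` and `replace_matching` are hypothetical names for this example, not glom API, and the regex is matched against string keys only.

```python
import re


def walk_parents(node):
    """Yield (parent, key) for every item nested under node, depth-first."""
    if isinstance(node, dict):
        keys = list(node)  # snapshot, so replacing values mid-walk is safe
    elif isinstance(node, list):
        keys = range(len(node))
    else:
        return
    for key in keys:
        yield node, key
        # If the caller replaced node[key], this descends into the new value.
        yield from walk_parents(node[key])


def replace_matching(target, key_pattern, newval):
    """On each parent, replace values whose string key fully matches key_pattern."""
    regex = re.compile(key_pattern)
    count = 0
    for parent, key in walk_parents(target):
        if isinstance(key, str) and regex.fullmatch(key):
            parent[key] = newval
            count += 1
    return count


target = {
    'email': 'x@example.com',
    'profile': {'backup_email': 'y@example.com', 'name': 'n'},
    'friends': [{'email': 'z@example.com'}],
}
replaced = replace_matching(target, r'.*email', '<redacted>')
```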

kurtbrose avatar Nov 11 '19 18:11 kurtbrose

so I really like that syntax as a top-level; but probably also want to make sure it decomposes into nice bits and Path doesn't just become super complicated and magical

kurtbrose avatar Nov 11 '19 18:11 kurtbrose

per discussion:

* and ** are probably better than * and ... (avoids colliding with . path demarcation)
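A minimal sketch of how * and ** segments could behave, in plain Python rather than glom. `glob_get` and its helpers are hypothetical, and two details are assumptions of this sketch: ** matches the current node as well as everything beneath it, and a literal segment only looks up dict keys.

```python
def children(node):
    """Return the direct children of a dict or list/tuple, else an empty list."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, (list, tuple)):
        return list(node)
    return []


def descendants(node):
    """Yield node itself, then everything beneath it, depth-first."""
    yield node
    for child in children(node):
        yield from descendants(child)


def glob_get(target, path):
    """Return a list of every value matched by a dotted path with * / ** segments."""
    current = [target]
    for segment in path.split('.'):
        matched = []
        for node in current:
            if segment == '*':
                matched.extend(children(node))      # one level, all children
            elif segment == '**':
                matched.extend(descendants(node))   # recursive walk
            elif isinstance(node, dict) and segment in node:
                matched.append(node[segment])       # ordinary key lookup
        current = matched
    return current


data = {'a': {'b': 1, 'c': {'b': 2, 'd': [{'b': 3}]}}, 'b': 0}
all_bs = glob_get(data, 'a.**.b')   # every 'b' at any level under 'a'
direct = glob_get(data, 'a.*')      # direct children of 'a'
```

Note that a starred path always returns a list, which is the "single result becomes an iterable of results" switch discussed earlier in the thread.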

some related:

https://github.com/mahmoud/glom/issues/89 -- solved by **

https://github.com/mahmoud/glom/issues/40 -- similar to GDPR use case above

https://github.com/mahmoud/glom/issues/39 -- not sure if this would address that, but there's a similar solution proposed of walk-with-path

kurtbrose avatar Nov 11 '19 21:11 kurtbrose

what would the stand-alone names for * and ** be? Glob() and RGlob() (recursive-glob)?

kurtbrose avatar Nov 11 '19 21:11 kurtbrose

Traverse() and Reverse() (recursive traverse)

kurtbrose avatar Nov 12 '19 00:11 kurtbrose

maybe Tread() and Retread()?

kurtbrose avatar Nov 14 '19 22:11 kurtbrose

Iter() and DeepIter()?

kurtbrose avatar Nov 14 '19 22:11 kurtbrose

Every() and REvery()? All() and RAll()? Each() and Reach()?

kurtbrose avatar Nov 27 '19 19:11 kurtbrose

I kind of like Each() and Reach()

kurtbrose avatar Nov 27 '19 21:11 kurtbrose

https://github.com/mahmoud/glom/pull/144

kurtbrose avatar Jun 02 '20 18:06 kurtbrose

Hello. Is the plan still to implement traverse at some point? Any help required for this?

GlenDC avatar Jan 13 '21 23:01 GlenDC