zarr-python
zarr-python copied to clipboard
find, findall
Maybe useful to have methods "find" and "findall" on Group class to search for members (descendants?) matching some name query.
Should be pretty doable with a visitor pattern.
I would like to work on this issue. Can I get more information on this and some guidance on how to get started.
Hi @ShristiJain. I think based on the age that this is going to be a case of there being a solution in place. For example, see the visit functions here: https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.array_keys
Probably the best way forward is to try to write a very small example here on the issue that shows how one would print only groups matching a given name as @alimanfoo suggested.
From that point, we either add a method with your code, or if it's simple enough, maybe we just include your example directly in the documentation. For example in the tutorial: https://github.com/zarr-developers/zarr-python/blob/master/docs/tutorial.rst
Thank you @joshmoore , I understand your point now. Actually I am an Outreachy Applicant from India and I was looking for some issues that I can work on and record on the Outreachy site. I am very eager to be a part of this community. I would request you if you could suggest any issue on which I can work. I am proficient in JAVA and Python and know android studio too, I am open to learning new things and technologies and am wiling to contribute to the project titled "Build registry of Zarr codecs". Thank you in advance!
FWIW here are the different visit
methods, which could be used to implement this. Would try creating a simple dataset locally and then see if you can write code using one of those methods to do this
Hi I'm Weddy, an outreachy applicant. I would like to work on this issue.
Hi Weddy, that would be great! 😄
Would recommend trying to write this in terms of the visit
methods and check for keys that match (possibly using re
).
Got it! Thank you.
Hi @joshmoore @MSanKeys963 @jakirkham I just wanted to give you an update. Apologies that i did not tag joshmoore and MSanKeys963 while requesting to be assigned the issue earlier. I am currently trying to incorporate both re.search and re.findall together in the group class that I've created. As of now my regex is printing none which I am trying to resolve. Thank you
@caviere, no worries. Have assigned you 🙂
Would encourage sending a PR with what you have (even if it is incomplete and has bugs). It's possible we can spot things and share tips that would save you some time 😉
@jakirkham thank you. I've been able to make a small progress. The code is now printing the name of the groups. If it is okay with you, I'll send it on or before tuesday for your review, guidance and approval.
Yep sounds great! Thank you 🙏
@jakirkham @joshmoore after going over it several times, I think the best option is for users to use visit
rather than add methods for find
and findall
since visit already provides enough to quickly implement these two, I can't find a single situation where find and findall might provide additional benefit.
Consider the following:
root = zarr.group()
foo = root.create_group("foo")
bar = root.create_group("bar")
root.create_group("aaa").create_group("bbb").create_group("ccc").create_group("aaa")
If we print tree we get:
/
├── aaa
│ └── bbb
│ └── ccc
│ └── aaa
├── bar
└── foo
To implement find
, that is the first path that has pattern aaa
, we do the following. Since we only want the first match, we return a non-None value to stop further iteration.
pattern = re.compile("aaa")
found = None
def find(path):
nonlocal found
if pattern.search(path) is not None:
found = path
return True
root.visit(find)
print(found)
# ['aaa']
To implement findall
, we gather all the results into a list:
pattern = re.compile("aaa")
found = []
def findall(path):
if pattern.search(path) is not None:
found.append(path)
root.visit(findall)
print(found)
# ['aaa', 'aaa/bbb', 'aaa/bbb/ccc', 'aaa/bbb/ccc/aaa']
In some cases, we only want to search on the last part of the path, in this case, we can filter out the prefix using a greedy regex:
prefix_pattern = re.compile(r".*/")
pattern = re.compile("aaa")
found = [
def findall(path):
match = prefix_pattern.match(path)
if match is None:
name = path
else:
_, end = match.span()
name = path[end:]
if pattern.search(name) is not None:
found.append(path)
return None
root.visit(findall)
print(found)
# ['aaa', 'aaa/bbb/ccc/aaa']
@jakirkham since I'd like to make a contribution, is it possible for you to consider a couple of linting errors I found and fixed while I was working on this, here's my pull request https://github.com/zarr-developers/zarr-python/pull/1226
Also my apologies for sending my PR two days late.
No problems at all.
Consider the following:
This is a great write up. Thank you.
after going over it several times, I think the best option is for users to use visit rather than add methods for find and findall
Motion to close then? (auto-destruct initiated...)
Thanks for the writeup! We could always include these as examples in the docs. Perhaps they provide a good starting place for users wanting similar functionality or just generally wanting to learn about visit*
methods
Good idea!
yes, it is okay if it's closed
@joshmoore I have started working on the documentation here https://github.com/zarr-developers/zarr-python/pull/1241 , there are a couple of errors I am trying to fix:
- on running
make html
it says there is an indentation error - on running
python -m pytest -v --doctest zarr
, it says Expected is'aaa'
but Got isaaa
, yet I have it as'aaa'