Corrfunc
Corrfunc copied to clipboard
Python utility to find the fastest combination of bin refine factors
Essentially the same as #91 as fixed in commit 3478f63. That commit simply needs to be copied over to all other extensions. Would be an ideal issue to solve for a newcomer to the repo.
Theory
- [ ]
DD - [ ]
DDrppi - [ ]
DDsmu - [x]
wp - [ ]
xi - [ ]
vpf
Mocks
- [ ]
DDrppi_mocks - [x]
DDtheta_mocks[#216] - [ ]
DDsmu_mocks - [ ]
vpf_mocks
For Advanced Users
- Implement a decorator that can be applied to any of the clustering functions. This decorator would set up a triple nested loop over
x/y/z-binrefand runs the underlying function (passing all the parameters through). The only parameters for this decorator would be themaxbinref,nrepeatsandreturn_runtimes. This decorator can then be applied to all remaining clustering statistics (i.e., the functions without dedicated helper routines)
Hi @manodeep I would like to try out this issue, to get myself familiar with this repo
But first of all due to the date of the original post (from 2016), I would like to check if this is still an issue.
Some background: (I have been successful in installing the software to my MacBook with clang8 I installed after purchasing the laptop, and have tested the software on it, so no issue there.) As an aside, I am looking for an issue to prepare me for the n-way matcher idea for GSoC this year. As I am new to this repo, I am thinking if this is still open then I can begin working on it.
Hi @kakirastern - sure thing - please go ahead. I would recommend creating one for the Corrfunc/mocks/DDtheta_mocks to get yourself familiar with calculations with on-sky angular separations.
CC @parejkoj @bsipocz
Thanks @manodeep! I will begin work on the issue this weekend.
@manodeep I would like to follow up on this issue with additional PRs for the remaining extensions and for the implementation of the proposed decorator.
Also, would like to know if there is any recommended way to approach each of the remaining tasks in this issue? That is, is there any recommended order for their completion?
@kakirastern Thanks for your continued interest - your contributions are very welcome.
Let's go for the advanced method - that will have several benefits:
- reduces the total amount of duplicated code
- will extend the solution to other pair-counters that may be added in the future
- (likely) exposes you to decorators - a very powerful python programming technique
Since this contribution will be a little more advanced, I am happy to guide you through the process. For starters, here is a primer on decorators.
Thanks @manodeep! Let me go through the primer on decorators and get back to you over the next few days.
Hi @manodeep I should be ready to continue again tomorrow. Been sidetracked by something cropped up suddenly.
Thank you for letting me know. Just in case this is not clear, you are doing us a favour - so please feel free to work at it on your own timescale (never mind the whole pandemic going on in the world!)
Thanks @manodeep! The personal matter has been resolved now so I can get back to this issue. I have read the primer on decorators you have suggested earlier. I really like the idea of using decorators as an advanced method to approach the task outlined above. What would you recommend me doing next?
@kakirastern Can you come up with a strategy (using decorators) that loops over the relevant angular/spatial dimensions and then finds the fastest runtime? Perhaps create some toy "decorated" functions to get a better idea about the functionality?
Think of this issue as tackling a research problem - solving it will require some exploration but there should be a solution at the end :)
Sure @manodeep Give me some time to think about the strategy and I will get back to you soon.
Hi @manodeep I have experimented using decorators with modification of the docstrings... So will move on to testing with the code part next.
Great! Would it be easier if you open up a gist (gist.github.com/) with a Jupyter notebook? That way wemight be able to help (or ask for help as the case may be)?
Sure, let me figure out how to use gist and report back shortly.
Am thinking about a way things would work in the context of a package though, hence it's taking more time...
Thanks for the update. Please let me know if I can help with the planning/design phase...
Sure! I have been experimenting using decorators in some other Python packages... Turned out it can get quite intricate and more nuanced then the impression I got from reading the reference article alone. It took awhile for me to really get the gist of the idea behind. But I will definitely start some notebook this weekend.
So the tentative plan I have come up with is to use a decorator to add some (common) code to each function in the list above, then add some extra code to complete the functionality for each application of the decorator... At least that's the idea I have for the project for now.
Sounds good. For starters, do you mind setting up a jupyter notebook containing these three functions:
- a function with some keywords, + with
xyz_binrefs - a function with some keywords, + with
ra/dec_binrefs - the decorator that accepts a function, checks which one of the two functions have been passed (based on the existence of
xyz_binrefsorra/dec_binrefs), and then runs a triple nested for-loop or double nested for-loop, while passing through all relevant keywords to the actual function
If you set up a gist, then you can use binder to run jupyter notebooks on the cloud
Sure, will get to it as soon as possible.
Hi @manodeep! After much trial and error I have drafted a version of a decorator that accepts a function, checks whether the function is with the xyz_binrefs arg or is one with the ra_dec_binrefs arg (couldn't use the slash you have used due to some syntax issue), as well as prints out all args defined in the actual function.
The gist for that work is at: https://gist.github.com/kakirastern/a905c021697200115d46e91fe95add0d.
Do let me know what you think. Thanks!
@kakirastern Great work - that's fantastic.
Can you try with each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and probe whether each one of those functions contain xbinref/ybinref/zbinref or ra_binref/dec_binref?
The idea would be to automatically wrap a triple-for-loop (or some form of nd-iter) around the wrapped function and automatically figure out the fastest bin-ref (i.e., replace the functionsfind_fastest_DDtheta_mocks_bin_refs, find_fastest_wp_bin_refs with an automatic decorated function)
Sure @manodeep Will try each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and see whether each of these contain either xbinref/ybinref/zbinref or ra_binref/dec_binref using the approach you have suggested, then get back to you soon.
Perfect! And you can test that the output of this decorated function matches the output from the existing find_fastest_wp* and the find_fastest_DDtheta* routines.
Thanks again - this looks awesome!
@manodeep Apologies for the long silence, will get back to you within this week
All good :)
I did mean to ask - how did you setup the gist to connect the google-colab notebook? Looks very nice
Oh, they have added this new option "Save a copy as GitHub Gist" under "File" in the navigation bar of Google Colab so if I opt to work there I can make a Gist simply with the click of a button.