Corrfunc icon indicating copy to clipboard operation
Corrfunc copied to clipboard

Python utility to find the fastest combination of bin refine factors

Open manodeep opened this issue 9 years ago • 48 comments
trafficstars

Essentially the same as #91 as fixed in commit 3478f63. That commit simply needs to be copied over to all other extensions. Would be an ideal issue to solve for a newcomer to the repo.

Theory

  • [ ] DD
  • [ ] DDrppi
  • [ ] DDsmu
  • [x] wp
  • [ ] xi
  • [ ] vpf

Mocks

  • [ ] DDrppi_mocks
  • [x] DDtheta_mocks [#216]
  • [ ] DDsmu_mocks
  • [ ] vpf_mocks

For Advanced Users

  • Implement a decorator that can be applied to any of the clustering functions. This decorator would set up a triple nested loop over x/y/z-binref and runs the underlying function (passing all the parameters through). The only parameters for this decorator would be the maxbinref, nrepeats and return_runtimes. This decorator can then be applied to all remaining clustering statistics (i.e., the functions without dedicated helper routines)

manodeep avatar Oct 30 '16 08:10 manodeep

Hi @manodeep I would like to try out this issue, to get myself familiar with this repo

kakirastern avatar Feb 26 '20 08:02 kakirastern

But first of all due to the date of the original post (from 2016), I would like to check if this is still an issue.

Some background: (I have been successful in installing the software to my MacBook with clang8 I installed after purchasing the laptop, and have tested the software on it, so no issue there.) As an aside, I am looking for an issue to prepare me for the n-way matcher idea for GSoC this year. As I am new to this repo, I am thinking if this is still open then I can begin working on it.

kakirastern avatar Feb 26 '20 12:02 kakirastern

Hi @kakirastern - sure thing - please go ahead. I would recommend creating one for the Corrfunc/mocks/DDtheta_mocks to get yourself familiar with calculations with on-sky angular separations.

manodeep avatar Feb 28 '20 01:02 manodeep

CC @parejkoj @bsipocz

manodeep avatar Feb 28 '20 01:02 manodeep

Thanks @manodeep! I will begin work on the issue this weekend.

kakirastern avatar Feb 28 '20 12:02 kakirastern

@manodeep I would like to follow up on this issue with additional PRs for the remaining extensions and for the implementation of the proposed decorator.

kakirastern avatar Apr 13 '20 08:04 kakirastern

Also, would like to know if there is any recommended way to approach each of the remaining tasks in this issue? That is, is there any recommended order for their completion?

kakirastern avatar Apr 14 '20 02:04 kakirastern

@kakirastern Thanks for your continued interest - your contributions are very welcome.

Let's go for the advanced method - that will have several benefits:

  • reduces the total amount of duplicated code
  • will extend the solution to other pair-counters that may be added in the future
  • (likely) exposes you to decorators - a very powerful python programming technique

Since this contribution will be a little more advanced, I am happy to guide you through the process. For starters, here is a primer on decorators.

manodeep avatar Apr 14 '20 03:04 manodeep

Thanks @manodeep! Let me go through the primer on decorators and get back to you over the next few days.

kakirastern avatar Apr 14 '20 12:04 kakirastern

Hi @manodeep I should be ready to continue again tomorrow. Been sidetracked by something cropped up suddenly.

kakirastern avatar Apr 20 '20 15:04 kakirastern

Thank you for letting me know. Just in case this is not clear, you are doing us a favour - so please feel free to work at it on your own timescale (never mind the whole pandemic going on in the world!)

manodeep avatar Apr 21 '20 00:04 manodeep

Thanks @manodeep! The personal matter has been resolved now so I can get back to this issue. I have read the primer on decorators you have suggested earlier. I really like the idea of using decorators as an advanced method to approach the task outlined above. What would you recommend me doing next?

kakirastern avatar Apr 24 '20 12:04 kakirastern

@kakirastern Can you come up with a strategy (using decorators) that loops over the relevant angular/spatial dimensions and then finds the fastest runtime? Perhaps create some toy "decorated" functions to get a better idea about the functionality?

Think of this issue as tackling a research problem - solving it will require some exploration but there should be a solution at the end :)

manodeep avatar Apr 27 '20 01:04 manodeep

Sure @manodeep Give me some time to think about the strategy and I will get back to you soon.

kakirastern avatar Apr 27 '20 12:04 kakirastern

Hi @manodeep I have experimented using decorators with modification of the docstrings... So will move on to testing with the code part next.

kakirastern avatar May 09 '20 13:05 kakirastern

Great! Would it be easier if you open up a gist (gist.github.com/) with a Jupyter notebook? That way wemight be able to help (or ask for help as the case may be)?

manodeep avatar May 11 '20 01:05 manodeep

Sure, let me figure out how to use gist and report back shortly.

kakirastern avatar May 13 '20 14:05 kakirastern

Am thinking about a way things would work in the context of a package though, hence it's taking more time...

kakirastern avatar May 23 '20 11:05 kakirastern

Thanks for the update. Please let me know if I can help with the planning/design phase...

manodeep avatar May 24 '20 02:05 manodeep

Sure! I have been experimenting using decorators in some other Python packages... Turned out it can get quite intricate and more nuanced then the impression I got from reading the reference article alone. It took awhile for me to really get the gist of the idea behind. But I will definitely start some notebook this weekend.

kakirastern avatar May 27 '20 19:05 kakirastern

So the tentative plan I have come up with is to use a decorator to add some (common) code to each function in the list above, then add some extra code to complete the functionality for each application of the decorator... At least that's the idea I have for the project for now.

kakirastern avatar Jun 10 '20 16:06 kakirastern

Sounds good. For starters, do you mind setting up a jupyter notebook containing these three functions:

  • a function with some keywords, + with xyz_binrefs
  • a function with some keywords, + with ra/dec_binrefs
  • the decorator that accepts a function, checks which one of the two functions have been passed (based on the existence of xyz_binrefs or ra/dec_binrefs), and then runs a triple nested for-loop or double nested for-loop, while passing through all relevant keywords to the actual function

If you set up a gist, then you can use binder to run jupyter notebooks on the cloud

manodeep avatar Jun 11 '20 01:06 manodeep

Sure, will get to it as soon as possible.

kakirastern avatar Jun 15 '20 06:06 kakirastern

Hi @manodeep! After much trial and error I have drafted a version of a decorator that accepts a function, checks whether the function is with the xyz_binrefs arg or is one with the ra_dec_binrefs arg (couldn't use the slash you have used due to some syntax issue), as well as prints out all args defined in the actual function.

The gist for that work is at: https://gist.github.com/kakirastern/a905c021697200115d46e91fe95add0d.

Do let me know what you think. Thanks!

kakirastern avatar Jun 28 '20 14:06 kakirastern

@kakirastern Great work - that's fantastic.

Can you try with each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and probe whether each one of those functions contain xbinref/ybinref/zbinref or ra_binref/dec_binref?

The idea would be to automatically wrap a triple-for-loop (or some form of nd-iter) around the wrapped function and automatically figure out the fastest bin-ref (i.e., replace the functionsfind_fastest_DDtheta_mocks_bin_refs, find_fastest_wp_bin_refs with an automatic decorated function)

manodeep avatar Jun 28 '20 23:06 manodeep

Sure @manodeep Will try each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and see whether each of these contain either xbinref/ybinref/zbinref or ra_binref/dec_binref using the approach you have suggested, then get back to you soon.

kakirastern avatar Jun 30 '20 01:06 kakirastern

Perfect! And you can test that the output of this decorated function matches the output from the existing find_fastest_wp* and the find_fastest_DDtheta* routines.

Thanks again - this looks awesome!

manodeep avatar Jun 30 '20 02:06 manodeep

@manodeep Apologies for the long silence, will get back to you within this week

kakirastern avatar Jul 27 '20 13:07 kakirastern

All good :)

I did mean to ask - how did you setup the gist to connect the google-colab notebook? Looks very nice

manodeep avatar Jul 30 '20 20:07 manodeep

Oh, they have added this new option "Save a copy as GitHub Gist" under "File" in the navigation bar of Google Colab so if I opt to work there I can make a Gist simply with the click of a button.

Screenshot 2020-07-31 05 19 56

kakirastern avatar Jul 30 '20 21:07 kakirastern