performance monitoring and benchmarking

Open dannyjacobs opened this issue 5 years ago • 5 comments

Occasionally I hear that pyuvdata is slow, though without further investigation this complaint is impossible to decouple from the size of the data being read.* However, since we do not currently track the execution time of our various tasks, a change that increases execution time could be introduced without anyone noticing. This is difficult to monitor at scale because large files and long execution times are not easily supported within the current testing infrastructure. Here are a few things we could do:

  1. Monitor the execution time of all tests. This is a crude metric, as we know that execution time can be affected by exogenous factors related to the underlying cloud infrastructure or install times.
  2. Monitor the execution time of specific existing tests. This would be a more precise datum than the total for all tests and would expose things that grossly affect read time, but since the test files are small it would not expose issues that scale badly with the number of times, freqs, etc.
  3. Add tests which focus on timing but use the existing test files. This could include things like reading a file many times and averaging the read time, generating a large number of files and concat-reading them, etc.
  4. Use setups that require more resources. It's not clear what the break point is.
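
As a rough illustration of option 3, here is a minimal sketch of a repeat-and-average timing helper. Everything in it is an assumption for illustration: `time_callable` and `fake_read` are hypothetical names, and `fake_read` is only a stand-in for a real file read so that the snippet runs without any data files.

```python
import statistics
import time


def time_callable(func, *args, repeats=5, **kwargs):
    """Call func(*args, **kwargs) `repeats` times and summarize wall-clock time.

    Hypothetical helper, not part of pyuvdata.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),  # least affected by noise from the host machine
        "mean": statistics.mean(durations),
        "max": max(durations),
        "repeats": repeats,
    }


def fake_read(n_samples):
    # Stand-in workload; a real timing test would read a test file here.
    return sum(range(n_samples))


stats = time_callable(fake_read, 100_000, repeats=5)
print(stats)
```

In a real timing test the stand-in would be replaced by an actual read of one of the existing test files. Reporting the minimum alongside the mean is a design choice: the minimum is usually less sensitive to the exogenous CI noise mentioned in option 1.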

*The following is a digest of a discussion on the 3 Dec 2019 pyuvdata telecon.

dannyjacobs • Dec 03 '19 20:12

@dannyjacobs should this be labelled as UVData related or are you worried about other objects as well?

bhazelton • Dec 06 '19 02:12

I think just UVData. Sorry, I didn't think about labeling.

dannyjacobs • Dec 06 '19 03:12

@mkolopanis has implemented several recent speed-ups, both of the code and of the tests themselves. We do get total test suite timing from the CIs, but we could add the pytest --durations=N option to get the timing of the slowest N tests.
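
For reference, a minimal sketch of how the durations option could be wired up from Python. The helper names `durations_args` and `run_with_durations` and the `tests/` path are hypothetical; `--durations` and `--durations-min` are pytest command-line options (the latter assumes a reasonably recent pytest), and `pytest.main` accepts a list of such arguments.

```python
def durations_args(n_slowest=10, min_seconds=0.0, extra=()):
    """Build pytest arguments that report the slowest tests.

    Hypothetical helper for illustration only.
    """
    args = [f"--durations={n_slowest}"]
    if min_seconds > 0:
        # Only list tests slower than this threshold (assumes the installed
        # pytest supports --durations-min).
        args.append(f"--durations-min={min_seconds}")
    args.extend(extra)
    return args


def run_with_durations(n_slowest=10, extra=()):
    """Run pytest in-process with a slowest-tests report."""
    import pytest  # imported lazily so the helper above works without pytest

    return pytest.main(durations_args(n_slowest, extra=extra))


print(durations_args(10, min_seconds=0.5, extra=["tests/"]))
```

The same arguments can be passed directly on the command line or added to the CI configuration; the report appears at the end of the pytest output.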

bhazelton • Apr 29 '20 15:04

Speeding up _key2_inds might also be related: #201

bhazelton • Apr 29 '20 15:04

Some other issues/PRs related to some recent speed ups: #800 #813 #815 #818 #825 #834 #840

mkolopanis • May 07 '20 15:05