
Build script for downselection on frequencies, polarizations, LSTs, etc.

Status: Open • jsdillon opened this issue 5 years ago • 0 comments

For the upcoming HERA season, we're going to need the RTP to perform a data downselection before the data are transferred off site over the internet. We need a Python script that performs this downselection (which we can then wrap in a shell script for OPM), and I think pyuvdata is a reasonable place for it to live, though I'm open to other suggestions.

Here are the features I think we'll need:

  • Ability to take in a raw data file using partial I/O to save memory (probably it's easiest to do partial I/O over groups of baselines)
  • Ability to take in a UVFlag waterfall file, select on times to match the file (if the flags are for the whole night), and then apply those flags to the data.
  • Perform downselection on the following axes:
    • LST: given a list of pairs of floats that describe LST ranges, only keep data in those ranges
    • Solar altitude: keep only times when the sun is below some altitude parameter
    • Band: allow for a reduced band (or maybe subbands via spectral windows, though I don't personally need that)
    • Polarization: allow reducing the data from all four polarizations to any subset of them
    • Frequency Resolution: allow for a reduction of frequency resolution by binning together an integer number of channels. (Question: does this necessitate rephasing?)
    • Temporal Resolution: allow for averaging to a common target temporal resolution, or for reducing the temporal resolution by an integer factor (Question: does this necessitate rephasing? Probably...)
    • Baseline length: allow for a cut on minimum/maximum baseline length. (We might also want a separate cut on minimum/maximum EW length, but I'm fine punting on that.)
    • Only keep autos or crosses
    • Others?
  • Write out to uvh5 using partial I/O
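A minimal NumPy sketch of several of the selections listed above, assuming pyuvdata-style array layouts (one row per baseline-time, LSTs in radians, `uvw_array` in meters, times in JD). All function names here are hypothetical helpers, not existing pyuvdata API. The solar-altitude cut is omitted because it needs an ephemeris (e.g. astropy).

```python
import numpy as np


def lst_mask(lsts, lst_ranges):
    """True for entries whose LST (radians) lies in any (start, stop) range.

    Assumes range endpoints are in [0, 2*pi]; start > stop means the range
    wraps through LST = 0, e.g. (6.0, 0.5).
    """
    lsts = np.mod(np.asarray(lsts, dtype=float), 2 * np.pi)
    keep = np.zeros(lsts.shape, dtype=bool)
    for start, stop in lst_ranges:
        if start <= stop:
            keep |= (lsts >= start) & (lsts <= stop)
        else:  # wrapping range
            keep |= (lsts >= start) | (lsts <= stop)
    return keep


def baseline_length_mask(uvws, min_len=0.0, max_len=np.inf):
    """True for baseline-times whose |uvw| (same units as uvws) is in range."""
    lengths = np.linalg.norm(np.asarray(uvws, dtype=float), axis=1)
    return (lengths >= min_len) & (lengths <= max_len)


def auto_cross_mask(ant1, ant2, keep="cross"):
    """True for autos (keep='auto') or crosses (keep='cross')."""
    is_auto = np.asarray(ant1) == np.asarray(ant2)
    if keep == "auto":
        return is_auto
    if keep == "cross":
        return ~is_auto
    raise ValueError("keep must be 'auto' or 'cross'")


def waterfall_flags_for_blts(blt_times, wf_times, wf_flags, atol=1e-6):
    """Pick the waterfall flag row matching each baseline-time's JD.

    Works when the waterfall covers the whole night: only rows matching
    times present in the data file are returned. atol is in days.
    """
    blt_times = np.asarray(blt_times, dtype=float)
    wf_times = np.asarray(wf_times, dtype=float)
    # nearest waterfall time for every baseline-time (O(Nblts * Ntimes) memory)
    idx = np.abs(blt_times[:, None] - wf_times[None, :]).argmin(axis=1)
    if not np.allclose(wf_times[idx], blt_times, rtol=0.0, atol=atol):
        raise ValueError("some data times have no matching flag time")
    return np.asarray(wf_flags, dtype=bool)[idx]


def batched(items, batch_size):
    """Yield consecutive groups, e.g. of antenna pairs for partial I/O."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

The boolean masks can be AND-ed together and converted with `np.nonzero` into indices for `UVData.select(blt_inds=...)`, while band and polarization cuts map onto `select`'s `freq_chans`/`frequencies` and `polarizations` arguments. Each baseline batch could be read with `UVData.read(..., bls=batch)` to limit memory. The returned waterfall flags have shape (Nblts, Nfreqs) and would be OR-ed into the data flags across the polarization axis.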

For both temporal and frequency downsampling, I think we should also decide how flags get handled. I can see two ways to do this:

  1. Average/sum all data that goes into a single freq/time bin, but if any of the bins that get combined are flagged, flag the whole bin.
  2. Average/sum only unflagged data, but keep track of how many unflagged samples went into each bin via nsamples. If nsamples == 0, also flag the bin.

I think the user should be able to choose which of those two is used.
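A minimal NumPy sketch of the two flag-handling options for integer-factor binning along any one axis (channels, or times when the data are laid out with a separate time axis). `bin_axis` and its `mode` names are hypothetical. It assumes nsamples should be the *sum* of the contributing samples, following the wording above. pyuvdata's own averaging routines may use a different nsamples convention. Rephasing is not addressed here.

```python
import numpy as np


def bin_axis(data, flags, nsamples, factor, axis=-1, mode="any_flagged"):
    """Combine `factor` adjacent entries along `axis`.

    mode="any_flagged":   plain mean of all inputs; bin is flagged if any
                          input is flagged; nsamples is summed.
    mode="unflagged_only": nsamples-weighted mean of unflagged inputs only;
                          nsamples is the sum over unflagged inputs; bins
                          with nsamples == 0 are flagged and zeroed.
    """
    data = np.moveaxis(np.asarray(data), axis, -1)
    flags = np.moveaxis(np.asarray(flags, dtype=bool), axis, -1)
    nsamples = np.moveaxis(np.asarray(nsamples, dtype=float), axis, -1)
    n = data.shape[-1]
    if n % factor != 0:
        raise ValueError(f"axis length {n} is not divisible by {factor}")
    shape = data.shape[:-1] + (n // factor, factor)
    d, f, w = data.reshape(shape), flags.reshape(shape), nsamples.reshape(shape)

    if mode == "any_flagged":
        out_data = d.mean(axis=-1)
        out_nsamples = w.sum(axis=-1)
        out_flags = f.any(axis=-1)
    elif mode == "unflagged_only":
        w = np.where(f, 0.0, w)  # flagged inputs get zero weight
        out_nsamples = w.sum(axis=-1)
        good = out_nsamples > 0
        # guard the denominator so fully flagged bins do not divide by zero
        out_data = np.where(
            good, (d * w).sum(axis=-1) / np.where(good, out_nsamples, 1.0), 0.0
        )
        out_flags = ~good
    else:
        raise ValueError("mode must be 'any_flagged' or 'unflagged_only'")

    # restore the binned axis to its original position
    return (
        np.moveaxis(out_data, -1, axis),
        np.moveaxis(out_flags, -1, axis),
        np.moveaxis(out_nsamples, -1, axis),
    )
```

For example, with four channels `[1, 3, 5, 7]`, the second one flagged and all nsamples equal to 1, binning by 2 gives `[2, 6]` with the first bin flagged under option 1, and `[1, 6]` with nsamples `[1, 2]` and no flags under option 2.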

Thoughts on this design @bhazelton @nithyanandan?

jsdillon, Jul 31 '20 01:07