pyuvdata
Build script for downselection on frequencies, polarizations, lsts, etc.
For the upcoming HERA season, we're going to need the RTP to perform a data downselection for internet transfer off site. We need a Python script for performing this (which we can then build a shell wrapper around for OPM), and I think pyuvdata is a reasonable place for it to live, though I'm open to other suggestions.
Here are the features I think we'll need:
- Ability to take in a raw data file using partial I/O to save memory (probably it's easiest to do partial I/O over groups of baselines)
- Ability to take in a UVFlag waterfall file, select on times to match the file (if the flags are for the whole night), and then apply those flags to the data.
- Perform downselection on the following axes:
  - LST: given a list of pairs of floats that describe LST ranges, only keep data in those ranges
  - Solar altitude: keep only times when the sun is below some altitude parameter
  - Band: allow for a reduced band (or maybe subbands via spectral windows, though I don't personally need that)
  - Polarization: allow for a reduction of polarizations from 4 pol to any subset of those four
  - Frequency Resolution: allow for a reduction of frequency resolution by binning together an integer number of channels. (Question: does this necessitate rephasing?)
  - Temporal Resolution: allow for data reduction via a common temporal resolution, or a reduction of temporal resolution by an integer factor (Question: does this necessitate rephasing? Probably...)
  - Baseline length: allow for a cut on minimum/maximum baseline length. (We might also want a separate cut on minimum/maximum EW length, but I'm fine punting on that)
  - Autos/crosses: only keep autos or crosses
  - Others?
- Write out to uvh5 using partial i/o
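
To make the LST item concrete, here is a minimal sketch of how a list of LST range pairs could be turned into a boolean time mask. It uses plain NumPy only. The function name `lst_range_mask`, the use of radians, and the rule that a pair with start > stop wraps through 2π are all assumptions for illustration, not anything pyuvdata currently defines.

```python
import numpy as np


def lst_range_mask(lsts, lst_ranges):
    """Return a boolean mask that is True where an LST lies in any range.

    lsts : array of LSTs in radians, assumed to lie in [0, 2*pi).
    lst_ranges : list of (start, stop) pairs in radians.
        Assumption: a pair with start > stop wraps through 2*pi,
        e.g. (5.5, 0.2) keeps LSTs >= 5.5 or <= 0.2.
    """
    lsts = np.asarray(lsts, dtype=float)
    mask = np.zeros(lsts.shape, dtype=bool)
    for start, stop in lst_ranges:
        if start <= stop:
            # Ordinary range: keep start <= lst <= stop.
            mask |= (lsts >= start) & (lsts <= stop)
        else:
            # Wrapped range: keep everything outside (stop, start).
            mask |= (lsts >= start) | (lsts <= stop)
    return mask


# Hypothetical example: four integrations, one ordinary and one wrapped range.
example_lsts = np.array([0.1, 1.0, 3.0, 6.0])
example_mask = lst_range_mask(example_lsts, [(0.5, 1.5), (5.5, 0.2)])
```

A mask like this could then be used to pick out the times to keep before reading the rest of the data with partial I/O. The other per-time cuts (e.g. solar altitude) could be combined with it via a logical AND.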
For both temporal and frequency downsampling, I think we should also decide how flags get handled. I can see two ways to do this:
- Average/sum all data that goes into a single freq/time bin, but if any of the bins that get combined are flagged, flag the whole bin.
- Average/sum only unflagged data, but keep track of how many unflagged samples went into each bin via `nsamples`. If `nsamples == 0` then also flag.

I think the user should be able to choose which of those two is used.
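
The two flag-handling options could look roughly like the sketch below for frequency binning of a single baseline waterfall. This is plain NumPy, not an existing pyuvdata function. The name `average_channels`, the `flag_mode` keyword, the `(Ntimes, Nfreqs)` array layout, and the choice to report the output `nsamples` as the sum of contributing input `nsamples` are all assumptions made for illustration.

```python
import numpy as np


def average_channels(data, flags, nsamples, n_avg, flag_mode="any"):
    """Bin n_avg adjacent channels together along the frequency axis.

    data, flags, nsamples : arrays of shape (Ntimes, Nfreqs) (assumed layout).
    flag_mode : "any" flags an output bin if any input channel is flagged;
        "unflagged" averages only unflagged channels and flags the output
        bin only when nothing unflagged went into it.
    Returns (data, flags, nsamples) with shape (Ntimes, Nfreqs // n_avg).
    """
    ntimes, nfreqs = data.shape
    if nfreqs % n_avg != 0:
        raise ValueError("Nfreqs must be divisible by n_avg")
    shape = (ntimes, nfreqs // n_avg, n_avg)
    d = data.reshape(shape)
    f = flags.reshape(shape)
    n = nsamples.reshape(shape).astype(float)

    if flag_mode == "any":
        # Option 1: average everything, flag the bin if any input is flagged.
        out_data = d.mean(axis=-1)
        out_flags = f.any(axis=-1)
        out_nsamples = n.sum(axis=-1)
    elif flag_mode == "unflagged":
        # Option 2: nsamples-weighted average over unflagged channels only.
        weights = np.where(f, 0.0, n)
        wsum = weights.sum(axis=-1)
        out_flags = wsum == 0
        safe = np.where(out_flags, 1.0, wsum)  # avoid dividing by zero
        out_data = (d * weights).sum(axis=-1) / safe
        out_nsamples = wsum
    else:
        raise ValueError("flag_mode must be 'any' or 'unflagged'")
    return out_data, out_flags, out_nsamples
```

The same reshape-and-reduce pattern would apply along the time axis for an integer-factor temporal reduction. This sketch deliberately ignores the rephasing question raised above.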
Thoughts on this design @bhazelton @nithyanandan?