pyuvdata Build script for downselection on frequencies, polarizations, lsts, etc.

Open jsdillon opened this issue 5 years ago • 0 comments

For the upcoming HERA season, we're going to need the RTP to perform a data downselection for internet transfer off site. We need a Python script for performing this (which we can then build a shell wrapper around for OPM), and I think pyuvdata is a reasonable place for it to live, though I'm open to other suggestions.

Here are the features I think we'll need:

- Ability to take in a raw data file using partial I/O to save memory (it's probably easiest to do partial I/O over groups of baselines).
- Ability to take in a UVFlag waterfall file, select on times to match the data file (if the flags are for the whole night), and then apply those flags to the data.
- Perform downselection on the following axes:
  - LST: given a list of pairs of floats that describe LST ranges, only keep data in those ranges
  - Solar altitude: keep only times when the sun is below some altitude parameter
  - Band: allow for a reduced band (or maybe subbands via spectral windows, though I don't personally need that)
  - Polarization: allow for a reduction of polarizations from 4 pol to any subset of those four
  - Frequency resolution: allow for a reduction of frequency resolution by binning together an integer number of channels. (Question: does this necessitate rephasing?)
  - Temporal resolution: allow for data reduction to a common temporal resolution, or a reduction of temporal resolution by an integer factor (Question: does this necessitate rephasing? Probably...)
  - Baseline length: allow for a cut on minimum/maximum baseline length. (We might also want a separate cut on minimum/maximum EW length, but I'm fine punting on that)
  - Only keep autos or crosses
  - Others?
- Write out to uvh5 using partial I/O.
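
To make the LST item concrete, here is a minimal sketch of how a list of LST range pairs could be turned into time indices to keep. It is plain Python rather than pyuvdata code; the helper names `in_lst_ranges` and `select_lst_indices` are hypothetical, and it assumes LSTs and range endpoints are in radians within [0, 2π) and that a pair with start > stop is meant to wrap through 0.

```python
import math

TWO_PI = 2.0 * math.pi


def in_lst_ranges(lst, lst_ranges):
    """Return True if `lst` lies in any (start, stop) range, endpoints included.

    Assumption: all values are radians in [0, 2*pi). A range with
    start > stop is treated as wrapping through 0 (e.g. 23h to 1h).
    """
    lst = lst % TWO_PI
    for start, stop in lst_ranges:
        if start <= stop:
            # Ordinary range that does not cross LST = 0.
            if start <= lst <= stop:
                return True
        elif lst >= start or lst <= stop:
            # Wrapped range: keep the tail before 2*pi and the head after 0.
            return True
    return False


def select_lst_indices(lsts, lst_ranges):
    """Return the indices of the times whose LST falls in any requested range."""
    return [i for i, lst in enumerate(lsts) if in_lst_ranges(lst, lst_ranges)]


lsts = [0.1, 1.0, 3.0, 6.0]
ranges = [(0.5, 1.5), (5.5, 0.2)]  # the second range wraps through 0
kept = select_lst_indices(lsts, ranges)
```

The resulting indices could then drive whatever time selection the real script uses; how that maps onto partial I/O is left open here.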

For both temporal and frequency downsampling, I think we should also decide how flags get handled. I can see two ways to do this:

- Average/sum all data that goes into a single freq/time bin, but if any of the samples that get combined are flagged, flag the whole bin.
- Average/sum only unflagged data, but keep track of how many unflagged samples went into each bin via nsamples. If nsamples == 0, then also flag the bin.

I think the user should be able to choose which of those two is used.
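
The two options can be sketched for a single spectrum as follows. This is an illustration only, not pyuvdata's averaging code: the function `bin_channels` and the policy names `"any"` and `"unflagged_only"` are made up for this example, the average is unweighted, and the returned count is simply the number of input channels that went into each bin.

```python
def bin_channels(data, flags, factor, flag_policy="any"):
    """Average groups of `factor` adjacent channels of one spectrum.

    flag_policy="any": average every channel in the bin and flag the
        bin if any input channel is flagged.
    flag_policy="unflagged_only": average only unflagged channels and
        flag the bin only if no unflagged channel remains.

    Returns (binned_data, binned_flags, counts), where counts[i] is the
    number of input channels averaged into bin i.
    """
    if factor < 1 or len(data) % factor != 0:
        raise ValueError("factor must be a positive divisor of the channel count")
    if flag_policy not in ("any", "unflagged_only"):
        raise ValueError("unknown flag_policy: %r" % (flag_policy,))

    out_data, out_flags, out_counts = [], [], []
    for start in range(0, len(data), factor):
        chunk = range(start, start + factor)
        if flag_policy == "any":
            use = list(chunk)
            flagged = any(flags[i] for i in chunk)
        else:
            use = [i for i in chunk if not flags[i]]
            flagged = len(use) == 0
        # An empty bin gets a placeholder value of 0 and stays flagged.
        avg = sum(data[i] for i in use) / len(use) if use else 0j
        out_data.append(avg)
        out_flags.append(flagged)
        out_counts.append(len(use))
    return out_data, out_flags, out_counts


data = [1 + 0j, 3 + 0j, 5 + 0j, 7 + 0j]
flags = [False, True, True, True]
strict = bin_channels(data, flags, 2, flag_policy="any")
lenient = bin_channels(data, flags, 2, flag_policy="unflagged_only")
```

With these inputs the `"any"` policy flags both output bins, while `"unflagged_only"` keeps the first bin (built from one sample) and flags only the second, which has no unflagged samples. The same logic would apply along the time axis.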

Thoughts on this design @bhazelton @nithyanandan?

Jul 31 '20 01:07 jsdillon
