spark-fits
Relationship between SparkFits data frame and source FITS file?
I have just started looking at SparkFits for a large-scale data modelling and analysis project in radio astronomy. We'll be dealing with very large image cubes: we are currently testing on a 350 GB Stokes 3D FITS cube, and will be testing on FITS files from 1 TB up to multiple TB in future.
Apologies if this is a fairly simple question, but the documentation isn't very clear: how does SparkFits instantiate the data frame from a 3D cube? We have an experimental 3D image cube (~350 GB) covering a sky area of ~10 x 10 degrees (Right Ascension and Declination) over ~2590 frequency channels. Pixel measurements are in Hz. The cube dimensions are RA 5,607, Dec 5,694 and frequency channels 2,592, so the cube shape (RA, Dec, Freq) is (5607, 5694, 2592).
The data frame has 14,544,168 records, and the image column of each record is an array of 5607 elements, so if the data frame were expressed as a 2D matrix it would have the shape (14544168, 5607).
Each value in each image array is a pixel value in Hz for a specific RA, Dec and frequency channel, correct?
Given that the image column in each row has 5607 elements, is it correct to assume that each row holds the pixel values for all RA positions at one specific Dec and frequency channel?
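As a quick sanity check on that reading, here is a rough sketch that computes how many rows the data frame should have if each row really is one (Dec, Freq) pair. It uses only the axis sizes quoted in this post, taking Dec as 5694. The function name is just for illustration.

```python
# Axis sizes as quoted in this post (Dec taken as 5694).
N_RA, N_DEC, N_FREQ = 5607, 5694, 2592


def expected_rows(n_dec: int, n_freq: int) -> int:
    """Rows expected if each row holds all RA pixels for one (Dec, Freq) pair."""
    return n_dec * n_freq


reported_rows = 14_544_168  # record count reported by the data frame
rows = expected_rows(N_DEC, N_FREQ)

# Print the expected count and how far it is from the reported count.
print(rows, rows - reported_rows)
```

The product is 14,758,848, which is 214,680 more than the 14,544,168 records the data frame reports, so either the dimensions I quoted or my one-row-per-(Dec, Freq) reading may be off.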
Assuming the above is correct, an image for one specific frequency channel would be a 2D array (Dec, RA) of shape (5694, 5607). How would this be extracted from the SparkFits data frame? Would it be rows 1-5694 for the first frequency channel, then each subsequent group of 5694 rows for each subsequent frequency channel?
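To make the layout I have in mind concrete, here is a minimal sketch of the row-to-index mapping. It assumes the rows follow FITS storage order (first axis, RA, varying fastest, then Dec, then frequency) and that a 0-based row index is available. I have not confirmed that SparkFits preserves this order or exposes such an index, and Spark itself does not guarantee row order without an explicit index column. The helper names are hypothetical.

```python
from typing import Tuple

N_DEC = 5694  # size of the Dec axis, as quoted in this post


def row_to_dec_freq(row: int, n_dec: int = N_DEC) -> Tuple[int, int]:
    """Map a 0-based row index to 0-based (dec_index, freq_index).

    Assumes Dec varies faster than frequency from one row to the next.
    """
    freq, dec = divmod(row, n_dec)
    return dec, freq


def channel_row_range(freq: int, n_dec: int = N_DEC) -> range:
    """0-based row indices that would make up one frequency channel image."""
    return range(freq * n_dec, (freq + 1) * n_dec)


# Under this assumption the second channel (freq index 1) starts at row 5694.
print(row_to_dec_freq(5694), channel_row_range(1))
```

If this is right, stacking the 5694 arrays for one channel in row order would give the (Dec, RA) image of shape (5694, 5607).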
And is it possible to extract the actual RA, Dec and frequency values from the data frame, or do these need to be pulled from the FITS header?
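If the header is the only route, I assume the frequency of a channel would come from the standard linear FITS WCS relation, world = CRVAL + CDELT * (pixel - CRPIX), with 1-based pixel numbers. Here is a minimal sketch of that. The header values are made up for illustration and are not from our cube. RA and Dec would also need the sky projection, which I understand a library such as astropy.wcs handles, so this covers only a linear axis.

```python
def pixel_to_world(pixel: float, crval: float, crpix: float, cdelt: float) -> float:
    """Linear FITS WCS axis: world value at a 1-based pixel coordinate."""
    return crval + cdelt * (pixel - crpix)


# Hypothetical frequency-axis keywords (illustrative values only).
CRVAL3 = 1.0e9  # frequency at the reference pixel, in Hz
CRPIX3 = 1.0    # 1-based reference pixel
CDELT3 = 1.0e6  # channel width, in Hz

# Frequency of the third channel under these made-up values.
print(pixel_to_world(3, CRVAL3, CRPIX3, CDELT3))
```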
Again, apologies if these are simple questions - and thanks for your time.