Julien
The current implementation is not great for reading many files (100+).

### Current implementation, and why this is not great

The way we read and distribute the data from many...
I'm trying to benchmark spark-fits on S3 by internally looping over the same piece of code:

```python
path = "s3a://abucket/..."
fn = "afile.fits"  # 700 MB

for index in range(N):
    ...
```
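Since the snippet above is truncated, here is a minimal, Spark-independent sketch of the looped-benchmark idea: time each iteration and report summary statistics. `read_once` is a hypothetical stand-in for the actual spark-fits read plus an action.

```python
import time
import statistics

def read_once():
    # Hypothetical stand-in for the real workload, e.g. loading the
    # 700 MB FITS file with spark-fits followed by an action like count().
    return sum(range(10_000))

N = 5
timings = []
for index in range(N):
    start = time.perf_counter()
    read_once()
    timings.append(time.perf_counter() - start)

print(f"mean:  {statistics.mean(timings):.6f} s")
print(f"stdev: {statistics.stdev(timings):.6f} s")
```

Per-iteration timings (rather than one total) make it easy to spot warm-up effects, e.g. the first S3 read being slower than the rest.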
In an Image HDU header, the data type is not given by a TFORMn value (letters: L, K, J, ...) as in table HDUs. Instead, the number of bits used per image pixel...
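In the FITS standard, the bits-per-pixel value is carried by the BITPIX keyword. A minimal sketch of mapping BITPIX to an element type (the function name is illustrative, not the connector's actual API):

```python
# Mapping from the FITS BITPIX keyword (bits per image pixel) to an
# element type, per the FITS standard for image HDUs.
BITPIX_TO_DTYPE = {
    8: "uint8",      # unsigned 8-bit integer
    16: "int16",     # signed 16-bit integer
    32: "int32",     # signed 32-bit integer
    64: "int64",     # signed 64-bit integer
    -32: "float32",  # IEEE single-precision float
    -64: "float64",  # IEEE double-precision float
}

def dtype_from_bitpix(bitpix: int) -> str:
    try:
        return BITPIX_TO_DTYPE[bitpix]
    except KeyError:
        raise ValueError(f"Invalid BITPIX value: {bitpix}")

print(dtype_from_bitpix(-64))  # float64
```

Negative values denote floating-point types, positive values integer types, so the sign alone already splits the two families.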
PR #55 fixed a bug with the header check: the connector was checking all FITS headers before starting the job, which is a good idea until you have 10,000+ files. The fix...
A big change... the Data Source API v2 is new in Spark 2.3.0. Fortunately, according to https://databricks.com/session/apache-spark-data-source-v2, there is no immediate plan to deprecate v1!
We need to understand whether we can handle zipped files (that is, unpacking blocks in HDFS!). Does fpack do this, or do we need to implement something new? Or do...
It would be worth investigating whether [data serialization](https://spark.apache.org/docs/latest/tuning.html#data-serialization) plays a role here.
This could dramatically speed up the computation in some cases.
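As a starting point, the tuning guide linked above recommends switching from Java serialization to Kryo. A hedged sketch of the relevant `spark-defaults.conf` entries (the class name and keys come from the Spark docs; the buffer value is only a plausible example, not a measured setting for spark-fits):

```properties
# Switch from Java serialization to Kryo (Spark tuning guide).
spark.serializer                   org.apache.spark.serializer.KryoSerializer
# Require explicit class registration to fail loudly on unregistered types.
spark.kryo.registrationRequired    false
# Buffer ceiling may need raising for large records (example value).
spark.kryoserializer.buffer.max    128m
```

Whether this matters here depends on how much data is shuffled or cached; if the job is dominated by raw S3 reads, serialization may play only a minor role.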