parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
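A minimal sketch of that pitch, based on the library's documented case-class API (the `User` schema and file location are illustrative):

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

// The case class doubles as the Parquet schema — no cluster required.
case class User(id: Long, name: String)

object QuickStart extends App {
  val file = Path("data/users.parquet") // hypothetical location

  ParquetWriter.of[User].writeAndClose(file, Seq(User(1, "Ada"), User(2, "Alan")))

  val users = ParquetReader.as[User].read(file)
  try users.foreach(println)
  finally users.close()
}
```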
POC for #333.
This change adds a `pathFilter` option to the `ParquetReader` builder interface, because there are situations where users need to configure path filter predicates (e.g. they use a `_` prefix for partition columns)....
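As a sketch of how the proposed option might be used — the `pathFilter` builder method is what this PR introduces, so its exact name and signature are assumptions here:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}
import org.apache.hadoop.fs.PathFilter

case class Data(id: Long, value: String) // hypothetical schema

object PathFilterUsage extends App {
  // Hadoop's default filter hides '_'-prefixed paths; accept everything so that
  // '_'-prefixed partition directories (e.g. '_date=2024-01-01') are still read.
  val acceptAll: PathFilter = _ => true

  val rows = ParquetReader
    .as[Data]
    .pathFilter(acceptAll) // hypothetical method added by this PR
    .read(Path("data/input"))
  try rows.foreach(println)
  finally rows.close()
}
```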
I am not sure how much effort it would take; I am just asking if you would be willing and available to look over the PR. It's currently stuck at 1.2.1 and it's just too far...
First, I want to thank you for this great library! I need to merge hundreds of small Parquet files into bigger ones. Sadly, they do not all have the same schema...
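A minimal sketch of the easy single-schema case, assuming all inputs fit one case class; unifying genuinely divergent schemas is the open part of the question (paths and the `Record` schema are illustrative):

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

case class Record(id: Long, payload: String) // hypothetical shared schema

object MergeSmallFiles extends App {
  val inputDir   = Path("data/small-files")    // hypothetical location
  val outputFile = Path("data/merged.parquet")

  // Reading a directory streams the rows of every file under it.
  val rows = ParquetReader.as[Record].read(inputDir)
  try ParquetWriter.of[Record].writeAndClose(outputFile, rows)
  finally rows.close()
}
```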
Hi, in the akkaPekko module I noticed that `ParquetPartitioningFlow` has no handling mechanism for the `IOException` that the Hadoop `ParquetWriter` may throw when `write` is called. I thought it...
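A sketch of the call-site workaround available today, assuming the `IOException` surfaces as a stream failure: react to it via the materialized `Future` rather than inside the flow (the `Event` payload and paths are illustrative):

```scala
import com.github.mjakubowski84.parquet4s.{ParquetStreams, Path}
import org.apache.pekko.Done
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.scaladsl.{Sink, Source}

import java.io.IOException
import scala.concurrent.Future
import scala.util.{Failure, Success}

case class Event(id: Int) // hypothetical payload

object HandleWriteFailure extends App {
  implicit val system: ActorSystem = ActorSystem()
  import system.dispatcher

  val done: Future[Done] =
    Source(1 to 100)
      .map(Event(_))
      .via(ParquetStreams.viaParquet.of[Event].maxCount(50L).write(Path("data/out")))
      .runWith(Sink.ignore)

  // An IOException thrown inside the flow fails the whole stream, so the only
  // place to react today is the materialized Future.
  done.onComplete {
    case Failure(e: IOException) => system.log.error(e, "Parquet write failed"); system.terminate()
    case Failure(e)              => system.log.error(e, "Stream failed"); system.terminate()
    case Success(_)              => system.terminate()
  }
}
```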
```scala
import com.github.mjakubowski84.parquet4s.ScalaCompat.stream.scaladsl.Sink
import com.github.mjakubowski84.parquet4s._
import org.apache.avro.generic.IndexedRecord
import org.apache.parquet.hadoop.ParquetFileWriter.Mode
import org.apache.parquet.hadoop.metadata.CompressionCodecName
import org.apache.pekko.Done
import org.apache.pekko.stream.scaladsl._

import scala.concurrent.Future
import scala.concurrent.duration.DurationInt

object AvroToParquetSink {
  private val writeOptions = ParquetWriter.Options(writeMode = Mode.OVERWRITE, compressionCodecName...
```
Implementation for issue #351. I had to add the `ClassTag` machinery because sbt was complaining about type erasure on the match [here](https://github.com/mjakubowski84/parquet4s/pull/350/files#diff-1a01f20cd6d3cace915f9ae2b24ea5387cc68a30723f7a449bb2618dd0b0b703R393).
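For context, a generic-Scala sketch of the pattern being referred to: matching on a bare type parameter is unchecked due to erasure, so an implicit `ClassTag` is needed to make the match checked at runtime (names here are illustrative, not from the PR):

```scala
import scala.reflect.ClassTag

// The ClassTag context bound carries the runtime class of T, so the
// `case t: T` pattern is actually checked instead of erased.
def firstOfType[T: ClassTag](xs: Seq[Any]): Option[T] =
  xs.collectFirst { case t: T => t }

// firstOfType[String](Seq(1, "a", 2.0)) == Some("a")
```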
Hi, I have encountered an inconsistency in the handling of projection schemas. When reading a partitioned Parquet dataset with `ParquetReader.projectedGeneric(expectedSchema).options(...).read`, the output rows contain the partitioning columns even if `expectedSchema` does not include them. When reading non-partitioned Parquet...
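A sketch of the reported behaviour using the column-based form of `projectedGeneric`, assuming a hypothetical dataset partitioned by `date` (e.g. `data/events/date=2024-01-01/part-0.parquet`):

```scala
import com.github.mjakubowski84.parquet4s.{Col, ParquetReader, Path}

object ProjectionRepro extends App {
  val rows = ParquetReader
    .projectedGeneric(Col("id").as[Long], Col("name").as[String]) // projection omits `date`
    .read(Path("data/events"))
  try
    // Reported inconsistency: for a partitioned dataset the records still carry
    // the `date` partition column; for a non-partitioned dataset they would not.
    rows.foreach(println)
  finally rows.close()
}
```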