spark-bigquery-connector

Reading only requested number of partitions from BQ table doesn't work

Open JD-V opened this issue 6 years ago • 8 comments

I am trying to query a table that is partitioned by a date field, like this:

val prog_logs = spark.read.format("bigquery")
  .option("table", "project1:dataset.table")
  .option("filter", " date between '2019-09-10' and  '2019-09-11' ")
  .load()
  .cache()

This reads the entire table instead of only the '2019-09-10' and '2019-09-11' partitions.

JD-V avatar Oct 03 '19 06:10 JD-V

@pmkc I would really appreciate it if anybody could take a look at this.

JD-V avatar Oct 24 '19 06:10 JD-V

@davidrabinowitz is the new maintainer, though I think we will need to consult with the BigQuery Storage devs.

I have successfully used _PARTITIONDATE like that in a filter before.

pmkc avatar Oct 24 '19 16:10 pmkc

@JD-V I'd like to figure out what filter is actually pushed down to the storage API here. Do you have the read session ID from this query, or can you get one from a repro?

kmjung avatar Oct 24 '19 16:10 kmjung

I got this from a very recent run: projects/mintreporting/locations/asia-southeast1/sessions/CAISCDlLRUpHSGlKGgJzZhoCc2k

@kmjung Let me know if this is helpful.

JD-V avatar Oct 30 '19 09:10 JD-V

Sorry for the late response, but for proper pushdown in Spark 2.4 I think you need: date between date('2019-09-10') and date('2019-09-11') (i.e. explicitly cast the values to date).
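
For illustration, here is a minimal sketch of building such a filter string with explicit date() casts. The helper `PartitionFilter.dateBetween` is hypothetical and not part of the connector; whether this form actually results in partition pruning is not confirmed in this thread.

```scala
import java.time.LocalDate

// Hypothetical helper (an assumption, not a connector API): builds a
// filter string that wraps both bounds in date(...) casts.
object PartitionFilter {
  def dateBetween(column: String, start: LocalDate, end: LocalDate): String = {
    require(!end.isBefore(start), "end must not be before start")
    // LocalDate.toString yields ISO-8601 (yyyy-MM-dd), the format used above.
    s"$column between date('$start') and date('$end')"
  }
}

// Usage with the connector (assumes an existing SparkSession named `spark`):
// val progLogs = spark.read.format("bigquery")
//   .option("table", "project1:dataset.table")
//   .option("filter", PartitionFilter.dateBetween(
//     "date", LocalDate.parse("2019-09-10"), LocalDate.parse("2019-09-11")))
//   .load()
```

Building the string from `LocalDate` values rather than raw text keeps malformed dates out of the filter, since `LocalDate.parse` rejects them before the read is issued.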

emkornfield avatar Mar 11 '21 22:03 emkornfield

> Sorry for the late response but for proper push down in spark 2.4 I think you need: date between date('2019-09-10') and date('2019-09-11') (i.e. explicitly cast values to date).

Did it work for someone?

I have tried it and it always does a full table scan.

Are there any tips to do it?

I'm using Dataproc with Spark 3+.

jesuejunior avatar Jul 01 '21 23:07 jesuejunior

Can you please share some sample code? Also, can you please set the log level to debug and attach the output? I assume you are using the latest version.
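
A minimal repro along these lines might look like the sketch below. It assumes the connector is on the classpath and BigQuery credentials are configured; the object name, app name, and table are placeholders, and what the plan or debug log prints about pushed-down filters depends on the connector version.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder repro app; the names here are illustrative only.
object FilterPushdownRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bq-filter-repro").getOrCreate()

    // Raise the log level so debug messages are emitted.
    spark.sparkContext.setLogLevel("DEBUG")

    val progLogs = spark.read.format("bigquery")
      .option("table", "project1:dataset.table")
      .option("filter", "date between date('2019-09-10') and date('2019-09-11')")
      .load()

    // Print the logical and physical plans, then run an action so the
    // read actually happens and shows up in the log.
    progLogs.explain(true)
    println(progLogs.count())

    spark.stop()
  }
}
```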

davidrabinowitz avatar Jul 01 '21 23:07 davidrabinowitz

@jesuejunior is this still an issue? Can you please share the log of a sample app? It shows the pushed-down filters.

davidrabinowitz avatar Oct 18 '21 18:10 davidrabinowitz