Oleks V

Results 186 comments of Oleks V

The Spark-PI example is based on an in-memory RDD, so Comet Scan cannot be tested with it. We need to prepare our own test case that reads data from a local source and check...

Mind if I take that? I'm already investigating https://github.com/apache/datafusion-comet/issues/1681 and I think it has similar problems.

To create a reproducing test, you need to run a test in debug mode:

```
async fn test_anti_join_1k_filtered() {
    // NLJ vs HJ gives wrong result
    // Tracked in https://github.com/apache/datafusion/issues/11537...
```

@korowa @viirya please help me understand the scenario with ranges. If there is a left streamed row with join key (1), from the right side we are going to have joined buffered batches...

> > if there is a left streamed row with join key (1) from the right side we gonna have joined buffered batches where range shows what indices share the...

```
#[tokio::test]
async fn test_ranges() {
    let left: Vec<RecordBatch> = make_staggered_batches(1);
    let left = vec![
        RecordBatch::try_new(
            left[0].schema().clone(),
            vec![
                Arc::new(Int32Array::from(vec![1])),
                Arc::new(Int32Array::from(vec![10])),
                Arc::new(Int32Array::from(vec![10])),
                Arc::new(Int32Array::from(vec![1000])),
            ],
        ).unwrap()
    ];
    let right = vec![
        RecordBatch::try_new(...
```

> > If I have a left table
> > a b
> > 10 20
> > and right table
> > a b
> > 5 20
>...

> At what point in the code you are able to observe `0..1` for the key 2?

I'm running the test from https://github.com/apache/datafusion/pull/12082#issuecomment-2319361185 and debugging the `freeze_streamed` function. For batch...

> @comphead I've finally got it -- it's like in this case SMJ is trying to produce output for each join key pair (streamed-buffered) -- I guess it's how smj...

@Omega359 @alamb I tried to play with custom attributes to wrap up the documentation on top of what @Omega359 already built. I'm experimenting with just 2 fields (description and examples)...