sqllogictest-rs icon indicating copy to clipboard operation
sqllogictest-rs copied to clipboard

idea: add syntax for session

Open xxchan opened this issue 2 years ago • 7 comments

Use case: a user wants to setup a dataset (e.g., tpch), then tests different set of queries. What's more, they want to test it under different session configurations. e.g., https://github.com/singularity-data/risingwave/issues/3629#issuecomment-1174502552

Previously, they will:

# A.slt
include prepare.slt.part
set CONFIG=A;  
include test1.slt.part
include test2.slt.part
# B.slt
include prepare.slt.part
set CONFIG=B;  
include test1.slt.part
include test2.slt.part

The biggest problem is that the tests cannot be parallized.

I came up with another syntax:

include prepare.slt.part
session {
  statement ok
  set CONFIG=A;  
  include test1.slt.part
}
session {
  statement ok
  set CONFIG=A;  
  include test2.slt.part
}
session {
  statement ok
  set CONFIG=B;  
  include test1.slt.part
}
session {
  statement ok
  set CONFIG=B;  
  include test2.slt.part
}

In this way, we can have:

  • parallism
  • isolated session config. This also solves another problem: config side-effects in a subtest .slt.part https://github.com/risinglightdb/sqllogictest-rs/issues/55
  • isolated failure
  • reuse test cases

session can be alternatived called as e.g., subtest or run.

xxchan avatar Aug 21 '22 21:08 xxchan

I strongly recommend to keep things simple 😇 How about just use one session for one .slt file? Users can split tests into multiple files if necessary.

fuyufjh avatar Aug 25 '22 16:08 fuyufjh

I strongly recommend to make things simple... How about just use one session for one file?

They are different problems. #85

xxchan avatar Aug 25 '22 16:08 xxchan

So let's do https://github.com/risinglightdb/sqllogictest-rs/issues/85 first?

fuyufjh avatar Aug 25 '22 16:08 fuyufjh

Users can split tests into multiple files if necessary.

include's semantic is just copy-paste.

That sounds intuitive to me (and other users, I think). I don't think you need do anything else for it.

Yes, but when we want to split tests, we failed to do so because we also want to reuse tests...

And test cases should be as short as possible. I believe current RisingWave’s batch test is a bad example. Include should be used to include things in common, instead of including new test cases. This is where the confusion originates: each test case share a different session level setting. By the way, parallelism is exploited on file level, which means that sqllogictest cannot parallelize RisingWave’s batch test.

Originally posted by @skyzh in https://github.com/risinglightdb/sqllogictest-rs/issues/55#issuecomment-1172832193

Originally posted by @skyzh in https://github.com/singularity-data/risingwave/issues/3629#issuecomment-1173869684

The reason we include test cases is that we want to test both cases for local and distributed mode. I'm not aware of better ways to avoid much code duplication. Any suggestion is welcome.

Originally posted by @liurenjie1024 in https://github.com/singularity-data/risingwave/issues/3629#issuecomment-1174502552

So this issue proposes another thing (which describes subtests, instead of general copy-paste) other than include for this use case. To be honest, I also think it's bit of complicated, but can't come up with another way 😇. Anyway it's just an idea.

xxchan avatar Aug 25 '22 17:08 xxchan

Risingwave batch local_mode.slt and distribution_mode.slt includes ALL subtests. It seems that we can't have any improvements without additional mechanism 🤪

xxchan avatar Aug 25 '22 17:08 xxchan

crdb's sqllogictest has a subtest

https://github.com/cockroachdb/cockroach/blob/82e2a30c5be8046dead49348722a2429d5c29290/pkg/sql/logictest/logic.go#L377-L380

It's like

subtest a
...
subtest end (optional)

subtest b
...

BTW they don't have include

xxchan avatar Aug 26 '22 09:08 xxchan

Hey there! I'd like to share an idea that could potentially help you achieve side-effect isolation.

In Datafusion we implemented a runner and a Postgres client, that assume:

  • there is one connection per test file - this is the same as in the sqllogictest-rs Postgres implementations.
  • each connection overrides the Postgres' search_path. search_path tells in which schemas to find database objects. The schema name is based on the test file name.

Having a schema per unit of test execution (a file name, a session block, etc.) would at least prevent parallel tests from trying to perform DDL operations on the same db objects.

melgenek avatar Jan 28 '23 22:01 melgenek