Gaurav Sehgal comments

Results: 18 comments by Gaurav Sehgal

Allow all bucketed table supported types during filter

`plugin/trino-hive` is failing due to this flaky test: https://github.com/trinodb/trino/issues/13600

Scale table writers per task based on throughput

All of these test failures seem to be unrelated.

Scale table writers per task based on throughput

`ci / hive-tests (config-empty)` and `ci / hive-tests (config-hdp3)` are failing due to this issue: https://github.com/trinodb/trino/issues/13270 `ci / test (plugin/trino-hive)` is failing due to OOM and `ci / test (plugin/trino-elasticsearch)`...

Scale table writers per task based on throughput

This experiment was performed to find the best default value of `task.min-scaled-writer-count`. When the value is 8: ``` trino:gaurav_test_1> Insert into gaurav_test_1.lineitem_current select * FROM tpch_sf300_orc_part.lineitem; INSERT: 1799989091 rows Query...

Scale table writers per task based on throughput

Test failures are unrelated

Scale table writers per task based on throughput

[benchmark-scale-writers.pdf](https://github.com/trinodb/trino/files/9182713/benchmark-scale-writers.pdf) For q2 and q4 the initial value of `task_writer_count` is 16. Therefore, they are a bit slower with scaling because it'll increase tasks gradually.

Scale table writers per task based on throughput

@sopel39 @raunaqmorarka PTAL again
- I've fixed the race condition. Now, worker-level scaling will only happen when local tasks are fully scaled.
- Instead of mixing `task.writer-count`, created separate property...

Scale table writers per task based on throughput

New benchmarks: ``` trino:gaurav_test_1> Insert into gaurav_test_1.lineitem_current select * FROM tpch_sf300_orc_part.lineitem; INSERT: 1799989091 rows Query 20220728_215651_00000_wynas, FINISHED, 7 nodes Splits: 2,800 total, 2,800 done (100.00%) 6:06 [1.8B rows, 47.5GB] [4.92M...

Scale table writers per task based on throughput

New benchmarks comparison report: [Benchmarks comparison-task-scale-writers.pdf](https://github.com/trinodb/trino/files/9219617/Benchmarks.comparison-task-scale-writers.pdf) For q2 and q4 the initial value of `task_writer_count` is 8. Therefore, they are a bit slower with scaling because it'll increase tasks gradually.

Scale table writers per task based on throughput

Simulation: config: ``` MAX_WORKERS = 20 TOTAL_DATA_SIZE = 4000 MAX_TASK_WRITER_COUNT = 8 MIN_WRITER_SIZE = 32 ``` Results: ``` Total Writers: 49 Small Writers: 15 Large Writers: 34 Small/Large Ratio: 0.4411764705882353...