Implement RecomputeKMeans scan at the data shard side (#19154)
Changelog category
- Not for changelog (changelog entry is not required)
Description for reviewers
:green_circle: 2025-06-19 11:43:06 UTC The validation of the Pull Request description is successful.
:white_circle: 2025-06-19 11:44:14 UTC Pre-commit check linux-x86_64-relwithdebinfo for b0e90c82fa22b9ecff7469d0e4b240c0842822db has started.
:white_circle: 2025-06-19 11:46:12 UTC Artifacts will be uploaded here
:white_circle: 2025-06-19 11:50:26 UTC ya make is running...
:yellow_circle: 2025-06-19 13:20:14 UTC Some tests failed, follow the links below. Going to retry failed tests...
Test history | Ya make output | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 38688 | 35982 | 0 | 5 | 2664 | 37 |
:white_circle: 2025-06-19 13:23:37 UTC ya make is running... (failed tests rerun, try 2)
:yellow_circle: 2025-06-19 13:37:58 UTC Some tests failed, follow the links below. Going to retry failed tests...
Test history | Ya make output | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 521 (only retried tests) | 483 | 0 | 1 | 3 | 34 |
:white_circle: 2025-06-19 13:38:08 UTC ya make is running... (failed tests rerun, try 3)
:red_circle: 2025-06-19 13:50:47 UTC Some tests failed, follow the links below.
Test history | Ya make output | Test bloat | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 321 (only retried tests) | 288 | 0 | 1 | 1 | 31 |
:green_circle: 2025-06-19 13:50:55 UTC Build successful.
:yellow_circle: 2025-06-19 13:51:19 UTC ydbd size 2.2 GiB changed* by +309.8 KiB, which is >= 100.0 KiB vs main: Warning
| ydbd size dash | main: 3559a1f6ca846e99682e83d699ef89748d7b8b74 | merge: b0e90c82fa22b9ecff7469d0e4b240c0842822db | diff | diff % |
|---|---|---|---|---|
| ydbd size | 2 375 957 328 Bytes | 2 376 274 552 Bytes | +309.8 KiB | +0.013% |
| ydbd stripped size | 497 918 472 Bytes | 497 968 968 Bytes | +49.3 KiB | +0.010% |
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation
:white_circle: 2025-06-19 11:44:17 UTC Pre-commit check linux-x86_64-release-asan for b0e90c82fa22b9ecff7469d0e4b240c0842822db has started.
:white_circle: 2025-06-19 11:47:18 UTC Artifacts will be uploaded here
:white_circle: 2025-06-19 11:52:06 UTC ya make is running...
:yellow_circle: 2025-06-19 13:59:09 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...
Test history | Ya make output | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 16292 | 15947 | 0 | 127 | 190 | 28 |
:white_circle: 2025-06-19 14:00:33 UTC ya make is running... (failed tests rerun, try 2)
:yellow_circle: 2025-06-19 14:39:22 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...
Test history | Ya make output | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 1848 (only retried tests) | 1571 | 0 | 68 | 183 | 26 |
:white_circle: 2025-06-19 14:39:42 UTC ya make is running... (failed tests rerun, try 3)
:yellow_circle: 2025-06-19 15:13:38 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet
Test history | Ya make output | Test bloat | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 1602 (only retried tests) | 1336 | 0 | 63 | 174 | 29 |
:green_circle: 2025-06-19 15:13:55 UTC Build successful.
:yellow_circle: 2025-06-19 15:14:32 UTC ydbd size 3.9 GiB changed* by +514.0 KiB, which is >= 100.0 KiB vs main: Warning
| ydbd size dash | main: 3559a1f6ca846e99682e83d699ef89748d7b8b74 | merge: b0e90c82fa22b9ecff7469d0e4b240c0842822db | diff | diff % |
|---|---|---|---|---|
| ydbd size | 4 179 408 072 Bytes | 4 179 934 400 Bytes | +514.0 KiB | +0.013% |
| ydbd stripped size | 1 448 934 136 Bytes | 1 449 086 584 Bytes | +148.9 KiB | +0.011% |
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation
:white_circle: 2025-06-19 16:25:15 UTC Pre-commit check linux-x86_64-release-asan for 82213f6553a36a5b25cc8115f5854607728a3e29 has started.
:white_circle: 2025-06-19 16:25:27 UTC Artifacts will be uploaded here
:white_circle: 2025-06-19 16:29:23 UTC ya make is running...
:white_circle: 2025-06-19 16:25:15 UTC Pre-commit check linux-x86_64-relwithdebinfo for 82213f6553a36a5b25cc8115f5854607728a3e29 has started.
:white_circle: 2025-06-19 16:25:27 UTC Artifacts will be uploaded here
:white_circle: 2025-06-19 16:29:22 UTC ya make is running...
:white_circle: 2025-06-19 16:32:31 UTC Pre-commit check linux-x86_64-relwithdebinfo for 1daf3eba01956dfa089b717ad82df0aeb9d920f2 has started.
:white_circle: 2025-06-19 16:32:43 UTC Artifacts will be uploaded here
:white_circle: 2025-06-19 16:36:49 UTC ya make is running...
:yellow_circle: 2025-06-19 18:26:54 UTC Some tests failed, follow the links below. Going to retry failed tests...
Test history | Ya make output | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 38697 | 35984 | 0 | 10 | 2664 | 39 |
:white_circle: 2025-06-19 18:30:25 UTC ya make is running... (failed tests rerun, try 2)
:yellow_circle: 2025-06-19 18:43:51 UTC Some tests failed, follow the links below. Going to retry failed tests...
Test history | Ya make output | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 1508 (only retried tests) | 1451 | 0 | 1 | 22 | 34 |
:white_circle: 2025-06-19 18:44:08 UTC ya make is running... (failed tests rerun, try 3)
:red_circle: 2025-06-19 18:55:38 UTC Some tests failed, follow the links below.
Test history | Ya make output | Test bloat | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 467 (only retried tests) | 436 | 0 | 1 | 0 | 30 |
:green_circle: 2025-06-19 18:55:46 UTC Build successful.
:yellow_circle: 2025-06-19 18:56:10 UTC ydbd size 2.2 GiB changed* by +308.7 KiB, which is >= 100.0 KiB vs main: Warning
| ydbd size dash | main: 911050b753e7fb532f13324f65d2a167e8045e37 | merge: 1daf3eba01956dfa089b717ad82df0aeb9d920f2 | diff | diff % |
|---|---|---|---|---|
| ydbd size | 2 376 264 336 Bytes | 2 376 580 472 Bytes | +308.7 KiB | +0.013% |
| ydbd stripped size | 497 966 248 Bytes | 498 016 648 Bytes | +49.2 KiB | +0.010% |
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation
:white_circle: 2025-06-20 11:55:39 UTC Pre-commit check linux-x86_64-relwithdebinfo for ac13b260003a4439db1067cc94255809c4c28bb7 has started.
:white_circle: 2025-06-20 11:55:51 UTC Artifacts will be uploaded here
:white_circle: 2025-06-20 11:59:49 UTC ya make is running...
:yellow_circle: 2025-06-20 14:00:22 UTC Some tests failed, follow the links below. Going to retry failed tests...
Test history | Ya make output | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 38700 | 35994 | 0 | 2 | 2666 | 38 |
:white_circle: 2025-06-20 14:03:57 UTC ya make is running... (failed tests rerun, try 2)
:green_circle: 2025-06-20 14:15:54 UTC Tests successful.
Test history | Ya make output | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 404 (only retried tests) | 372 | 0 | 0 | 4 | 28 |
:green_circle: 2025-06-20 14:16:05 UTC Build successful.
:yellow_circle: 2025-06-20 14:16:27 UTC ydbd size 2.2 GiB changed* by +308.8 KiB, which is >= 100.0 KiB vs main: Warning
| ydbd size dash | main: ce274a43eeccc20b30ffb022268320258c813209 | merge: ac13b260003a4439db1067cc94255809c4c28bb7 | diff | diff % |
|---|---|---|---|---|
| ydbd size | 2 376 655 024 Bytes | 2 376 971 224 Bytes | +308.8 KiB | +0.013% |
| ydbd stripped size | 498 045 256 Bytes | 498 095 720 Bytes | +49.3 KiB | +0.010% |
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation
:white_circle: 2025-06-20 11:55:41 UTC Pre-commit check linux-x86_64-release-asan for ac13b260003a4439db1067cc94255809c4c28bb7 has started.
:white_circle: 2025-06-20 11:55:52 UTC Artifacts will be uploaded here
:white_circle: 2025-06-20 11:59:50 UTC ya make is running...
:yellow_circle: 2025-06-20 14:34:13 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...
Test history | Ya make output | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 16302 | 15936 | 0 | 138 | 198 | 30 |
:white_circle: 2025-06-20 14:35:44 UTC ya make is running... (failed tests rerun, try 2)
:yellow_circle: 2025-06-20 15:16:57 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...
Test history | Ya make output | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 2217 (only retried tests) | 1940 | 0 | 72 | 180 | 25 |
:white_circle: 2025-06-20 15:17:20 UTC ya make is running... (failed tests rerun, try 3)
:yellow_circle: 2025-06-20 15:50:43 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet
Test history | Ya make output | Test bloat | Test bloat | Test bloat
| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|---|---|---|---|---|---|
| 1610 (only retried tests) | 1347 | 0 | 68 | 172 | 23 |
:green_circle: 2025-06-20 15:51:00 UTC Build successful.
:yellow_circle: 2025-06-20 15:51:34 UTC ydbd size 3.9 GiB changed* by +1000.3 KiB, which is >= 100.0 KiB vs main: Warning
| ydbd size dash | main: 614aa01a05784522b0fd706b73301924765690f2 | merge: ac13b260003a4439db1067cc94255809c4c28bb7 | diff | diff % |
|---|---|---|---|---|
| ydbd size | 4 180 197 304 Bytes | 4 181 221 600 Bytes | +1000.3 KiB | +0.025% |
| ydbd stripped size | 1 449 180 920 Bytes | 1 449 471 832 Bytes | +284.1 KiB | +0.020% |
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation
The vector index tests in kqp_scheme_ut failed because the vector lengths in the indexed column did not match the index parameters. The tests created a vector index on the Name field, with names such as Anna, Joshua, and so on, while using these parameters: type float, vector length 1024.
At first I started fixing TSampleKMeansScan by making it skip vectors of incorrect size. But I did not want to mix that change into this PR. So I simply fixed the test data and the index parameters: all "names" are now 4 bytes long, and the vector parameters are uint8 with vector length 3. That is exactly 4 bytes: 3 element bytes plus 1 extra byte that holds the type.
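The size arithmetic above can be sketched as follows. This is a minimal Python illustration, not the YDB code: the helper names `expected_size` and `has_valid_size` are hypothetical, and the 4-byte width for `float` is an assumption (32-bit floats). It checks only the length of the value, not the content of the type byte.

```python
# Element widths in bytes. The 4-byte float width is an assumption (32-bit floats).
ELEMENT_SIZE = {"uint8": 1, "float": 4}
TYPE_BYTE_SIZE = 1  # the one extra byte that holds the element type


def expected_size(vector_type: str, dimension: int) -> int:
    """Size in bytes of a serialized vector: the elements plus the type byte."""
    return ELEMENT_SIZE[vector_type] * dimension + TYPE_BYTE_SIZE


def has_valid_size(value: bytes, vector_type: str, dimension: int) -> bool:
    """True if the value's length matches the index parameters (length check only)."""
    return len(value) == expected_size(vector_type, dimension)
```

Under these assumptions, `expected_size("uint8", 3)` is 4, so a 4-byte name such as `b"Anna"` passes the length check, while the old parameters (`float`, 1024) would require 4097 bytes and no short name could match.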
Incidentally, situations like this could previously even cause an overflow. ReshuffleKMeansScan used to accept clusters with strings of any length, and FindCluster then searched for the nearest cluster using cluster.data() without bounds checking. Most likely an out-of-bounds array access was possible.
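The kind of size check that prevents such an out-of-bounds read can be sketched like this. It is a hedged Python illustration and not the actual ReshuffleKMeansScan or FindCluster code: the function `find_cluster`, its signature, the squared Euclidean distance, and the assumption that vectors are uint8 elements with the type byte already stripped are all hypothetical.

```python
from typing import Optional, Sequence


def find_cluster(clusters: Sequence[bytes], row: bytes, dimension: int) -> Optional[int]:
    """Return the index of the nearest cluster, or None if no valid match exists.

    Both the row and every cluster are length-checked before any element is
    read, which is the check a raw-pointer loop would otherwise be missing.
    """
    if len(row) != dimension:
        return None  # reject rows whose size does not match the index parameters
    best_index: Optional[int] = None
    best_distance: Optional[int] = None
    for index, cluster in enumerate(clusters):
        if len(cluster) != dimension:
            continue  # skip a malformed cluster instead of reading past its end
        # Squared Euclidean distance over uint8 elements (an assumed metric).
        distance = sum((a - b) ** 2 for a, b in zip(cluster, row))
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    return best_index
```

In Python a short buffer cannot be overrun, so the sketch only shows where the validation belongs: before the distance loop touches any element.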
Besides, I actually think it is wrong to allow building a vector index over invalid data at all. Previously this was freely allowed, and the resulting behavior was undefined: in later searches such data could either not be found at all, or all end up in a single cluster and still be found.
But this needs to be discussed and fixed separately. I have already created a ticket for it, because I ran into this earlier during testing: https://github.com/ydb-platform/ydb/issues/18667
> Besides, I actually think it is wrong to allow building a vector index over invalid data at all.

And what if the index was built over an empty table, and then someone inserts the wrong thing? :)

Well, ideally the column should be typed in the first place and should not allow inserting "the wrong thing" :)