
Optimise Transformer PSC

Open tiredpixel opened this issue 1 year ago • 0 comments

Transformation is very slow. It takes around 2.03 s/statement (*):

I, [2024-04-30T12:57:31.688140 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859288357183009020135870562338] /company/15694555/persons-with-significant-control/individual/7AYbJehiTTD91IcUhbXdBvumeaY
I, [2024-04-30T12:57:33.697593 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859531351272751560599986503714] /company/12462557/persons-with-significant-control/individual/jjdgflZphXeM0pPxRkQkZs5kbHM
I, [2024-04-30T12:57:35.710237 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860093501778872363166224875554] /company/02496500/persons-with-significant-control/individual/3QzT_0qRrYgG6E4rTdpVpMEWENQ
I, [2024-04-30T12:57:38.723755 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860683457578844302203481489442] /company/NI696959/persons-with-significant-control/individual/qV-mSBMeZPPSXn2mGU_T_hcZjlA
I, [2024-04-30T12:57:40.731559 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861147685093576319806568661026] /company/NI047824/persons-with-significant-control/individual/h0i1_kzZJGYCUHwlUbezCFHf9NI
I, [2024-04-30T12:57:43.743871 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861385843480040401753985777698] /company/SC592338/persons-with-significant-control/individual/CPAtOXYPPh2bprJsArvAXScn288
I, [2024-04-30T12:57:45.757287 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861905681582474692299109433378] /company/08491828/persons-with-significant-control/individual/2p9zF0twy2ehnKfDkJOWeWtCbBc
I, [2024-04-30T12:57:48.775943 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862146257820578003573595439138] /company/15694559/persons-with-significant-control/individual/tKng5vVsd8XjFteN4dFDy0ej68M
I, [2024-04-30T12:57:50.793828 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862406176871795148846157266978] /company/11325059/persons-with-significant-control/individual/VQyyTGV40yj-uBRvDEC4MGQhb_M
I, [2024-04-30T12:57:53.815453 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004352201981846837448732049442] /company/12208333/persons-with-significant-control/individual/s_E1ELb-HXARDTizpSdoT88hz0g
I, [2024-04-30T12:57:55.836929 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004890173971575347431476297762] /company/15694561/persons-with-significant-control/individual/AokX43_Re-u54us-uib6l800pSs
I, [2024-04-30T12:57:57.860308 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394005883910995298572613084774434] /company/15694565/persons-with-significant-control/individual/M51kbMDneeZlVlBLkKP6XjSf160
I, [2024-04-30T12:57:59.879818 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007039644078850158172823355426] /company/15547479/persons-with-significant-control/individual/8Znvy7tV9M7VuyBRluVSs8Cbdis
I, [2024-04-30T12:58:00.744422 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007555855403825604830422892578] /company/15512467/persons-with-significant-control/individual/q-O8qcELY7atL9GtbCzi4L2IhHQ
I, [2024-04-30T12:58:02.884764 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007852042229631188978225905698] /company/15545523/persons-with-significant-control/individual/iXEklbEKlTUZswR1QqhBCYtcKYE
I, [2024-04-30T12:58:03.729477 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394008402103477555845252717215778] /company/15547510/persons-with-significant-control/individual/JoGuhDIatSMKhdRakaOH5P4Lmos
I, [2024-04-30T12:58:04.429314 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394009458704643899031151410413602] /company/15563627/persons-with-significant-control/individual/Yhb4Ev3VME8uaphRpnLByDgeknM
I, [2024-04-30T12:58:05.156642 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010336384788939252000966574114] /company/15603805/persons-with-significant-control/individual/fTFm-H4_iJs4CVsEEoDEiLwVhFU
I, [2024-04-30T12:58:05.877892 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010674884018431348169884303394] /company/15607311/persons-with-significant-control/individual/wvtX8efpM-Ehsj7vGcR5RwF-0IU
I, [2024-04-30T12:58:08.909962 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147026000435306532951918903330] /company/14796746/persons-with-significant-control/individual/95cXqZ-14aTkGpC7jjU8MJ0xRJE
I, [2024-04-30T12:58:10.933029 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147864994954119085599164989474] /company/15694572/persons-with-significant-control/individual/jT3EiIgt0BbS5p2Dq6G0ABOdeKI
I, [2024-04-30T12:58:12.954265 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394148082601601649718850612101154] /company/15694574/persons-with-significant-control/individual/zktlC6GFce-khAoOygiG-5d_K0I
I, [2024-04-30T12:58:14.976031 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286094782094675404118692986914] /company/09568573/persons-with-significant-control/individual/Y5zBOJaAmDfJ35r9LUJCXB_L86k
I, [2024-04-30T12:58:16.993920 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286435699175806729545960128546] /company/15135615/persons-with-significant-control/individual/6KP-hx2juhDTdhlodF7RM-jYLXk
I, [2024-04-30T12:58:19.011294 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286950701574962561574384959522] /company/15694579/persons-with-significant-control/individual/JcnmYPWoI3AYo4MdpJM1PaQQXXs
D, [2024-04-30T12:58:22.441569 #1] DEBUG -- : [shardId-000000000000] LAG: 86400s | N: 0

(*): Note that these measurements are from a development environment on a small Elasticsearch instance, not a production Elasticsearch cluster.
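For reference, the ~2.03 s/statement figure can be reproduced by averaging the gaps between consecutive log timestamps. A minimal sketch in Ruby, assuming the excerpt above is saved to a file (the `transformer.log` filename is hypothetical):

```ruby
# Estimate mean seconds/statement from the gaps between consecutive log timestamps.
# Assumes the log excerpt above is saved to 'transformer.log' (hypothetical filename).
require 'time'

timestamps = File.readlines('transformer.log').filter_map do |line|
  m = line.match(/\[(\d{4}-\d{2}-\d{2}T[\d:.]+)/)
  Time.parse(m[1]) if m
end

deltas = timestamps.each_cons(2).map { |a, b| b - a }
puts format('%.2f s/statement over %d intervals', deltas.sum / deltas.size, deltas.size)
```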

I think this has most likely always been the case, although it's possible things are currently running slower than they used to (if so, I am not sure why). There could be a mixture of reasons for this, including the overall design, extensive work within the Ruby code, Elasticsearch round-trips, and Elasticsearch write confirmations.
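To narrow down which of these dominates, the per-statement path could be instrumented to split the time between the Ruby transformation and the Elasticsearch write. A hedged sketch using Ruby's Benchmark module; `transform_record` and `index_record` are hypothetical stand-ins for whatever the transformer actually calls:

```ruby
# Hedged sketch: split per-statement time between Ruby transformation and the
# Elasticsearch write, to see which dominates the ~2 s/statement figure.
# `transform_record` and `index_record` are hypothetical stand-ins for the
# real transformer and Elasticsearch client calls.
require 'benchmark'

def process_with_timings(record, logger)
  transformed = nil
  transform_time = Benchmark.realtime { transformed = transform_record(record) }
  index_time     = Benchmark.realtime { index_record(transformed) }
  logger.info(format('transform=%.3fs index=%.3fs', transform_time, index_time))
  transformed
end
```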

Given our current design and performance characteristics, this means we couldn't cope with even 1 event/s from the upstream PSC data source, and that if the transformation stream falls behind, catching up could take an exceedingly long time.

It's worth spending some time to see whether there are any simple performance improvements to be made here. Anything more complex would likely be too risky or time-consuming to change, but if some small improvements could be found here or there (or perhaps some regression discovered and fixed), even a 0.5 s/event reduction would compound to our benefit.
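As an example of the kind of small improvement that might be worth checking for: if statements are currently indexed one HTTP request at a time (and especially if each write waits for a refresh), batching them through the Bulk API could recover a large share of the per-statement cost. A hedged sketch using the elasticsearch-ruby client; the index name, document shape, batch size, and `ELASTICSEARCH_URL` environment variable are assumptions, and whether this applies at all depends on how the transformer currently writes:

```ruby
# Hedged sketch: batch Elasticsearch writes via the Bulk API rather than
# indexing one document per request. Index name and document shape are
# hypothetical; rely on the index refresh interval instead of per-write refreshes.
require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV.fetch('ELASTICSEARCH_URL'))

def bulk_index(client, docs, index: 'statements', batch_size: 500)
  docs.each_slice(batch_size) do |batch|
    body = batch.flat_map do |doc|
      [{ index: { _index: index, _id: doc[:id] } }, doc]
    end
    client.bulk(body: body)
  end
end
```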

Although this prioritises optimisation of the PSC data transformation pipeline only, any benefits would likely also be applicable to the SK and DK pipelines.

This was already known from bulk data imports, but work on https://github.com/openownership/register/issues/251 made it more obvious and easier to measure.

tiredpixel · Apr 30 '24 13:04