vector icon indicating copy to clipboard operation
vector copied to clipboard

enhancement(topology): Speed up vector startup with big remap transforms

Open Zettroke opened this issue 9 months ago • 1 comments

We have pretty big vrl transforms in our pipeline and we started to encounter issues with how long vector takes to startup. Startup time was reaching of up to 10 minutes, which was troublesome.

The optimization consists of 2 thing.

  1. Cache vrl compilation results for remap, as it's outputs method is requiring to compile vrl and called many times. In the end it results in vrl compilation happening around 8 times for each remap.
  2. Parallelizing topology building.

Results

Test Baseline 1 thread 16 threads
big_conf_clean-single.yaml 11.40s 4.50s 4.47s
big_conf_clean-6.yaml 67.18s 27.24s 5.01s
big_conf_clean-chain.yaml 120.38s 31.75s 16.69s

Attaching this configs. (Yes, we really have 2kloc vrl 😂) big_conf_clean-single.yaml.txt big_conf_clean-6.yaml.txt big_conf_clean-chain.yaml.txt

But there is few questions/problems I have with current implementation:

  • I'm not so keen on keeping cache in RemapConfig, as it feels kinda hacky, but I'm not sure
  • I've added rayon as a dependency, I thought about hiding it behind a feature flag, but it would be really annoying.
  • I don't like pre-cache step in compile, but I don't want to introduce par_iter into every method either.
  • Maybe there should be a cli option for limiting thread count

While startup time is not the most important metric, it's troubling when it takes minutes. This optimization is extremely helpful for heavy use of vrl transforms and huge topologies, like my company's use case.

Hope we can resolve questions/problems and merge it.

Zettroke avatar May 13 '24 02:05 Zettroke

CLA assistant check
All committers have signed the CLA.

bits-bot avatar May 13 '24 02:05 bits-bot

Thanks for this @Zettroke ! We had the same observation that the VRL compilation was being repeated every time the outputs was calculated but hadn't been able to address it yet. We'll review this shortly.

jszwedko avatar May 20 '24 17:05 jszwedko

There is weird test failure for topology::schema::tests::test_expanded_definition. For some reason output doesn't get sorted. And I can't reproduce it locally image

Zettroke avatar May 20 '24 20:05 Zettroke

At a high level, this PR makes two relatively independent major changes to the remap loading code, and it would be helpful to be able to evaluate them separately. Could you untangle the compilation caching from the parallelization work? I for one would really want to address them separately.

bruceg avatar May 22 '24 14:05 bruceg

Sure thing. I'll make 2 pull requests then.

Zettroke avatar May 22 '24 15:05 Zettroke

@bruceg Can I base parallelization branch from caching branch? Or should I make them independent? Or should I wait with it until we will finalize caching?

Zettroke avatar May 23 '24 16:05 Zettroke

Make them independent if you want us to review them simultaneously, otherwise run the process with this one and the PR the other.

bruceg avatar May 23 '24 20:05 bruceg

Replaced by #20555

bruceg avatar May 30 '24 18:05 bruceg