llmware
Reduce loops in input ingestion comparison
This changes the flow a little to get rid of some really large loops. The original has three loops:
1. Loop 1 goes over the distinct results from the store to build an array of file names without their paths.
2. Loop 2 runs once per entry in file_list.
3. Loop 3 runs once per entry in distinct_files, nested inside loop 2.

In my case loop 1 ran 700,000 times, loop 2 ran 5,000 times, and loop 3 ran 700,000 times, so loops 2 and 3 together amount to 5,000 * 700,000 = 3,500,000,000 iterations. That is 3.5 billion.
What I did: in loop 1 I already reduce the output to just the data needed for this run, which is the files that are both in the store and in file_list.
Loops 2 and 3 are then no longer needed, because I just subtract found_list from file_list, which gives us not_found_list.
This is a big win in computation time.
Additionally, I added a shortcut at the beginning: if file_list is empty, we do nothing and return empty arrays, since that would be the result anyway.
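To make the idea concrete, here is a minimal sketch of the single-pass approach described above. It is not the actual llmware code: the function name `compare_ingestion` and the parameter `stored_paths` are hypothetical, and it assumes the store returns full paths while file_list holds bare file names.

```python
import os


def compare_ingestion(file_list, stored_paths):
    """Hypothetical helper: split file_list into found and not-found names."""
    # Shortcut: an empty file_list can only produce empty results.
    if not file_list:
        return [], []

    wanted = set(file_list)  # set membership checks are O(1) on average
    found = set()

    # Single pass over the store (the former loop 1): strip the path and
    # keep a name only if it is also in file_list.
    for path in stored_paths:
        name = os.path.basename(path)
        if name in wanted:
            found.add(name)

    # "Subtract" the found names from file_list, preserving its order.
    found_list = [name for name in file_list if name in found]
    not_found_list = [name for name in file_list if name not in found]
    return found_list, not_found_list
```

With 700,000 stored paths and 5,000 entries in file_list, this does roughly 700,000 + 5,000 steps instead of 5,000 * 700,000 comparisons.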
My guess is that the same would also work for input_ingestion_comparison_from_parser_state, but I could not test that one because I have no use case for it yet, so I left it untouched. If someone who knows the bigger picture sees this, please check whether it could work the same way there.
@osi1880vr - thanks for this contribution and focus on this issue - it is an important area of optimization. I will go through it this afternoon - please give me 1-2 days, as I will run it through a lot of tests as part of integrating it into the main code base. 😄
👍