unicc
unicc copied to clipboard
Performance issues
Trying to build the grammar from postgresql mentioned here https://github.com/phorward/unicc/issues/22#issuecomment-984550196 it takes a long time to generate the parser and looking at the profile it seems that most of the time is spent on size_t plist_union( plist* all, plist* from )
(see bellow) that is doing a naive exponential comparison/insertion between two lists.
Looking through the code I think that creating a sorted list first and then doing a binary search on it to decide to push/add
one list element could improve things (not ideal but it's something), but I'm not sure about it because I don't know if the lists can have duplicates.
Any thought about this ?
/usr/bin/time unicc-nb postgresql-13.3.y.par
104.40user 0.10system 1:44.57elapsed 99%CPU (0avgtext+0avgdata 135872maxresident)k
152inputs+11096outputs (0major+34371minor)pagefaults 0swaps
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
10.95 8.67 8.67 1398180161 0.00 0.00 plist_compare
10.50 16.98 8.31 12371648 0.00 0.00 plist_union
9.83 24.76 7.78 3259644116 0.00 0.00 plist_next
9.61 32.37 7.61 2294100 0.00 0.00 close_item
7.97 38.68 6.31 1 6.31 6.38 detect_default_productions
7.06 44.27 5.59 8438 0.00 0.01 lalr1_closure
5.59 48.69 4.43 2272182630 0.00 0.00 plist_access
5.48 53.03 4.34 1343718440 0.00 0.00 pccl_count
5.00 56.99 3.96 6007214 0.00 0.00 list_push
3.27 59.58 2.59 209994052 0.00 0.00 list_count
2.69 61.71 2.13 113372032 0.00 0.00 plistel_drop
2.30 63.53 1.82 plist_key
2.08 65.18 1.65 582855415 0.00 0.00 pccl_intersect
1.91 66.69 1.51 3306 0.00 0.00 pregex_dfa_from_nfa
1.63 67.98 1.29 1223780 0.00 0.00 list_find
1.48 69.15 1.18 78921959 0.00 0.00 test_same_kernel
1.26 70.15 1.00 586092893 0.00 0.00 pccl_compat
1.12 71.04 0.89 1 0.89 3.53 check_regex_anomalies
0.86 71.72 0.68 11977201 0.00 0.00 plist_erase
0.84 72.38 0.67 113382156 0.00 0.00 plist_insert
0.78 73.00 0.62 27035168 0.00 0.00 pccl_testrange
0.75 73.59 0.59 179834200 0.00 0.00 sort_symbols
0.57 74.04 0.45 855253 0.00 0.00 pregex_ptn_to_nfa
0.44 74.39 0.35 plist_set_printfn
0.40 74.71 0.32 11081 0.00 0.00 plist_subsort
0.37 75.00 0.29 35355186 0.00 0.00 plist_remove
0.35 75.28 0.28 130690767 0.00 0.00 pmalloc
0.32 75.53 0.25 247164803 0.00 0.00 pfree
0.30 75.77 0.24 10357446 0.00 0.00 plist_get_by_ptr
0.26 75.97 0.21 154645260 0.00 0.00 plist_first
0.24 76.16 0.19 4939448 0.00 0.00 pccl_create
0.24 76.35 0.19 1447245 0.00 0.00 pccl_compare
0.20 76.51 0.16 124362026 0.00 0.00 plist_last
Again looking through the code I can see that instead of using a BitArray
for computing first/follow
sets it's using plist
.
Probably plist
is fine to use when parsing the grammar but then converting then to arrays and doing all processing on then would improve the performance.
Hello @mingodad, thank you very much for your detailed report on this problem.
You're totally right, the current implementation is a naive exponential comparison of lists, mostly because the lists are self-implemented in libphorward. There was no focus on performance yet, but on usability.
There are couple of things inside unicc that could be improved. But due its very rare use, lack of time and priority on other projects, I currently don't have the time on dealing with this issues, because it might require a deeper examination to resolve the problem properly.
Looking through the code I think that creating a sorted list first and then doing a binary search on it to decide to push/add one list element could improve things (not ideal but it's something), but I'm not sure about it because I don't know if the lists can have duplicates.
Normally, there won't be any duplicates, so your attempt sounds useful. There's also the parray structure which might be useful as well for specific use-cases (this was used in the unfinished unicc2 code base in several situations to increase speed and decrease memory usage), but it requires a lot of effort to rewrite the current implementation from plist to parray.
If you have any proposal, or a quick working solution in form of a pull request, this is very welcome!
Adding a cached sorted list array the total time to parse was reduced around 40% and the profile output is now:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
14.87 7.97 7.97 2368251 0.00 0.00 close_item
13.43 15.17 7.20 8735 0.00 0.00 lalr1_closure
11.77 21.48 6.31 1 6.31 6.37 detect_default_productions
9.86 26.77 5.29 6222279 0.00 0.00 list_push
6.39 30.19 3.43 208112882 0.00 0.00 list_count
6.32 33.58 3.39 2069372468 0.00 0.00 plist_access
4.91 36.21 2.63 1860430152 0.00 0.00 plist_next
3.40 38.03 1.82 65828038 0.00 0.00 bsearch_r
3.02 39.65 1.62 865039536 0.00 0.00 plist_compare
2.97 41.24 1.59 1444959779 0.00 0.00 plist_prev
2.95 42.82 1.58 1206508 0.00 0.00 list_find
2.72 44.28 1.46 34071855 0.00 0.00 plist_get_by_ptr
2.19 45.45 1.18 77750733 0.00 0.00 test_same_kernel
1.72 46.37 0.92 2109590 0.00 0.00 plist_offset
1.66 47.26 0.89 234329705 0.00 0.00 sort_symbols
1.65 48.15 0.89 390224369 0.00 0.00 cmp_ordered_list_bsearch
1.51 48.96 0.81 ordered_list_compare
0.95 49.47 0.51 73347230 0.00 0.00 plist_insert
0.76 49.88 0.41 plist_set_printfn
0.76 50.28 0.41 plist_key
0.60 50.60 0.32 73337106 0.00 0.00 plistel_drop
0.52 50.88 0.28 6008 0.00 0.00 plist_subsort
0.50 51.15 0.27 12956766 0.00 0.00 plist_union
0.49 51.41 0.26 10123214 0.00 0.00 plist_erase
0.40 51.63 0.22 1657945 0.00 0.00 recreate_ordered_list
0.34 51.81 0.18 13145292 0.00 0.00 plist_get
0.28 51.96 0.15 166448742 0.00 0.00 pfree
0.28 52.11 0.15 85400518 0.00 0.00 pmalloc
0.24 52.24 0.13 78722663 0.00 0.00 plist_first
0.22 52.36 0.12 plist_riter_access
0.21 52.47 0.11 26824786 0.00 0.00 list_free
0.20 52.57 0.11 list_dup
/usr/bin/time unicc-nb postgresql-13.3.y.par
60.34user 0.09system 1:00.44elapsed 99%CPU (0avgtext+0avgdata 140336maxresident)k
0inputs+11096outputs (0major+60347minor)pagefaults 0swaps
Profiling with with optimization now I'm getting this:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
38.37 17.40 17.40 1 17.40 18.36 pregex_dfa_from_nfa
25.76 29.08 11.68 generate_tables
9.13 33.22 4.14 detect_default_productions
6.68 36.25 3.03 4922934 0.00 0.00 list_push
4.07 38.10 1.85 12956766 0.00 0.00 plist_union
3.37 39.63 1.53 1206508 0.00 0.00 list_find
2.94 40.96 1.34 442540698 0.00 0.00 plist_compare
2.40 42.05 1.09 1 1.09 1.11 pregex_dfa_minimize
1.55 42.76 0.71 234330865 0.00 0.00 sort_symbols
1.03 43.22 0.47 489312439 0.00 0.00 plist_access
0.82 43.59 0.37 73347229 0.00 0.00 plist_insert
0.79 43.95 0.36 11971 0.00 0.00 ordered_list_compare
0.71 44.27 0.32 10123214 0.00 0.00 plist_erase
0.64 44.56 0.29 6008 0.00 0.00 plist_subsort
0.33 44.71 0.15 1901053 0.00 0.00 plist_concat
0.13 44.77 0.06 84101173 0.00 0.00 pmalloc
0.11 44.82 0.05 8899391 0.00 0.00 seek_rhs_first
0.11 44.87 0.05 486 0.00 0.00 pregex_ptn_to_NFA
0.10 44.92 0.05 14962100 0.00 0.00 plist_count
0.10 44.96 0.05 pregex_qreplace
0.09 45.00 0.04 13080956 0.00 0.00 plist_get
0.09 45.04 0.04 build_code
0.08 45.08 0.04 plist_set_printfn
0.07 45.11 0.03 8212527 0.00 0.00 plist_get_by_ptr
0.07 45.14 0.03 2106378 0.00 0.00 plist_diff
0.07 45.17 0.03 24343 0.00 0.00 list_count
0.06 45.19 0.03 1208019 0.00 0.00 create_item
0.04 45.21 0.02 1423413 0.00 0.00 plist_init
0.04 45.23 0.02 1244475 0.00 0.00 plist_first
0.04 45.25 0.02 458244 0.00 0.00 pstrrender
0.04 45.27 0.02 57916 0.00 0.00 find_tabcol
0.03 45.29 0.02 66068 0.00 0.00 parray_compare
0.02 45.30 0.01 1037143 0.00 0.00 list_free
0.02 45.31 0.01 3306 0.00 0.00 free_state
0.02 45.32 0.01 1 0.01 0.16 _parse
0.02 45.33 0.01 pdbl_to_wcs
0.02 45.34 0.01 plist_dbgstats