posterior
posterior copied to clipboard
Improve speed of check_existing_variables when regex=TRUE
In #130 the speed when regex=FALSE was improved. When regex=TRUE @mjskay suggested:
You might be able to speed up the regex version by translating the input list of regexes into a single regex that matches if any one of them does.
Relevant comment from @Ozan147: https://github.com/stan-dev/posterior/pull/130#issuecomment-840628346
@mjskay and @jgabry (#131), I tested the single regex pattern with | yesterday but it turns out there is a size limit. Even with perl = TRUE, it starts throwing errors after certain number of variables. One idea might be to divide that large pattern string into valid-size chunks to perform multiple matches.
Moving my comment here...
Ah damn. Could also fall back to the for loop version in that case I suppose... Honestly I'd guess most use cases of the regex version have a small number of input regexes (else why use regex in the first place?)
There is a potential size limitation that would cause an issue with large number of variables to match.
?regex
Long regular expression patterns may or may not be accepted: the POSIX standard only requires up to 256 bytes.
We could still speed it up by dividing any long strings into valid chunks and running multiple matches as mentioned in #130.
Just to potentially uncomplicate our lives: is there a strong use case for 256 byte+ regex here, whether passed by the user or assembled by us internally from a list of regexes the user passes? My guess is regex = TRUE is mostly used for subsetting with a handful of expressions at most (e.g. on things like "b_.*"), not passing big lists of regexes. So we might be overcomplicating our lives to support a rare-to-nonexistent edge case...
So we might be overcomplicating our lives to support a rare-to-nonexistent edge case...
Yeah I think you’re right. Not worrying about it seems fine and we can always revisit it in the future if an unexpected use case arises.
I agree.
Jonah Gabry @.***> schrieb am Mi., 19. Mai 2021, 05:29:
So we might be overcomplicating our lives to support a rare-to-nonexistent edge case...
Yeah I think you’re right. Not worrying about it seems fine and we can always revisit it in the future if an unexpected use case arises.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/posterior/issues/131#issuecomment-843715977, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADCW2ACZATM3PBHASBYAIQDTOMWATANCNFSM442XBW3Q .
I tested the single regex pattern with | yesterday but it turns out there is a size limit. Even with perl = TRUE, it starts throwing errors after certain number of variables.
Given the consensus that we're okay with letting it fail on large numbers of input regexes, @Ozan147 do you have the code you used for testing your implementation handy?