standardize regexps for code signatures

Open eighthave opened this issue 5 years ago • 4 comments

Right now the code signature regexps are basically all words separated by a dot (.). Since . matches any character, this can lead to false positives, e.g. the regexp com.here would match both com.here and comehere. Code signature printing methods generally use either . or / as the separator, so using [/.] would be the most accurate. Otherwise, if other chars might be used as separators, using \W ("non-word chars") would still be better than .
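
To make the false positive concrete, here is a minimal Python sketch. It assumes signatures are applied with re.match; the helper name tighten_separators is hypothetical and only illustrates what a mass edit might do, not how the database is actually updated.

```python
import re

# Current style: a bare dot matches ANY character.
loose = re.compile(r"com.here")
# Proposed style: only the two real separators are accepted.
strict = re.compile(r"com[/.]here")

candidates = ["com.here", "com/here", "comehere"]

loose_hits = [c for c in candidates if loose.match(c)]
strict_hits = [c for c in candidates if strict.match(c)]


def tighten_separators(pattern: str) -> str:
    """Hypothetical helper: replace a bare dot that sits between two
    word characters with the character class [/.].

    Escaped dots (\\.) and dots in constructs like .* are left alone,
    because they are not both preceded and followed by a word character.
    """
    return re.sub(r"(?<=\w)\.(?=\w)", "[/.]", pattern)


print(loose_hits)   # includes the false positive "comehere"
print(strict_hits)  # only the two real separator forms
print(tighten_separators("com.google.ads"))
```

A rewrite like this is purely textual, so each converted signature would still need a test run against known positives before being saved.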

eighthave avatar Jul 13 '20 20:07 eighthave

@pnu-s I think you mentioned you were going to do a mass edit on this directly in the database. Using [/.] would be best IMHO.

eighthave avatar Oct 02 '20 07:10 eighthave

@eighthave You're right, we discussed this but I have not taken the time to do it yet.

I need to run a couple of tests to make sure this won't break anything, but I agree with your proposal.

pnu-s avatar Oct 03 '20 09:10 pnu-s

Yes please. I never know how thorough a regex to use.


ads.advertising.com
sdk.advertising.com

^.*.advertising.com$

Or just a string search for the advertising domain, so it grabs all advertising.TLD? I guess I'll just leave it to you all to change, and we just give you real-world examples in the box.

jawz101 avatar Oct 28 '20 17:10 jawz101

I think the regex should be as specific as possible, since false positives seem more likely than false negatives in this case.

I think the ^ and $ are assumed, or put another way, the regex is fed into a Python re.match().
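
One caveat worth checking before relying on that assumption: in Python, re.match anchors only at the start of the string, while re.fullmatch anchors at both ends. The sketch below uses the advertising.com example from earlier in the thread; the hostnames other than ads.advertising.com and sdk.advertising.com are made up for illustration, and whether the tool calls re.match or something else is not confirmed here.

```python
import re

# Pattern as written in the thread: every dot is a wildcard.
unescaped = r"^.*.advertising.com$"
# Same idea with the literal dots escaped and no explicit anchors.
escaped = r".*\.advertising\.com"

# Unescaped dots accept strings that are not the domain at all.
wildcard_false_positive = re.match(unescaped, "xadvertisingzcom") is not None

# re.match only anchors the start, so trailing text still matches.
trailing = "ads.advertising.com.evil.example"  # made-up hostname
match_accepts_trailing = re.match(escaped, trailing) is not None
fullmatch_accepts_trailing = re.fullmatch(escaped, trailing) is not None

# The intended hosts match either way.
intended = ["ads.advertising.com", "sdk.advertising.com"]
intended_ok = all(re.fullmatch(escaped, h) for h in intended)

print(wildcard_false_positive)      # True
print(match_accepts_trailing)       # True
print(fullmatch_accepts_trailing)   # False
print(intended_ok)                  # True
```

So if signatures go through re.match, a trailing $ (or re.fullmatch) is still needed to rule out longer strings that merely start with the signature.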

eighthave avatar Oct 29 '20 13:10 eighthave