Add support for list of regex in regex.json

Open ghost opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe.
Many regexes (such as those for phone numbers or coordinates) have several formats. Currently, | can be used to separate the formats, but this makes the regex longer and harder to understand.

Describe the solution you'd like
"Regex" should be a list of regexes.

ghost avatar Nov 03 '21 10:11 ghost

Hi, I would like to understand this issue in more detail. Do you want multiple regex patterns separated by a vertical pipe (|), or do you want a change in the code to support a list of regex patterns?

I can solve this issue by creating blocks like the following, each with a different regex pattern for a different type of phone number.

   {
      "Name": "Phone Number",
      "Regex": "^(\\s*(?:\\+?(\\d{1,3}))?[-. (]*(\\d{3})[-. )]*(\\d{3})[-. ]*(\\d{4})(?: *x(\\d+))?\\s*)$",
      "plural_name": false,
      "Description": null,
      "Rarity": 0.5,
      "URL": null,
      "Tags": [
         "Identifiers",
         "Credentials",
         "Phone Number",
         "Phone"
      ],
      "Children": {
         "path": "phone_codes.json",
         "entry": "Location(s): ",
         "method": "hashmap"
      },
      "Examples": {
         "Valid": [
            "202-555-0178",
            "+1-202-555-0156",
            "+662025550156",
            "+356 202 555 0156"
         ],
         "Invalid": []
      }
   }

GauriBodke avatar Dec 03 '21 17:12 GauriBodke

I would like to have a list of regexes, so yes, changes in the code are needed. pyWhat should concatenate all these regexes into one using pipes (|). Therefore, only the regex loader function should need updating.

ghost avatar Dec 03 '21 17:12 ghost
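
The suggestion above (let "Regex" be a list and have the loader join it with |) could be sketched roughly as follows. This is only an illustration under assumptions: `combine_regexes` and `load_entries` are hypothetical names, not pyWhat's actual loader API, and the two phone patterns are simplified stand-ins rather than the project's real regex.

```python
import json
import re


def combine_regexes(regex):
    """Hypothetical helper: accept a single pattern or a list of patterns
    and return one pattern string."""
    if isinstance(regex, str):
        # Existing entries with a plain string keep working unchanged.
        return regex
    # Wrap each alternative in a non-capturing group so that anchors and
    # alternations inside one pattern do not leak into its neighbours.
    return "|".join(f"(?:{pattern})" for pattern in regex)


def load_entries(json_text):
    """Hypothetical loader: parse the JSON and normalise every "Regex" field
    to a single string, so the rest of the code sees the old format."""
    entries = json.loads(json_text)
    for entry in entries:
        entry["Regex"] = combine_regexes(entry["Regex"])
    return entries


# Simplified sample entry (assumption: not the real regex.json content).
sample = r"""
[
  {
    "Name": "Phone Number",
    "Regex": [
      "^\\d{3}-\\d{3}-\\d{4}$",
      "^\\+\\d{1,3}-\\d{3}-\\d{3}-\\d{4}$"
    ]
  }
]
"""

entries = load_entries(sample)
phone_regex = re.compile(entries[0]["Regex"])
```

One caveat with this approach: on Python 3.11 and later, a global inline flag such as (?i) is only allowed at the very start of the whole expression, so patterns that each begin with (?i) cannot be joined this way without first moving the flag out of the individual patterns.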

Hey, I am able to combine regex patterns dynamically using |. What is your suggestion for combining the other parameters of the block, such as Description, Rarity, URL, Tags, Children, and Examples?

GauriBodke avatar Dec 11 '21 19:12 GauriBodke

Hi! I will use our THM flag regex to better show what we want to achieve. (The only part that changed is Regex.)

We want to separate the multiple regex formats of one identifier into a list of regexes, instead of keeping them all together, separated by |. Some of the regexes are quite long, and it is getting harder to see which part is which, so our goal is to improve their readability.

I am not sure how pyWhat should use them afterwards, though: either merge them back into one with | or search with each regex separately. Probably the latter.

I hope this clears some things up. Maybe @bee-san has some more suggestions. :)

   {
      "Name": "TryHackMe Flag Format",
      "Regex": ["(?i)^thm{.*}$", "(?i)^tryhackme{.*}$"],
      "plural_name": false,
      "Description": "Used for Capture The Flags at https://tryhackme.com",
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "thm{hello}"
         ],
         "Invalid": []
      }
   }

amadejpapez avatar Dec 11 '21 21:12 amadejpapez
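
For the second option mentioned above (searching with each regex separately), a minimal sketch might look like this. It reuses the "Name" and "Regex" fields from the TryHackMe entry; `compile_patterns` and `first_match` are hypothetical names for illustration only and are not part of pyWhat.

```python
import re

# The entry from the comment above, reduced to the fields this sketch needs.
entry = {
    "Name": "TryHackMe Flag Format",
    "Regex": ["(?i)^thm{.*}$", "(?i)^tryhackme{.*}$"],
}


def compile_patterns(regex):
    """Hypothetical helper: accept a string or a list of strings and
    return a list of compiled patterns."""
    patterns = [regex] if isinstance(regex, str) else list(regex)
    # Each pattern is compiled on its own, so a leading inline flag
    # such as (?i) stays at the start of its expression.
    return [re.compile(pattern) for pattern in patterns]


def first_match(entry, text):
    """Try each pattern in order and return the first match, or None."""
    for compiled in compile_patterns(entry["Regex"]):
        match = compiled.search(text)
        if match:
            return match
    return None
```

Because every pattern is compiled separately, each one can keep its own flags and anchors, at the cost of running one search per pattern instead of a single combined search.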