Create classes Email and Date in pregex.meta.essentials

Open manoss96 opened this issue 2 years ago • 2 comments

manoss96 avatar Aug 23 '22 17:08 manoss96

Hi, are you already working on it? I can try adding the Email and Date classes if you want.

dylannalex avatar Aug 24 '22 18:08 dylannalex

@dylannalex I've actually finished the Email one, though Date is yet to be made. You can have a go at it, though it first needs some thought on the design. I'm thinking of giving it a single string parameter "format", through which you define the format of the date that you want to match, e.g. "mm/dd/yyyy". You can find more formats here. The other thing I'm thinking is that we could have Date(*formats), so you can match many formats with one instance. So, instead of having to do:

from pregex import *

date1 = Date("mm/dd/yyyy")
date2 = Date("dd/mm/yyyy")

dates = op.Either(date1, date2)

they can just do:

from pregex import *

dates = Date("mm/dd/yyyy", "dd/mm/yyyy")

If no formats are provided, the default will be to match any date format. What do you think?

I'm gonna create a branch called v2.0.1, as well as another branch based on this issue. You can work on it there.

manoss96 avatar Aug 25 '22 04:08 manoss96

Hi @manoss96, I've been working on the Date class. Here's what I came up with:

Date formats

I added Date.date_formats, a Date class attribute that contains all valid date formats:

from pregex import *

Date.date_formats
# ('mm/dd/yyyy', 'dd/mm/yyyy', 'yyyy/mm/dd')

By default, Date matches any date format in Date.date_formats.

Date arguments

I followed your suggestion and let the user match many formats with a single Date instance.

from pregex import *

text = """
01/11/2001
12/09/1996
1875/11/02
"""

pre1 = Date()
pre1.get_matches(text)
# ['01/11/2001', '12/09/1996', '1875/11/02']

pre2 = Date("dd/mm/yyyy")
pre2.get_matches(text)
# ['01/11/2001', '12/09/1996']

pre3 = Date("dd/mm/yyyy", "yyyy/mm/dd")
pre3.get_matches(text)
# ['01/11/2001', '12/09/1996', '1875/11/02']

Note: Date converts all uppercase characters in a date format to lowercase (e.g. "DD/MM/YYYY" becomes "dd/mm/yyyy").

Invalid formats

The given formats are checked against the formats in Date.date_formats. When an invalid format is found, Date raises InvalidArgumentValueException.

pre = Date("dd/mm/yyy")
# Raises pregex.core.exceptions.InvalidArgumentValueException: Provided date format "dd/mm/yyy" is not valid.
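
For illustration, the lowercasing and validation described above could look roughly like this in plain Python. It is only a sketch: `VALID_FORMATS`, `InvalidDateFormatError` and `normalize_formats` are hypothetical names chosen for the example, not part of pregex.

```python
from typing import Tuple

# Hypothetical stand-in for Date.date_formats as listed in this comment.
VALID_FORMATS = ("mm/dd/yyyy", "dd/mm/yyyy", "yyyy/mm/dd")


class InvalidDateFormatError(ValueError):
    """Hypothetical stand-in for pregex's InvalidArgumentValueException."""


def normalize_formats(*formats: str) -> Tuple[str, ...]:
    """Lowercase each format and reject any that is not a known format."""
    if not formats:
        # No arguments means "consider every known format".
        return VALID_FORMATS
    normalized = tuple(fmt.lower() for fmt in formats)
    for fmt in normalized:
        if fmt not in VALID_FORMATS:
            raise InvalidDateFormatError(
                f'Provided date format "{fmt}" is not valid.'
            )
    return normalized
```

Calling `normalize_formats("DD/MM/YYYY")` returns `("dd/mm/yyyy",)`, while `normalize_formats("dd/mm/yyy")` raises the stand-in exception.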

Let me know your thoughts. I'm happy to add more features or improve anything you see fit!

dylannalex avatar Aug 25 '22 15:08 dylannalex

I think this should also consider shorthand notations for years, such as 02 for 2002. It might make sense to add notations for time as well, something like the strptime function in the datetime module.

For example, you could have D/M/y to match things like 01/03/02, but D/M/Y to match stuff like 01/03/2002.

I feel like this format makes sense because it's already familiar from other Python libraries and won't be a hassle for users to learn.

alansun17904 avatar Aug 25 '22 16:08 alansun17904

@dylannalex Looks great, good job! As for the formats, I suggest that we follow this notation. That way we can have all lowercase while at the same time we can differentiate between 2002 and 02 like @alansun17904 said. For now, I'd say that implementing any valid combination of "d/dd", "m/mm", and "yy/yyyy" along with separators "/" and "-", is good enough. In the future, more formats might follow.

To wrap up, I suggest the following list of formats:

  1. d/m/yy
  2. dd/m/yy
  3. d/mm/yy
  4. dd/mm/yy
  5. d/m/yyyy
  6. dd/m/yyyy
  7. d/mm/yyyy
  8. dd/mm/yyyy
  9. m/d/yy
  10. mm/d/yy
  11. m/dd/yy
  12. mm/dd/yy
  13. m/d/yyyy
  14. mm/d/yyyy
  15. m/dd/yyyy
  16. mm/dd/yyyy
  17. yy/m/d
  18. yyyy/m/d
  19. yy/mm/d
  20. yyyy/mm/d
  21. yy/m/dd
  22. yyyy/m/dd
  23. yy/mm/dd
  24. yyyy/mm/dd

Plus all of the above using the "-" separator, for a total of 24 + 24 = 48 different formats.
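
As a quick sanity check on that count, the list can be generated rather than typed out. The following is a minimal sketch using `itertools.product`; `all_date_formats` and the constant names are hypothetical and not taken from pregex.

```python
from itertools import product

DAYS = ("d", "dd")
MONTHS = ("m", "mm")
YEARS = ("yy", "yyyy")
SEPARATORS = ("/", "-")


def all_date_formats() -> list:
    """Build the three proposed orderings for every token width and separator."""
    formats = []
    for sep, d, m, y in product(SEPARATORS, DAYS, MONTHS, YEARS):
        formats.append(sep.join((d, m, y)))  # day first, e.g. d/m/yy
        formats.append(sep.join((m, d, y)))  # month first, e.g. m/d/yy
        formats.append(sep.join((y, m, d)))  # year first, e.g. yy/m/d
    return formats


print(len(all_date_formats()))  # 2 separators * 2 * 2 * 2 widths * 3 orderings = 48
```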

I don't know about your current implementation, but I suggest having a dictionary of 6 different keys, namely "d", "dd", "m", "mm", "yy" and "yyyy", each mapping to a different pre-defined "Pregex" instance for matching each possible part of the date. Then it's just a matter of combining these instances together, separated by either "/" or "-". How's that sound?
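
The dictionary idea can be sketched with the standard `re` module standing in for the pre-defined Pregex instances. The regex fragments below are assumptions made for the example: "d" and "m" are taken to be a single non-zero digit, and "dd" and "mm" exclude "00". `COMPONENTS` and `date_pattern` are hypothetical names.

```python
import re

# Plain-regex stand-ins for the six pre-defined Pregex instances.
COMPONENTS = {
    "d": r"[1-9]",
    "dd": r"(?:0[1-9]|[12][0-9]|3[01])",
    "m": r"[1-9]",
    "mm": r"(?:0[1-9]|1[0-2])",
    "yy": r"[0-9]{2}",
    "yyyy": r"[0-9]{4}",
}


def date_pattern(fmt: str) -> str:
    """Translate a format such as 'dd/mm/yyyy' into a regex pattern string."""
    sep = "/" if "/" in fmt else "-"
    parts = fmt.lower().split(sep)
    # Look up each token and glue the pieces together with the separator.
    return re.escape(sep).join(COMPONENTS[part] for part in parts)
```

With this, `re.fullmatch(date_pattern("dd/mm/yyyy"), "25/08/2022")` succeeds, while "00/08/2022" is rejected. An unknown token raises `KeyError` here, where the real class would raise its own exception.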

manoss96 avatar Aug 25 '22 17:08 manoss96

Sounds great, @manoss96! Thank you and @alansun17904 for the help!

About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.

dylannalex avatar Aug 25 '22 17:08 dylannalex

About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.

Yeah, I agree with @dylannalex. As for the implementation that we discussed, feel free to use other classes from pregex.meta as they might help you. For example, you can use Integer(1, 12) for "m".

manoss96 avatar Aug 25 '22 17:08 manoss96

About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all the different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!

Oh, and one last thing:

you can use Integer(1, 12) for "m"

Do you mean Integer(1, 10)?

dylannalex avatar Aug 25 '22 18:08 dylannalex

About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all the different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!

Sure, you can use it. Just make sure that you import it with a different name starting with a "_" so it isn't directly imported every time pregex.meta is imported. Better yet, import it within the "Date" class itself.
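
A minimal sketch of the second option, importing inside the class so the name never enters the module namespace. `FormatHelper` and `orderings` are hypothetical names used only for this example.

```python
class FormatHelper:
    """Hypothetical helper used only to illustrate a class-local import."""

    @staticmethod
    def orderings(tokens):
        # The import is local to this function, so `permutations` is not
        # exposed by the enclosing module.
        from itertools import permutations

        return ["/".join(order) for order in permutations(tokens)]
```

Note that `permutations` over three tokens yields six orderings, while only three of them (day first, month first, year first) appear in the proposed list, so the result would still need filtering.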

Oh, and one last thing:

you can use Integer(1, 12) for "m"

Do you mean Integer(1, 10)?

Yeah, I'm sorry, you're right. I was under the impression that "m" matched "11" and "12" too, and that it only indicated that a single-digit month must not have a leading zero, e.g. "3" would be okay but "03" would not. In that case, I'm guessing "Integer" would be overkill, so you can go with something simpler. However, if you find that some class in pregex.meta could help you, don't hesitate to use it!

manoss96 avatar Aug 25 '22 18:08 manoss96

I've finished the Date class implementation. I've implemented all 48 different combinations dynamically, so adding new date formats should be straightforward.

Features:

  • If no format is provided, Date considers all possible formats.
  • All provided formats are converted to lowercase (e.g. dD/mM/yyYY is converted to dd/mm/yyyy).
  • Raises InvalidArgumentValueException when an invalid format is provided.

I also didn't use itertools.permutations, so no extra import needed!

I'm now working on documentation, which isn't my strong point. I'd really appreciate some help with it 😄
In a nutshell, the Date class has the following structure:

class Date(_pre.Pregex):
    '''
    Matches any date.

    :param str \*formats: Strings that determine which date formats are considered a match.
        A date can be either dd/mm/yy, mm/dd/yy or yy/mm/dd (separated by '/' or '-'), where:
            yy – two-digit year, e.g. 21
            yyyy – four-digit year, e.g. 2021
            m – one-digit month for months below 10, e.g. 3
            mm – two-digit month, e.g. 03
            d – one-digit day of the month for days below 10, e.g. 2
            dd – two-digit day of the month, e.g. 02
        By default, all date formats are considered.
    
    :raises InvalidArgumentValueException: Invalid date format provided.
    '''
    __date_separators: tuple[str, str] = ("-", "/")
    __date_value_pre: dict[str, _pre.Pregex] = {
        "d":_cl.AnyDigit() - "0",
        "dd":_op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 31)),
        "m":_cl.AnyDigit() - "0",
        "mm":_op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 12)),
        "yy":_cl.AnyDigit() * 2,
        "yyyy":_cl.AnyDigit() * 4,
    }

    def __init__(self, *formats: str):
        '''
        Matches any date.

        :param str \*formats: Strings that determine which date formats are considered a match. \
            A date can be either dd/mm/yy, mm/dd/yy or yy/mm/dd (separated by '/' or '-'), where:
                yy – two-digit year, e.g. 21
                yyyy – four-digit year, e.g. 2021
                m – one-digit month for months below 10, e.g. 3
                mm – two-digit month, e.g. 03
                d – one-digit day of the month for days below 10, e.g. 2
                dd – two-digit day of the month, e.g. 02
            By default, all date formats are considered.
        
        :raises InvalidArgumentValueException: Invalid date format provided.
        '''

    def __date_pre(format: str) -> _pre.Pregex:
        """
        Converts a date format into a ``Pregex`` instance.
        
        :param str format: The date format to be converted.
        """

    def __date_formats() -> list[str]:
        '''
        Returns a list containing all possible date format combinations.
        '''

dylannalex avatar Aug 25 '22 20:08 dylannalex

Looks good! Don't worry about documentation, I can do this later. A few points:

  • Make sure you do (cl.AnyDigit() - "0") in "mm" and "dd" so a match with "00" isn't possible.
  • Replace "PositiveInteger" with "Integer" as the former will try to match the sign "+" too.
  • Add some tests in "tests/test_meta_essentials.py" if it's easy for you. Nothing crazy, just trying to match some valid/invalid dates. You can copy the testing structure of classes like HttpUrl, IPv4 and IPv6.
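
A rough idea of what such valid/invalid tests could look like, written with `unittest` against a plain-regex stand-in. The actual structure in tests/test_meta_essentials.py is not shown in this thread, so `DD_MM_YYYY` and `TestDateSketch` are hypothetical; a real test would build a pregex Date instance instead.

```python
import re
import unittest

# Stand-in for Date("dd/mm/yyyy"): "00" is excluded for both day and month.
DD_MM_YYYY = re.compile(r"(?:0[1-9]|[12][0-9]|3[01])/(?:0[1-9]|1[0-2])/[0-9]{4}")


class TestDateSketch(unittest.TestCase):
    def test_valid_dates(self):
        for text in ("01/01/2001", "31/12/1999", "25/08/2022"):
            self.assertIsNotNone(DD_MM_YYYY.fullmatch(text), text)

    def test_invalid_dates(self):
        # Zero day, day above 31, zero month, month above 12, missing padding.
        for text in ("00/01/2001", "32/01/2001", "01/00/2001", "01/13/2001", "1/1/2001"):
            self.assertIsNone(DD_MM_YYYY.fullmatch(text), text)
```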

After doing these I think that you're good to go, so open a PR whenever you're ready.

manoss96 avatar Aug 26 '22 05:08 manoss96

I think this seems great! I can help out with documentation as well if need be.

alansun17904 avatar Aug 26 '22 15:08 alansun17904

Thanks for your help, @manoss96!

PR is open. I've added tests and fixed what we discussed. I also ensured date values (i.e. 'd', 'dd', 'm', 'mm', 'yy', 'yyyy') are not enclosed by any other digit:

__date_value_pre: dict[str, _pre.Pregex] = {
    "d": _asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
    "dd": _asr.NotEnclosedBy(
        _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 31)),
        _cl.AnyDigit()),
    "m": _asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
    "mm": _asr.NotEnclosedBy(
        _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 12)),
        _cl.AnyDigit()),
    "yy": _asr.NotEnclosedBy(_cl.AnyDigit() * 2, _cl.AnyDigit()),
    "yyyy": _asr.NotEnclosedBy(_cl.AnyDigit() * 4, _cl.AnyDigit()),
}
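
To show why the digit guard matters, here is a small plain-`re` sketch. The regex that pregex generates for NotEnclosedBy is not shown in this thread, so the lookarounds below are an assumption about the intent (no digit may touch either end), and they are applied to the whole date rather than to each value. `BARE` and `GUARDED` are hypothetical names.

```python
import re

# Without guards, a dd/mm/yyyy date is also found inside a longer digit run.
BARE = re.compile(r"(?:0[1-9]|[12]\d|3[01])/(?:0[1-9]|1[0-2])/\d{4}")

# (?<!\d) and (?!\d) require that no digit touches either end of the match.
GUARDED = re.compile(
    r"(?<!\d)(?:0[1-9]|[12]\d|3[01])/(?:0[1-9]|1[0-2])/\d{4}(?!\d)"
)

text = "id 125/08/20229 and date 25/08/2022"
print(BARE.findall(text))     # ['25/08/2022', '25/08/2022']
print(GUARDED.findall(text))  # ['25/08/2022']
```

The unguarded pattern picks up a false date from inside "125/08/20229", while the guarded one only returns the standalone date.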

Greetings.

dylannalex avatar Aug 26 '22 15:08 dylannalex

@dylannalex Played around with the class and it looks great. Good job. Since both "Email" and "Date" are done, I'm closing this issue.

manoss96 avatar Aug 26 '22 17:08 manoss96