`ignore` parameter

Open noviluni opened this issue 4 years ago • 5 comments

I think it could be really cool to add an optional parameter to ignore some words.

Example:

>>> parse('twenty one')
'21'

>>> parse('twenty one', ignore=["one"])
'20 one'

or

>>> parse('I have three apples and one pear.')
'I have 3 apples and 1 pear.'

>>> parse('I have three apples and one pear.', ignore=["three"])
'I have three apples and 1 pear.'
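The behaviour proposed above can be approximated without touching the parser, as a rough sketch. Everything here is hypothetical: `toy_parse` is a tiny stand-in for the library's `parse` (it only knows a handful of words), and `parse_with_ignore` is an illustrative wrapper, not part of number-parser. The idea is to split the text on the ignored words and parse only the pieces in between.

```python
import re

# Hypothetical, tiny vocabulary used only by the stand-in parser below.
UNITS = {"one": 1, "two": 2, "three": 3}
TENS = {"twenty": 20, "thirty": 30}


def toy_parse(text):
    """Stand-in for the library's parse(); handles only the words above."""
    tokens = re.findall(r"\w+|\W+", text)
    out = []
    i = 0
    while i < len(tokens):
        low = tokens[i].lower()
        if low in TENS:
            value = TENS[low]
            # Combine "twenty one" into 21 when a unit word follows a space.
            if (i + 2 < len(tokens) and tokens[i + 1].isspace()
                    and tokens[i + 2].lower() in UNITS):
                value += UNITS[tokens[i + 2].lower()]
                i += 2
            out.append(str(value))
        elif low in UNITS:
            out.append(str(UNITS[low]))
        else:
            out.append(tokens[i])
        i += 1
    return "".join(out)


def parse_with_ignore(text, ignore=(), parser=toy_parse):
    """Parse `text`, leaving every word listed in `ignore` untouched."""
    if not ignore:
        return parser(text)
    pattern = r"\b(" + "|".join(re.escape(w) for w in ignore) + r")\b"
    # With one capturing group, re.split alternates: text, ignored word, text, ...
    pieces = re.split(pattern, text, flags=re.IGNORECASE)
    return "".join(p if i % 2 else parser(p) for i, p in enumerate(pieces))
```

Because the ignored word splits the input, "twenty" and "one" are parsed independently, which is what produces `'20 one'` in the first example.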

noviluni avatar Jun 17 '20 19:06 noviluni

Unless we can think of a popular use case, or this is trivial to implement, it may be better to leave something like this for last.

Gallaecio avatar Jun 18 '20 09:06 Gallaecio

@Gallaecio one use case is probably phrases like 'two second' and 'a second', which @noviluni mentioned in #6, where we don't want to parse "second" as 2nd. Of course, we should handle most of these ambiguous cases within the main logic, but as we expand to more languages this option might be useful.

arnavkapoor avatar Jun 18 '20 09:06 arnavkapoor

Yeah, this idea came from that example: omitting "second". It could also be useful if we start accepting phrases like "one and a quarter" (1.25), since "quarter" is also a verb and a noun, as in "the first quarter".

However, it's not necessary to implement this now. We should first support multiple languages and then see how we can add specific rules for different languages, so this can be postponed. :+1:

noviluni avatar Jun 18 '20 09:06 noviluni

Can we store common words like "quarter" and "half" separately and use them whenever required?

Manish-210 avatar Feb 21 '21 03:02 Manish-210

I’m not even sure it is a good idea to implement this. My thinking is:

  • In cases where it’s not possible for parse_number to determine that a word should not be translated into a number, but the user knows, the most flexible approach for the user to prevent words being translated into numbers is to process the string before passing it to parse_number, e.g. extracting what can be a number with a regular expression.

  • In cases where parse_number should be able to determine that a word should not be translated into a number, ideally we should handle that transparently, without requiring the user to provide a specific list of words to ignore, relying only on other parameters that may play a role in deciding which words to ignore (e.g. languages, as dateparser supports).
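The first point, pre-processing the string before it reaches the parser, could look roughly like the sketch below. It assumes the caller knows that a number word only matters when it directly precedes a unit word. `convert_before_units` and `stub_parse_number` are hypothetical names; the stub stands in for the library's `parse_number` and is assumed to return `None` for anything it cannot convert.

```python
import re

# Hypothetical stand-in for parse_number: returns an int, or None if unknown.
WORDS = {"one": 1, "two": 2, "three": 3}


def stub_parse_number(word):
    return WORDS.get(word.lower())


def convert_before_units(text, units, parse_number=stub_parse_number):
    """Convert only single words that are directly followed by a unit word."""
    unit_alt = "|".join(re.escape(u) for u in units)
    # Lookahead keeps the unit itself out of the match, so it is left as-is.
    pattern = re.compile(rf"\b([A-Za-z]+)(?=\s+(?:{unit_alt})\b)")

    def replace(match):
        value = parse_number(match.group(1))
        # Leave the word alone when the parser does not recognise it.
        return match.group(1) if value is None else str(value)

    return pattern.sub(replace, text)
```

For instance, `convert_before_units("a second try took two seconds", ["seconds"])` converts "two" but never hands "second" to the parser. This sketch only handles single-word numbers; multi-word numbers such as "twenty one" would need a broader pattern.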

Gallaecio avatar Feb 21 '21 05:02 Gallaecio