number-parser
`ignore` parameter
I think it could be really cool to add an optional parameter to ignore some words.
Example:
>>> parse('twenty one')
'21'
>>> parse('twenty one', ignore=["one"])
'20 one'
or
>>> parse('I have three apples and one pear.')
'I have 3 apples and 1 pear.'
>>> parse('I have three apples and one pear.', ignore=["three"])
'I have three apples and 1 pear.'
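To make the proposed behaviour concrete, here is a minimal sketch of how an `ignore` list could interact with number composition. It is not the library's implementation: `parse_with_ignore`, the tiny `UNITS`/`TENS` vocabulary, and the tokenisation are all assumptions made for illustration.

```python
import re

# Toy vocabulary, assumed for this sketch only.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def parse_with_ignore(text, ignore=()):
    """Replace number words with digits, leaving `ignore` words untouched."""
    ignored = {word.lower() for word in ignore}
    out = []
    tens = None  # a tens value waiting for an optional units word
    sep = ""     # the space/hyphen seen right after that tens word

    def flush():
        # Emit a pending tens value that was not followed by a units word.
        nonlocal tens, sep
        if tens is not None:
            out.extend([str(tens), sep])
            tens, sep = None, ""

    # Alternate between runs of letters and runs of everything else.
    for tok in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text):
        word = tok.lower()
        if tok.isalpha() and word not in ignored:
            if word in TENS:
                flush()
                tens = TENS[word]
                continue
            if word in UNITS:
                if tens is not None:
                    out.append(str(tens + UNITS[word]))  # "twenty one" -> 21
                    tens, sep = None, ""
                else:
                    out.append(str(UNITS[word]))
                continue
        # Hold back a space or hyphen directly after a tens word.
        if tens is not None and not tok.isalpha() and sep == "" and tok.strip(" -") == "":
            sep = tok
            continue
        flush()
        out.append(tok)
    flush()
    return "".join(out)
```

The point of the sketch is that an ignored word behaves like any non-number word, so it also ends a number that is being built, which is what gives `'20 one'` for `parse_with_ignore('twenty one', ignore=["one"])`.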
Unless we can think of a popular use case, or this is trivial to implement, it may be better to leave something like this for last.
@Gallaecio one use case is probably phrases like 'two second' or 'a second', which @noviluni mentioned in #6, where we don't want to parse "second" as "2nd". Of course, we should handle most of these ambiguous cases within the main logic, but as we expand to more languages this option might be useful.
Yeah, this idea came from this example: omitting "second". It could also be useful if we start accepting phrases like "one and a quarter" (1.25), since "quarter" is also a verb, or a noun as in "the first quarter".
However, it's not necessary to implement this now. We should first accept multiple languages and then see how we can add specific rules for different languages, so this can be postponed. :+1:
Can we store the usual ambiguous words, like "quarter" and "half", separately and use them whenever required?
I’m not even sure it is a good idea to implement this. My thinking is:
- In cases where `parse_number` cannot determine that a word should not be translated into a number, but the user knows, the most flexible way for the user to prevent that translation is to process the string before passing it to `parse_number`, e.g. extracting what can be a number with a regular expression.
- In cases where `parse_number` should be able to determine that a word should not be translated into a number, ideally we should handle that transparently, without requiring the user to provide a specific list of words to ignore, relying only on other parameters that may play a role in deciding which words to ignore (e.g. languages, as dateparser supports).
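As a rough illustration of the first point, the caller can shield words with a regular expression before parsing and restore them afterwards, with no `ignore` parameter in the parser itself. Everything named here is hypothetical: `toy_parse` is a deliberately naive stand-in for the real parser (it maps single words only), and `parse_protecting` and the placeholder format are assumptions for this sketch.

```python
import re

# Naive stand-in for the real parser, assumed for this sketch only:
# it maps a few single words and does no composition.
_TOY_NUMBERS = {"one": "1", "two": "2", "three": "3", "second": "2nd"}
_TOY_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, _TOY_NUMBERS)) + r")\b", re.IGNORECASE
)


def toy_parse(text):
    return _TOY_PATTERN.sub(lambda m: _TOY_NUMBERS[m.group(1).lower()], text)


def parse_protecting(text, protected, parser=toy_parse):
    """Hide `protected` words from `parser`, then put them back."""
    if not protected:
        return parser(text)
    shield = re.compile(
        r"\b(" + "|".join(map(re.escape, protected)) + r")\b", re.IGNORECASE
    )
    saved = []

    def hide(match):
        # Swap the word for an indexed placeholder the parser will not touch.
        saved.append(match.group(0))
        return "\x00{}\x00".format(len(saved) - 1)

    parsed = parser(shield.sub(hide, text))
    # Restore the original words, preserving their original casing.
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], parsed)
```

With the toy parser, `toy_parse("wait two second")` gives `"wait 2 2nd"`, while `parse_protecting("wait two second", ["second"])` gives `"wait 2 second"`. The trade-off is that the placeholder must be something the real parser leaves alone, which is an assumption the caller has to verify.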