
File sizes with thousand separators

Open • LordGaav opened this issue on Dec 03, 2020 • 2 comments

I'm trying to parse file sizes that contain thousands separators, but I'm having no luck. With humanfriendly==9.0, I get the following:

$ python -i
Python 3.8.5 (default, Oct  6 2020, 07:21:17) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import humanfriendly
>>> humanfriendly.parse_size("1,067.6 KB")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vendor/lib/python3.8/site-packages/humanfriendly/__init__.py", line 259, in parse_size
    raise InvalidSize(format(msg, size, tokens))
humanfriendly.InvalidSize: Failed to parse size! (input '1,067.6 KB' was tokenized as [1, ',', 67.6, 'KB'])

Can humanfriendly handle this? I can't seem to find a way to tell humanfriendly to expect a thousands separator (my data is fairly uniform, the separator is always the same).
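
For now I could probably pre-process the string myself before calling parse_size, along these lines (a rough sketch, assuming the comma is only ever a thousands separator and never a decimal mark):

import humanfriendly

def parse_size_with_commas(size):
    # Hypothetical helper: strip the thousands separator before handing
    # the string to parse_size. Assumes ',' only ever appears as a
    # thousands separator, never as a decimal mark.
    return humanfriendly.parse_size(size.replace(",", ""))

parse_size_with_commas("1,067.6 KB")  # parsed as if it were '1067.6 KB'

It would still be nicer if the library could handle this directly.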

LordGaav commented on Dec 03 '20 16:12

The most straightforward fix would be to simply strip out the thousands separator. tokenize doesn't seem to handle locales anyway, and expects a float-like string followed by a unit:

diff --git a/humanfriendly/text.py b/humanfriendly/text.py
index a257a6a..de28a41 100644
--- a/humanfriendly/text.py
+++ b/humanfriendly/text.py
@@ -422,6 +422,8 @@ def tokenize(text):
     >>> tokenize('42.5 MB')
     [42.5, 'MB']
     """
+    # Strip out thousands separators
+    text = text.replace(",", "")
     tokenized_input = []
     for token in re.split(r'(\d+(?:\.\d+)?)', text):
         token = token.strip()
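
With that change applied, the failing input from above should tokenize as a single number followed by its unit, e.g. (expected behaviour, sketched in doctest form but not yet added to the docstring):

>>> tokenize('1,067.6 KB')
[1067.6, 'KB']

parse_size can then match the number and unit as usual.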

LordGaav commented on Dec 03 '20 16:12

Can you maybe open a PR?

riaqn commented on Feb 14 '21 21:02