python-humanfriendly
File sizes with thousand separators
I'm trying to parse file sizes with thousands separators, but am having no luck. With humanfriendly==9.0, I get the following:
$ python -i
Python 3.8.5 (default, Oct 6 2020, 07:21:17)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import humanfriendly
>>> humanfriendly.parse_size("1,067.6 KB")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "vendor/lib/python3.8/site-packages/humanfriendly/__init__.py", line 259, in parse_size
raise InvalidSize(format(msg, size, tokens))
humanfriendly.InvalidSize: Failed to parse size! (input '1,067.6 KB' was tokenized as [1, ',', 67.6, 'KB'])
Can humanfriendly handle this? I can't seem to find a way to tell humanfriendly to expect a thousands separator (my data is fairly uniform; the separator is always the same).
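Until the library supports this, a caller-side workaround is to strip the separator before handing the string to the tokenizer. A minimal sketch (`strip_thousands` is a hypothetical helper, and the `tokenize` here is a self-contained stand-in modeled on the regex visible in the library source, not the library's exact code):

```python
import re

def strip_thousands(size, separator=","):
    # Hypothetical helper: drop thousands separators so the tokenizer
    # sees a plain float-like number. Assumes a uniform separator.
    return size.replace(separator, "")

def tokenize(text):
    # Stand-in for humanfriendly.text.tokenize, built around the same
    # split regex shown in the diff below.
    tokens = []
    for token in re.split(r'(\d+(?:\.\d+)?)', text):
        token = token.strip()
        if re.match(r'\d+\.\d+', token):
            tokens.append(float(token))
        elif token.isdigit():
            tokens.append(int(token))
        elif token:
            tokens.append(token)
    return tokens

print(tokenize("1,067.6 KB"))                   # [1, ',', 67.6, 'KB'] -- the failure above
print(tokenize(strip_thousands("1,067.6 KB")))  # [1067.6, 'KB']
```

With the comma removed, the number survives tokenization as a single float and the unit follows it, which is exactly the shape `parse_size` expects.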
The most straightforward fix would be to strip out the thousands separator. tokenize doesn't seem to handle locales anyway, and expects a float-like string with a unit:
diff --git a/humanfriendly/text.py b/humanfriendly/text.py
index a257a6a..de28a41 100644
--- a/humanfriendly/text.py
+++ b/humanfriendly/text.py
@@ -422,6 +422,8 @@ def tokenize(text):
>>> tokenize('42.5 MB')
[42.5, 'MB']
"""
+ # Strip out thousands separators
+ text = text.replace(",", "")
tokenized_input = []
for token in re.split(r'(\d+(?:\.\d+)?)', text):
token = token.strip()
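One caveat worth noting (my assumption, not something raised in the thread): unconditionally dropping commas would silently misparse inputs from locales that use the comma as a decimal separator. A quick illustration of the proposed replace:

```python
def strip_commas(text):
    # Mirrors the proposed patch: drop every comma before tokenizing.
    return text.replace(",", "")

# Works for the thousands-separator case reported above:
print(strip_commas("1,067.6 KB"))  # "1067.6 KB"

# Caveat: a comma-as-decimal input becomes a different number entirely.
print(strip_commas("1,5 MB"))      # "15 MB", not 1.5 MB
```

That may be an acceptable trade-off given that tokenize already assumes a dot decimal point, but it's worth calling out in the PR.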
Can you maybe open a PR?