preprocessor
Edge case that takes too much time
Describe the bug
Running pp.clean('http://google.com/..........................') takes an excessively long time to return, which looks like a bug.
To Reproduce
Call pp.clean('http://google.com/..........................')
Expected behavior
It should return promptly, for example with:
'..........................'
Desktop (please complete the following information):
- OS: Linux
- Python Version: 3.8.5
- preprocessor version: 0.6.0
@s @kvtoraman The answer posted here could serve as a workaround: skip any input whose runtime is too long. For example, for the edge case
http://google.com/..........................
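The following code gives up on that input after 2 seconds instead of hanging. It relies on SIGALRM, so it only works on Unix-like systems and must run in the main thread.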
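import signal
import preprocessor as p

class TimeoutException(Exception):
    """Raised when p.clean() exceeds the time limit."""

def timeout_handler(signum, frame):
    # Called when SIGALRM fires; aborts the running p.clean() call
    raise TimeoutException

text_list = ["http://google.com/..........................", "hello world :+1: "]

# SIGALRM is Unix-only; signal.signal() must be called from the main thread
signal.signal(signal.SIGALRM, timeout_handler)

for text in text_list:
    signal.alarm(2)  # deliver SIGALRM in 2 seconds
    try:
        cleaned = p.clean(text)
    except TimeoutException:
        print(f"Could not handle {text!r}")
    else:
        print(cleaned)
    finally:
        signal.alarm(0)  # cancel any pending alarm before the next input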