
Edge case that takes too much time

Open kvtoraman opened this issue 4 years ago • 1 comments

Describe the bug

Running pp.clean('http://google.com/..........................') takes too much time. Seems like it's a bug.

To Reproduce

run pp.clean('http://google.com/..........................')

Expected behavior

It can return:

  • '..........................'
  • ''

Desktop (please complete the following information):

  • OS: Linux
  • Python Version: 3.8.5
  • preprocessor version: 0.6.0

Apr 12 '21 07:04 kvtoraman

@s @kvtoraman The answer posted here could serve as a workaround: it skips the inputs whose runtime is too long. For example, take the edge case

http://google.com/..........................

The following code gives up on each input after 2 seconds (it relies on SIGALRM, which is available on Unix only):

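import signal
import preprocessor as p

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    # Runs when SIGALRM fires; the exception interrupts the call in progress.
    raise TimeoutException

text_list = ["http://google.com/..........................", "hello world :+1: "]

# The handler must be installed from the main thread.
signal.signal(signal.SIGALRM, timeout_handler)
for text in text_list:
    signal.alarm(2)  # deliver SIGALRM in 2 seconds
    try:
        text = p.clean(text)
    except TimeoutException:
        print(f"Could not handle {text!r}")
    finally:
        signal.alarm(0)  # cancel the pending alarm, even if another error was raised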
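
If the same guard is needed in several places, the alarm logic can be wrapped in a context manager. This is only a sketch under assumptions: `time_limit` and `clean_all` are hypothetical helper names, not part of preprocessor, and the `clean` argument stands in for any callable such as `p.clean`. It uses `signal.setitimer` so that fractional timeouts work, and like the loop above it is limited to Unix and the main thread.

```python
import signal
from contextlib import contextmanager


class TimeoutException(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def time_limit(seconds):
    """Abort the enclosed block with TimeoutException after `seconds`."""

    def handler(signum, frame):
        raise TimeoutException(f"timed out after {seconds} s")

    # Install our handler and remember the one that was there before.
    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def clean_all(clean, texts, seconds=2.0):
    """Apply `clean` to each text; store None for texts that hit the limit."""
    results = []
    for text in texts:
        try:
            with time_limit(seconds):
                results.append(clean(text))
        except TimeoutException:
            results.append(None)
    return results
```

With preprocessor imported as `p`, a call would look like `clean_all(p.clean, text_list)`; inputs that time out show up as `None` in the result, so they can be logged or retried separately.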

Aug 20 '21 14:08 guanqun-yang