Result parser issue when dealing with results containing a semicolon

Open hecklerponics opened this issue 5 years ago • 0 comments

There's an issue with parsing returned data with URLs that include a semicolon:

From /python_semrush/semrush.py

     84             result = {}
     85             for i, datum in enumerate(line.split(';')):
---> 86                 result[columns[i]] = datum.strip('"\n\r\t')
     87             results.append(result)

As an example, this URL was returned by a call to the organic_phrase function: http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ

This results in a list index out of range error, because the semicolon inside the URL splits the row into more fields than there are columns. To get around this (in case others hit the same problem), I modified my script to pass export_escape=1 in the arguments, which forces double quotes around each value; I then updated the parser to split on '";"' instead of ";"
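For anyone who wants to reproduce the failure without calling the API, here is a minimal sketch of the original split logic. The column names and the keyword value are made up for illustration; only the URL comes from the actual response.

```python
# Hypothetical two-column header; the real SEMrush report has more columns.
columns = "Keyword;Url".split(";")

# One data row whose URL contains a semicolon (URL taken from the report above;
# the keyword "hilton bar" is invented for this example).
line = (
    "hilton bar;"
    "http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/"
    "Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ"
)


def naive_parse(columns, line):
    """Mirror the original loop: split on every semicolon."""
    result = {}
    for i, datum in enumerate(line.split(";")):
        # The URL is split in two, so i reaches 2 while columns has 2 entries.
        result[columns[i]] = datum.strip('"\n\r\t')
    return result


try:
    naive_parse(columns, line)
    error = None
except IndexError as exc:
    error = exc
```

The row yields three fields for two columns, so the lookup columns[2] is what raises the error.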

The new code looks like this (lines 75-89 of /python_semrush/semrush.py):

    @staticmethod
    def parse_response(data):
        results = []
        data = data.decode('unicode_escape')
        lines = data.split('\r\n')
        lines = list(filter(bool, lines))
        columns = lines[0].split(';')

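        for line in lines[1:]:
            result = {}
            # Split on '";"' rather than ';' so that semicolons inside a
            # quoted value (e.g. a URL) are kept. Requires export_escape=1.
            for i, datum in enumerate(line.split('";"')):
                # Strip the leftover outer quotes and any stray whitespace.
                result[columns[i]] = datum.strip('"\n\r\t')
            results.append(result)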

        return results

I'm sure there is a better way to do this, but in the meantime, this is a workaround that works!
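One possibly cleaner option would be to let Python's standard csv module handle the quoting instead of splitting by hand. This is only a sketch, not the library's code: the function name parse_response_csv is hypothetical, it assumes the request was made with export_escape=1 so values are double-quoted, and it assumes the body decodes as UTF-8 (the original uses unicode_escape). The sample header and keyword are invented.

```python
import csv
import io


def parse_response_csv(data):
    """Parse a semicolon-delimited response whose values are double-quoted.

    Hypothetical replacement for parse_response; assumes export_escape=1
    and a UTF-8 encoded body.
    """
    text = data.decode("utf-8")
    # newline="" leaves line endings untouched so csv can handle "\r\n" itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar='"')
    rows = [row for row in reader if row]  # drop empty lines
    if not rows:
        return []
    columns = rows[0]
    # Semicolons inside quoted values stay part of the value.
    return [dict(zip(columns, row)) for row in rows[1:]]


# Made-up two-column sample using the URL from this issue.
sample = (
    b'"Keyword";"Url"\r\n'
    b'"hilton bar";"http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/'
    b'Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ"\r\n'
)

results = parse_response_csv(sample)
```

Because csv.reader accepts both quoted and unquoted fields, the header row parses the same way whether or not the API quotes it.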

hecklerponics, Sep 26 '19 22:09