python-semrush
Result parser issue when dealing with results containing a semicolon
There's an issue with parsing returned data with URLs that include a semicolon:
From /python_semrush/semrush.py
84 result = {}
85 for i, datum in enumerate(line.split(';')):
---> 86 result[columns[i]] = datum.strip('"\n\r\t')
87 results.append(result)
As an example, this URL came back from a call to the organic_phrase function: http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ
This results in a "list index out of range" error, because the semicolon inside the URL makes the split produce more fields than there are columns. To get around this (just in case others find the same problem), I modified my script to pass export_escape=1 in the arguments to force double quotes; I then updated the parser to split on '";"' instead of ';'.
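To see why splitting on '";"' helps, here is a minimal sketch that runs outside the library. The column names and the shortened URL are made up for illustration; only the two split expressions mirror the parser.

```python
# Hypothetical two-column header and one quoted row, for illustration only.
columns = ['Keyword', 'Url']
line = '"tangerine bar";"http://example.com/menu.pdf;jsessionid=ABC123"'

# Original behaviour: every semicolon is a separator, including the one in the URL.
naive = line.split(';')

# Workaround: split only on a semicolon that sits between two double quotes.
quoted = [field.strip('"') for field in line.split('";"')]

print(len(naive), len(columns))  # more fields than columns, so columns[i] fails
print(quoted)
```

With the naive split, the loop eventually indexes columns[2], which does not exist, hence the IndexError.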
The new code looks like this: (lines 75-89 of /python_semrush/semrush.py)
    @staticmethod
    def parse_response(data):
        results = []
        data = data.decode('unicode_escape')
        lines = data.split('\r\n')
        lines = list(filter(bool, lines))
        columns = lines[0].split(';')
        for line in lines[1:]:
            result = {}
            for i, datum in enumerate(line.split('";"')):
                result[columns[i]] = datum.strip('"\n\r\t')
            results.append(result)
        return results
I'm sure there is a better way to do this, but in the meantime, this is a workaround that works!
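One possibly cleaner option is to let Python's standard csv module handle the quoting, rather than splitting by hand. The sketch below is not the library's code: parse_response_csv is a hypothetical standalone function, the sample header and row are made up, and it assumes the response is requested with export_escape=1 so that fields containing semicolons arrive wrapped in double quotes.

```python
import csv
import io


def parse_response_csv(data):
    """Hypothetical alternative parser for a semicolon-delimited response body."""
    # Same decoding step as the existing parser.
    text = data.decode('unicode_escape')
    # newline='' leaves the line endings for the csv module to handle.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter=';', quotechar='"')
    # Blank lines come back as empty lists, so drop them.
    rows = [row for row in reader if row]
    if not rows:
        return []
    columns = rows[0]
    # zip() pairs each field with its column and never indexes past the header.
    return [dict(zip(columns, row)) for row in rows[1:]]


# Made-up sample payload, assuming quoted fields in the data rows.
sample = b'Keyword;Url\r\n"tangerine bar";"http://example.com/menu.pdf;jsessionid=ABC123"\r\n'
parsed = parse_response_csv(sample)
print(parsed)
```

Because zip() stops at the shorter of the two sequences, a malformed row loses its extra fields silently instead of raising; whether that is preferable to an explicit error is a design choice.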