xml.etree.ElementTree.ParseError: mismatched tag: line 10, column 10

Open algouye2802 opened this issue 5 years ago • 3 comments

Hi there,

Since yesterday, the script no longer works and generates this error. It looks like the structure of the JSON might have changed:

Traceback (most recent call last):
  File "sitereview.py", line 61, in <module>
    main(args.url)
  File "sitereview.py", line 43, in main
    s.check_response(response)
  File "sitereview.py", line 28, in check_response
    root = ET.fromstring(self.req.content)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 10, column 10

algouye2802, Jul 10 '19 08:07

It looks like they added a JavaScript redirect to prevent automation from using their website for categorization. Most URL/URI modules and tools like Postman do not execute JavaScript, so the request is never redirected. A browser follows the redirect fine, which lets it reach the actual page and submit a website search for categorization.
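As a rough illustration of why the traceback shows a ParseError, here is a minimal sketch that tries to parse the response body as XML and, if that fails, checks whether the body looks like an HTML page carrying a JavaScript redirect. The function name `parse_or_explain` and the redirect pattern are hypothetical and are not part of sitereview.py; the pattern is only a heuristic and I have not checked it against the actual page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristic: assignments or calls commonly used for JavaScript redirects.
_JS_REDIRECT = re.compile(
    r"(window\.)?location(\.href)?\s*=|location\.replace\(", re.IGNORECASE
)


def parse_or_explain(content):
    """Return the parsed XML root, or raise ValueError with a clearer message."""
    try:
        return ET.fromstring(content)
    except ET.ParseError as err:
        # Decode bytes leniently so the heuristic can inspect the body.
        if isinstance(content, bytes):
            text = content.decode("utf-8", "replace")
        else:
            text = content
        if "<script" in text.lower() and _JS_REDIRECT.search(text):
            raise ValueError(
                "response looks like an HTML page with a JavaScript redirect, not XML"
            )
        raise ValueError("response is not well-formed XML: %s" % err)
```

This only makes the failure easier to diagnose; it does not follow the redirect. A page that happens to be well-formed XML would still be returned as a parsed tree.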

weissrob, Jul 10 '19 22:07

In order to prevent automation from using their website for categorization, it looks like they added a javascript redirect

Is it possible to bypass this redirect?

d78ui98, Aug 21 '19 08:08

Not unless your automation can simulate a browser and actually process the JavaScript on the page to ignore the redirect and instead go to the correct page. I am not aware of any way to do this.

I am surprised that they don’t expose an API to avoid the extra load from automation.

If you have your own global manager, I think you can just query the API of that instead.

weissrob, Aug 21 '19 21:08