yahoo-groups-backup

splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"

Open changeling opened this issue 7 years ago • 16 comments

I'm running into this issue. Any thoughts?

python3 yahoo-groups-backup.py scrape_messages --login=<my-login> --password=<my-password> <my-group-name>


Processing the log-in page...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 40, in __getitem__
    return super(ElementList, self).__getitem__(index)
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yahoo-groups-backup.py", line 129, in <module>
    main()
  File "yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/subcommands/scrape_messages.py", line 41, in command
    last_message = scraper.get_last_message_number()
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 84, in get_last_message_number
    return self._load_json_url(url)['ygData']['messages'][0]['messageId']
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
    return json.loads(self.br.find_by_tag("pre")[0].text)
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"

changeling avatar May 12 '17 08:05 changeling

Hmm haven't seen this one. What does the browser window look like at that point in time? It's expecting the JSON data which the browser renders with a pre tag. It was a bit of a hack but it seemed to work.
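
To make the failure mode concrete: the scraper takes the first `pre` element on the page and parses its text as JSON, so it breaks as soon as the browser renders JSON some other way. Below is a minimal sketch of a more forgiving parser, assuming the caller can supply the text of any `pre` elements plus the page's visible text. `parse_json_from_page` and its parameters are hypothetical names, not part of yahoo-groups-backup, and whether the fallback helps depends on what the browser actually puts on the page.

```python
import json


def parse_json_from_page(pre_texts, page_text):
    """Return parsed JSON from the first <pre> text, else from the page text.

    Hypothetical helper: pre_texts is a list with the text of each <pre>
    element found (possibly empty); page_text is the page's visible text.
    """
    # Prefer the first <pre>, as the scraper does, then fall back to the page.
    candidates = list(pre_texts[:1]) + [page_text]
    for text in candidates:
        try:
            return json.loads(text)
        except (TypeError, ValueError):
            continue  # not JSON; try the next candidate
    # Raise a descriptive error instead of an IndexError on an empty list.
    raise ValueError("page does not contain a JSON document")
```

A clear `ValueError` here would at least say that the page was not JSON, rather than that a tag was missing.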

csaftoiu avatar May 12 '17 16:05 csaftoiu

I'd like to leave the selenium window open on failure in order to check that. Where in the code might I do that?

Oh, also, I was getting some timeouts on username and password, so I upped the sleep() time to 5 on username:

    self.br.find_by_name("signin").click()
    # Wait ...
    time.sleep(5)

and 10 after passwd:

    self.br.find_by_name("signin").click()
    # Wait ...
    time.sleep(10)

to allow for page rendering on my (slow) machine. That eliminated the following errors, which appeared when either the machine or the network lagged:

    splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd"
    splinter.exceptions.ElementDoesNotExist: no elements could be found with name "username"

(I made the sleep(10) change on the chance that the "pre" error was a timing problem. That doesn't seem to be the case.)
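
Fixed sleeps are either too short on a slow machine or needlessly long on a fast one. One alternative is to poll for the element until it appears. The helper below is a minimal sketch of that idea; `wait_until` is a hypothetical name, not something in the scraper.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout elapses.

    Returns True if the condition was met and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With splinter this might be called as `wait_until(lambda: self.br.is_element_present_by_name("passwd"))`. Splinter's `is_element_present_by_name` also accepts its own `wait_time` argument, which may be enough on its own.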

changeling avatar May 12 '17 17:05 changeling

UPDATE: It looks like Yahoo may have changed the code today. I'm now getting:

splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd"

no matter what. I'll see if my sleep()s are somehow causing that.

changeling avatar May 12 '17 17:05 changeling

False alarm on that 'UPDATE'. For some reason, changing the sleep(2) to sleep(10) caused it to fail with the passwd error. Changing it back worked.

changeling avatar May 12 '17 18:05 changeling

Hmm try putting a sleep() or an input() right before the offending line:

  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
    return json.loads(self.br.find_by_tag("pre")[0].text)

That should leave it open so you can check it out. If you could paste a screenshot here with the inspect console open that'd help. e.g. on Chrome I see this for a JSON document:

image
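
The "put an input() before the offending line" idea can also be applied only when something actually fails. Below is a minimal sketch, assuming the goal is simply to keep the browser window open until the user presses Enter. `pause_on_error` is a hypothetical helper, not part of the script.

```python
from contextlib import contextmanager


@contextmanager
def pause_on_error(prompt=input):
    """Pause for the user if the wrapped block raises, then re-raise.

    prompt defaults to input(), which blocks until Enter is pressed and so
    leaves the browser window open for inspection.
    """
    try:
        yield
    except Exception as exc:
        prompt("Failed with {!r}; inspect the browser, then press Enter...".format(exc))
        raise
```

Wrapping the body of `_load_json_url` in `with pause_on_error():` would leave the normal path untouched and only stop when the `pre` lookup fails.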

csaftoiu avatar May 12 '17 18:05 csaftoiu

Got it! Not sure how or where to set a firefox preference in a temporary profile, but here's the problem:

https://developer.mozilla.org/en-US/docs/Tools/JSON_viewer

The relevant config setting is:

devtools.jsonview.enabled

This needs to be set as false for the generated profile. I'm betting that solves the issue.

changeling avatar May 12 '17 18:05 changeling

Note that, according to that link, the JSON View will be enabled in Firefox starting with v53, so I suspect you'll be hearing more of this. :)

changeling avatar May 12 '17 18:05 changeling

Ooh, that's unfortunate. That's the thing about hacks, they only work for so long . . . this will indeed have to be fixed in the code. I probably won't get to it any time soon unfortunately. Thanks for reporting the issue and identifying the cause though, that will make a fix possible :).

csaftoiu avatar May 12 '17 19:05 csaftoiu

Don't really have time right now to dig in, but if I get a chance, I'll let you know.

It looks like you may be able to set the preference via:

http://splinter.readthedocs.io/en/latest/drivers/firefox.html#how-to-use-selenium-capabilities-for-firefox

using:

https://seleniumhq.github.io/selenium/docs/api/py/webdriver_firefox/selenium.webdriver.firefox.options.html#module-selenium.webdriver.firefox.options

Using this to set the 'devtools.jsonview.enabled' preference to false would likely keep your hack working fine.
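
Since the script drives the browser through splinter, the preference could probably be passed at that level. This is a minimal sketch, assuming splinter's Firefox driver accepts a `profile_preferences` dictionary as its documentation describes; the function names are hypothetical and this has not been tried against the script itself.

```python
def jsonview_disabled_preferences():
    """Firefox preferences that turn off the built-in JSON viewer."""
    return {"devtools.jsonview.enabled": False}


def open_firefox_without_jsonview():
    """Start a splinter-driven Firefox with the preference applied.

    Assumes splinter and a Firefox driver are installed; profile_preferences
    is the keyword described in splinter's Firefox driver documentation.
    """
    # Imported lazily so the preference dict can be built without splinter.
    from splinter import Browser

    return Browser("firefox", profile_preferences=jsonview_disabled_preferences())
```

The scraper would need to create its browser this way for the preference to take effect.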

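Also, I saw this little snippet and changed it to the desired preference setting, if that helps at all:

    from selenium import webdriver

    # Build a Firefox profile with the built-in JSON viewer turned off.
    fp = webdriver.FirefoxProfile()
    fp.set_preference("devtools.jsonview.enabled", False)

    # firefox_profile is the Selenium 3 keyword argument; later releases may differ.
    browser = webdriver.Firefox(firefox_profile=fp)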

Thanks for what looks like an amazing script! I've been struggling with Yahoo Groups for awhile now. As I said, if I have more time, I'll try to dig in.

Chris

changeling avatar May 12 '17 19:05 changeling

I get the same error and spent a bit of time trying to fix but to no avail.

Inspired by this project a while back I started on a Ruby version which I have just finished given I can't get this one working. If anyone is interested see https://github.com/jonbartlett/yahoo-groups-export

jonbartlett avatar Nov 07 '17 01:11 jonbartlett

@jonbartlett Are you looking at adding photo export, too?

changeling avatar Nov 08 '17 20:11 changeling

@changeling Photos attached to posts? If so, possibly but there are so few in the forum I am migrating it probably isn't a priority.

If you want to get involved find a post with a photo and see how it is represented through the API:

https://groups.yahoo.com/api/v1/groups//messages/4/raw

Also better if we move this conversation over to my repo.

jonbartlett avatar Nov 08 '17 22:11 jonbartlett

I encountered the error described in the OP while passing --driver=chrome to the script to get around the issue described in this thread: https://github.com/csaftoiu/yahoo-groups-backup/issues/47#issuecomment-417490160. The script ran for a bit but then produced the message splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre".

JustinCEO avatar Oct 19 '19 20:10 JustinCEO

@JustinCEO If you're still seeing this on my fork, can you open an issue there with your stack traces?

hrenfroe avatar Oct 20 '19 00:10 hrenfroe

@hrenfroe I did run into the issue. The problem was a Yahoo splash screen that prevented the login process from starting. I just increased the time.sleep() durations to 5 seconds in the _process_login_page function in yahoo_groups_backup/scraper.py, which gives me time to click "ok" on the splash screen; after that the login goes smoothly.

peterhost avatar Oct 21 '19 18:10 peterhost

Also, for the record, in case anybody stumbles upon this (as we're all in a hurry to back up our groups): one of the groups I had to back up is big, more than 50k posts. Node.js runs out of heap memory when stringifying the jsonp loaded in memory before splitting it into the data.messagedata-xxx-xxx.js files. The quick fix is to add an argument to the node invocation in dump_site.py (in the subcommands dir) to increase the memory available to Node.js's V8 engine: --max_old_space_size=4096 sufficed for me.

 def render_search_indices(self):
     subprocess.Popen([
         "node", "--max_old_space_size=4096", P.join(P.dirname(P.realpath(__file__)), 'generate_search_index.js'),
         P.join(self.data_dir)
     ]).communicate()
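
If the heap size needs to vary per group, the command list could be built by a small function instead of being hard-coded. This is a minimal sketch; `build_index_command` and its parameters are hypothetical and not part of dump_site.py.

```python
def build_index_command(script_path, data_dir, max_old_space_mb=4096):
    """Return the argument list for running the Node.js index generator.

    max_old_space_mb is passed to V8 via --max_old_space_size (in megabytes).
    """
    return [
        "node",
        "--max_old_space_size={}".format(int(max_old_space_mb)),
        script_path,
        data_dir,
    ]
```

The returned list can be handed to `subprocess.Popen` in place of the literal list above.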

peterhost avatar Oct 23 '19 21:10 peterhost