yahoo-groups-backup
splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"
I'm running into this issue. Any thoughts?
python3 yahoo-groups-backup.py scrape_messages --login=<my-login> --password=<my-password> <my-group-name>
Processing the log-in page...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 40, in __getitem__
    return super(ElementList, self).__getitem__(index)
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yahoo-groups-backup.py", line 129, in <module>
    main()
  File "yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/subcommands/scrape_messages.py", line 41, in command
    last_message = scraper.get_last_message_number()
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 84, in get_last_message_number
    return self._load_json_url(url)['ygData']['messages'][0]['messageId']
  File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
    return json.loads(self.br.find_by_tag("pre")[0].text)
  File "/usr/local/lib/python3.5/dist-packages/splinter/element_list.py", line 44, in __getitem__
    self.find_by, self.query))
splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"
Hmm, haven't seen this one. What does the browser window look like at that point? The code expects the JSON data, which the browser renders inside a pre tag. It was a bit of a hack, but it seemed to work.
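For anyone following along, the lookup that fails here can be reproduced outside the browser. The sketch below is not the project's code: extract_json_from_pre is a hypothetical helper that uses only the standard library, takes a page's HTML, and raises a descriptive error when no pre element is present. It assumes the JSON payload sits in the first pre element on the page, which is what the scraper relies on.

```python
import json
from html.parser import HTMLParser


class _FirstPreText(HTMLParser):
    """Collects the text of the first <pre> element in a document."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <pre> is of interest.
        if tag == "pre" and not self.found:
            self._inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def extract_json_from_pre(html):
    """Parse the JSON found in the first <pre> of html, or raise ValueError."""
    parser = _FirstPreText()
    parser.feed(html)
    parser.close()
    if not parser.found:
        raise ValueError("no <pre> element; the browser may have rendered the JSON another way")
    return json.loads("".join(parser.parts))
```

Feeding it the page source at the point of failure would show whether the pre element is really missing or just slow to appear.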
I'd like to leave the selenium window open on failure in order to check that. Where in the code might I do that?
Oh, also, I was getting some timeouts on username and password, so I upped the sleep() time to 5 on username:
self.br.find_by_name("signin").click()
# Wait ...
time.sleep(5)
and 10 after passwd:
self.br.find_by_name("signin").click()
# Wait ...
time.sleep(10)
to allow for page rendering on my (slow) machine. That eliminated the following errors, which appeared whenever the machine or the network lagged:
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd"
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "username"
(I made the sleep(10) change on the chance that the "pre" error was a timing problem. That doesn't seem to be the case.)
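Fixed sleeps are either too short on a slow machine or wasted time on a fast one. A polling wait is one alternative. The sketch below is a generic, hypothetical helper (wait_until is not part of the project or of splinter) that retries a check until it passes or a timeout expires. As far as I know, splinter also offers this directly through is_element_present_by_name(name, wait_time=...), which may be the simpler route here.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use in the login code, where `br` is the splinter browser:
#     if not wait_until(lambda: br.is_element_present_by_name("passwd")):
#         raise RuntimeError("password field never appeared")
```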
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/csaftoiu/yahoo-groups-backup/issues/41#issuecomment-301119722, or mute the thread https://github.com/notifications/unsubscribe-auth/AAYd_ljir8FXbx2mPmthzNl41pycpMF3ks5r5ITngaJpZM4NY-qh .
UPDATE: It looks like Yahoo may have changed the code today. I'm now getting:
splinter.exceptions.ElementDoesNotExist: no elements could be found with name "passwd"
no matter what. I'll see if my sleep()s are somehow causing that.
False alarm on that 'UPDATE'. For some reason, changing the sleep(2) to sleep(10) caused it to fail with the passwd error. Changing it back worked.
Hmm try putting a sleep() or an input() right before the offending line:
File "/home/<my-user>/yahoo-groups-backup/yahoo_groups_backup/scraper.py", line 77, in _load_json_url
return json.loads(self.br.find_by_tag("pre")[0].text)
That should leave it open so you can check it out. If you could paste a screenshot here with the inspect console open that'd help. e.g. on Chrome I see this for a JSON document:
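One way to wire that in without editing the failing line each time is a small wrapper that pauses when a call raises. This is only a sketch: call_with_pause is a hypothetical name, and the pause argument defaults to input() so the browser window stays open until Enter is pressed.

```python
def call_with_pause(func, *args, pause=input, **kwargs):
    """Call func; if it raises, wait for the user before re-raising.

    While pause() blocks, the Selenium window stays open for inspection.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        pause("Failed with {!r}. Inspect the browser, then press Enter... ".format(exc))
        raise


# Hypothetical use around the failing call from the traceback:
#     last_message = call_with_pause(scraper.get_last_message_number)
```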
Got it! I'm not sure how or where to set a Firefox preference in a temporary profile, but here's the problem:
https://developer.mozilla.org/en-US/docs/Tools/JSON_viewer
The relevant config setting is:
devtools.jsonview.enabled
This needs to be set to false for the generated profile. I'm betting that solves the issue.
Note that, according to that link, the JSON viewer is enabled by default in Firefox starting with v53, so I suspect you'll be hearing more of this. :)
Ooh, that's unfortunate. That's the thing about hacks, they only work for so long . . . this will indeed have to be fixed in the code. I probably won't get to it any time soon unfortunately. Thanks for reporting the issue and identifying the cause though, that will make a fix possible :).
Don't really have time right now to dig in, but if I get a chance, I'll let you know.
It looks like you may be able to set the preference via:
http://splinter.readthedocs.io/en/latest/drivers/firefox.html#how-to-use-selenium-capabilities-for-firefox
using:
https://seleniumhq.github.io/selenium/docs/api/py/webdriver_firefox/selenium.webdriver.firefox.options.html#module-selenium.webdriver.firefox.options
Using this to set the 'devtools.jsonview.enabled' preference to false would likely keep your hack working fine.
Also, I saw this little snippet and changed it to use the desired preference setting, if that helps at all:
from selenium import webdriver
# Build a temporary profile with Firefox's built-in JSON viewer turned off
fp = webdriver.FirefoxProfile()
fp.set_preference("devtools.jsonview.enabled", False)
browser = webdriver.Firefox(firefox_profile=fp)
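Since the scraper drives Firefox through splinter rather than raw Selenium, the same preference would probably go through splinter's Browser constructor. The sketch below assumes the profile_preferences keyword that splinter's Firefox driver has accepted in the versions I have seen. firefox_kwargs and open_firefox are hypothetical names, and I have not checked where the project actually creates its browser.

```python
def firefox_kwargs():
    """Keyword arguments for a Firefox session with the JSON viewer disabled."""
    return {
        "profile_preferences": {
            # With the viewer off, Firefox shows the raw JSON text,
            # which the scraper expects to find inside a <pre> element.
            "devtools.jsonview.enabled": False,
        },
    }


def open_firefox():
    # Imported lazily so firefox_kwargs() can be used without splinter installed.
    from splinter import Browser
    return Browser("firefox", **firefox_kwargs())
```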
Thanks for what looks like an amazing script! I've been struggling with Yahoo Groups for a while now. As I said, if I have more time, I'll try to dig in.
Chris
I get the same error and spent a bit of time trying to fix it, but to no avail.
Inspired by this project, I started on a Ruby version a while back, which I have just finished given that I can't get this one working. If anyone is interested, see https://github.com/jonbartlett/yahoo-groups-export
@jonbartlett Are you looking at adding photo export, too?
@changeling Photos attached to posts? If so, possibly, but there are so few in the forum I am migrating that it probably isn't a priority.
If you want to get involved, find a post with a photo and see how it is represented through the API:
https://groups.yahoo.com/api/v1/groups/
Also better if we move this conversation over to my repo.
I encountered the error described in the OP while passing --driver=chrome to the script to get around the issue described in this thread: https://github.com/csaftoiu/yahoo-groups-backup/issues/47#issuecomment-417490160
The script ran for a bit but then produced the message:
splinter.exceptions.ElementDoesNotExist: no elements could be found with tag_name "pre"
@JustinCEO If you're still seeing this on my fork, can you open an issue there with your stack traces?
@hrenfroe I did run into the issue. The problem was a Yahoo splash screen that prevented the login process from starting. I just increased the time.sleep() durations to 5 seconds in the _process_login_page function in yahoo_groups_backup/scraper.py, which gives me time to click "ok" on the splash screen; the login then goes through smoothly.
Also, for the record, in case anybody stumbles upon this (as we're all in a hurry to back up our groups): one of the groups I had to back up is big, with more than 50k posts. Node.js runs into a heap memory problem when stringifying the JSONP loaded in memory before splitting it into the data.messagedata-xxx-xxx.js files.
The quick fix is to add an argument to dump_site.py in the subcommands dir that raises the V8 old-space heap limit (given in megabytes) for Node.js: --max_old_space_size=4096 sufficed for me.
def render_search_indices(self):
    # Run the index generator with V8's old-space heap limit raised to 4096 MB
    subprocess.Popen([
        "node", "--max_old_space_size=4096",
        P.join(P.dirname(P.realpath(__file__)), 'generate_search_index.js'),
        P.join(self.data_dir)
    ]).communicate()
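If the heap size needs to vary per group, the command could be built by a small function instead of hard-coding 4096. A sketch, assuming the same generate_search_index.js script and treating build_node_command, script_dir, and heap_mb as hypothetical names:

```python
import os.path as P


def build_node_command(script_dir, data_dir, heap_mb=4096):
    """Return the argv list for the search-index script with a larger V8 heap."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return [
        "node",
        # V8 reads this value as megabytes.
        "--max_old_space_size={}".format(heap_mb),
        P.join(script_dir, "generate_search_index.js"),
        data_dir,
    ]


# Hypothetical use inside render_search_indices():
#     subprocess.Popen(build_node_command(script_dir, self.data_dir)).communicate()
```

Starting lower and raising heap_mb only when Node.js reports an out-of-memory error keeps the setting within what the machine can actually provide.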