python-seo-analyzer
Looping analyze() over multiple websites writes results to the same object
Hi, I want to analyze multiple websites by looping over a list and writing the results to a JSON file.
I noticed that when we crawl 2 different websites and store the output in two different variables (let's say A and B), the second variable, B, gets A's results added to it... and so on for every subsequent crawl.
It is as if analyze() writes to the same object!
And it gets even weirder: when I delete A and B with del A, B, the analyze() function does not re-run, it recovers the old results from nowhere!
I tried %reset to erase the memory... but it still recovers the results from local memory!
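It behaves as if the library keeps mutable state at module level between calls. A minimal, hypothetical sketch (not the actual seoanalyzer code) of how that would produce exactly this behavior:

# Hypothetical illustration only -- NOT the real seoanalyzer source.
_cached_pages = []  # module-level state; it lives in the imported module

def analyze(url):
    _cached_pages.append(url)             # accumulates across calls
    return {'pages': list(_cached_pages)}

a = analyze("https://example.com")        # a['pages'] has 1 entry
b = analyze("https://example.org")        # b['pages'] has 2: the first URL leaked in

del a, b (or %reset) would only remove the names in my session; a module-level list like this survives until the interpreter restarts or the module is reloaded, which would explain why old results come back from nowhere.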
Here is an actual example:
from seoanalyzer import analyze

A = analyze("https://krugerwildlifesafaris.com/")
# the length is 90
print(len(A['pages']))

B = analyze("http://www.vintage.co.bw/")
# the length is still 90
print(len(A['pages']))
# the length is 100 but it should be 10
print(len(B['pages']))
A has 90 pages and B should have only 10 pages, but it ends up with 100: the 90 from A plus its own 10.
How can I avoid this? Why this erratic behavior?
regards,
karim.m
Same problem here, guys!
I fixed the issue by doing this: go to the Manifest class in the implementation and look for the analyze method.
At the end of the method, just before return output, add: Manifest.clear_cache()
Everything will be cool!
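Roughly, the idea is this (a hypothetical sketch of the pattern, not the literal source; Manifest and clear_cache are the names as I found them):

# Hypothetical sketch of the suggested fix -- not the literal source.
class Manifest:
    _cache = []  # class-level state shared by every analyze() call

    @classmethod
    def clear_cache(cls):
        cls._cache.clear()

    def analyze(self, url):
        Manifest._cache.append(url)          # this is where results pile up
        output = {'pages': list(Manifest._cache)}
        Manifest.clear_cache()               # the suggested one-line fix
        return output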
Hi Ghezaielm,
Thanks for your quick feedback. In the meantime, I used another workaround, see below:
import os

for website in list_of_website:
    file_name = website.replace('/', '_') + '.json'  # whatever file name you want
    command = 'seoanalyze {} -f json > "{}"'.format(website, file_name)
    returned_value = os.system(command)
    print(str(returned_value) + ' name= ' + file_name + ' ' + website)
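The same loop with subprocess.run avoids shell-quoting problems; a sketch (the file naming is just an example):

import subprocess

list_of_website = ["https://example.com"]  # your list of sites

for website in list_of_website:
    file_name = website.replace('/', '_') + '.json'
    # Each crawl runs in its own process, so no state can leak between sites.
    with open(file_name, 'w') as f:
        result = subprocess.run(['seoanalyze', website, '-f', 'json'], stdout=f)
    print(result.returncode, 'name=', file_name, website)

Because every seoanalyze invocation gets a fresh interpreter, whatever state the library keeps cannot accumulate between crawls.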
And it is convenient if you want to parallelize your crawl using ThreadPoolExecutor.
I have an 8-core / 20-thread CPU and it is damn fast... I crawled 20k websites in a few hours!
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=80) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(analyze_SEO, url): url for url in list_website}
    # print(future_to_url)
    for future_url in concurrent.futures.as_completed(future_to_url):
        url_completed = future_to_url[future_url]
        try:
            data = future_url.result()
            if data is not None:
                print(data)
        except Exception as exc:
            print('%r generated an exception: %s' % (url_completed, exc))
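For completeness, analyze_SEO above is just my wrapper around the same shell command; a hypothetical sketch of it:

import subprocess

def analyze_SEO(url):
    # Run the crawl in a separate process and hand back its JSON output.
    result = subprocess.run(['seoanalyze', url, '-f', 'json'],
                            capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None  # None on failure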
(PS: sorry, I did not know how to format the indentation for code in a GitHub quote.)
Did you submit the correction on GitHub?
Ah, right. I'm putting this on my roadmap for v4.1. 👍