python-Wappalyzer
created test for valid selector that does not increase time
My repo wappybird implements your library. I started pulling the updated Wappalyzer technology files. They have had issues producing valid JSON, so I started pulling the current release instead, but the tally selector is malformed. I talked to the maintainer of soupsieve and they provided a function to check for valid selectors and skip any that are not. This replaces the crude try/except code.
I can update your repo to pull the current technologies if you would like. Or feel free to pull from wappybird.
Also, the pip package is out of date and incompatible with the updated technologies files.
Thanks @brandonscholet. Can you provide a test for an invalid selector, please?
The current release of npm-Wappalyzer has this broken selector: iframe[scr*='//airtable.com/'], a[href*='//airtable.com/][target='_blank']
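A test for that selector could look roughly like the sketch below. It assumes the validity check is built on `soupsieve.compile`, which parses the selector up front and raises `soupsieve.SelectorSyntaxError` on bad syntax. The helper name `is_valid_selector` is made up for illustration and is not necessarily what the PR uses.

```python
import soupsieve

# Selector copied verbatim from the release: the closing quote after
# '//airtable.com/' is missing in the second attribute selector.
BROKEN_SELECTOR = "iframe[scr*='//airtable.com/'], a[href*='//airtable.com/][target='_blank']"
# A syntactically valid selector for comparison.
VALID_SELECTOR = "iframe[src*='//airtable.com/']"


def is_valid_selector(selector: str) -> bool:
    """Return True if soupsieve can compile the selector (hypothetical helper)."""
    try:
        soupsieve.compile(selector)
    except soupsieve.SelectorSyntaxError:
        return False
    return True


def test_invalid_selector_is_rejected():
    assert not is_valid_selector(BROKEN_SELECTOR)


def test_valid_selector_is_accepted():
    assert is_valid_selector(VALID_SELECTOR)
```

Since the selector is only compiled and never matched against a document, the check should stay cheap.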
This will pull the latest release into the technologies file. They have had broken selectors for the past two releases.
import io
import json
import os
from zipfile import ZipFile

import requests


def update_technologies_from_latest():
    print("updating technologies")
    technologies_file = os.path.expanduser('~/.python-Wappalyzer/technologies.json')
    technologies = {}
    categories = {}
    # get release page
    latest_release = requests.get('https://api.github.com/repos/wappalyzer/wappalyzer/releases/latest').json()
    # download the release zip
    zip_response = requests.get(latest_release['zipball_url'])
    myzip = ZipFile(io.BytesIO(zip_response.content))
    # parse files
    for listed_file in myzip.namelist():
        # get all technology files
        if "src/technologies" in listed_file and listed_file.endswith(".json"):
            # extract file into json
            tech_json_file = myzip.read(listed_file).decode('UTF-8')
            tech_json = json.loads(tech_json_file)
            # add to full json
            technologies = {**technologies, **tech_json}
        if "src/categories.json" in listed_file:
            # extract categories into json
            categories = json.loads(myzip.read(listed_file).decode('UTF-8'))
    # merge into one object
    combined_object = {'categories': categories, 'technologies': technologies}
    # make sure the target directory exists before writing
    os.makedirs(os.path.dirname(technologies_file), exist_ok=True)
    # write to file
    with open(technologies_file, 'w', encoding='utf-8') as tfile:
        tfile.write(json.dumps(combined_object))
    print("done!\n")
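from Wappalyzer import Wappalyzer, WebPage

# same path that update_technologies_from_latest() writes to
technologies_file = os.path.expanduser('~/.python-Wappalyzer/technologies.json')
webpage = WebPage.new_from_url("https://example.com", verify=False, timeout=60)
wappalyzer = Wappalyzer.latest(technologies_file=technologies_file)
techs = wappalyzer.analyze_with_versions_and_categories(webpage)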
Looking back, the print statement should probably be removed.
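If it helps with testing, the merge step of the update function above can be exercised without touching the network. The sketch below is only an illustration: `merge_technologies` and `build_sample_archive` are hypothetical names, and the archive contents are made-up sample data that only mimic the `src/technologies/*.json` and `src/categories.json` layout the function looks for.

```python
import io
import json
from zipfile import ZipFile


def merge_technologies(archive: ZipFile) -> dict:
    """Combine all technology files and the categories file from a release zip."""
    technologies = {}
    categories = {}
    for name in archive.namelist():
        if "src/technologies/" in name and name.endswith(".json"):
            # Each technology file is a JSON object keyed by technology name.
            technologies.update(json.loads(archive.read(name).decode("utf-8")))
        elif name.endswith("src/categories.json"):
            categories = json.loads(archive.read(name).decode("utf-8"))
    return {"categories": categories, "technologies": technologies}


def build_sample_archive() -> ZipFile:
    """Build a tiny in-memory zip with made-up entries (not real release data)."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("sample-release/src/technologies/a.json",
                         json.dumps({"Airtable": {"cats": [1]}}))
        archive.writestr("sample-release/src/technologies/b.json",
                         json.dumps({"Bootstrap": {"cats": [2]}}))
        archive.writestr("sample-release/src/categories.json",
                         json.dumps({"1": {"name": "Sample category"}}))
    buffer.seek(0)
    return ZipFile(buffer)


combined = merge_technologies(build_sample_archive())
```

Keeping the download and the merge separate would let a unit test cover the merge with a fixture archive like this one, while the GitHub request stays in a thin wrapper.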