ippsec.github.io
Bug to fix in yt_crawl.py (solution in the first comment)
While using yt_crawl.py I ran into a few problems and took the liberty of fixing them. I hope you don't mind.
The main problems were incorrect use of argparse and errors in the run function. Feel free to make further changes, since some parts, such as the subprocess call, were commented out.
The changed code is below:
My Solution
def run(api_key, gitCommit, datasetOutputLocation):
    videos = []
    print("Parsing Academy Courses")
    output = parseAcademy()
    for x in output:
        videos.append(x)
    print("Done Parsing Academy Courses")
    # Build a lookup from each playlist video to the tag stored in i[0].
    tags = {}
    for i in playlists:
        for v in GetVideosInPlaylist(api_key, i[1]):
            print(i[0])
            tags[v] = i[0]
    print("Grabbing video list")
    output = GetVideosInChannel(api_key)
    print("Sorting data")
    for video in output:
        tag = ""
        description = video[3].split('\n')
        title = video[2]
        print(title)
        if title in tags:
            tag = tags[title]
        for line in description:
            if line != "":
                # Lines without a leading timestamp get a default of 00:01.
                if not re.search(r'^\w[\d]*:[\d]', line):
                    line = '00:01 - ' + line
                temp = line.split("-")
                # The timestamp is [HH:]MM:SS; the hours field is optional.
                timestamp = temp[0].strip().split(":")
                seconds = timestamp[-1]
                hours = 0
                try:
                    hours = int(timestamp[-3])
                except (IndexError, ValueError):
                    pass
                minutes = int(timestamp[-2]) + hours * 60
                # Rejoin the rest so hyphens inside the text are preserved.
                newline = "-".join(temp[1:])
                entry = SearchEntry(
                    title, video[1], minutes, seconds, tag, newline).AsJsonSerializable()
                videos.append(entry)
                #print(f'{title} | {video[1]} ^ {line}')
    print("Serializing dataset")
    dataset = json.dumps(videos)
    print("Writing dataset...")
    with open(datasetOutputLocation, "w") as ds:
        ds.write(dataset)
    if gitCommit:
        gitDescription = "Updated dataset"
        print(f"Committing to git, with commit description {gitDescription}")
        from subprocess import call
        call(["git", "commit", "-m", gitDescription, datasetOutputLocation])
    else:
        print("Done! Now commit to git")

def parser():
    # Local name differs from the function name to avoid shadowing it.
    arg_parser = argparse.ArgumentParser(
        description="Generate the dataset for the web app")
    arg_parser.add_argument(
        '-a', '--api_key',
        help="Your API key from the Youtube API",
        default=False)
    arg_parser.add_argument(
        '-o', '--output_file',
        help="The output path",
        default="dataset.json")
    arg_parser.add_argument(
        '-g', '--git_commit',
        help="Automatically commit the dataset file to git (uses git cli)",
        action='store_true')
    args = arg_parser.parse_args()
    if not args.api_key:
        # Fall back to the key stored in yt.secret; strip the trailing newline.
        with open('yt.secret') as secret:
            args.api_key = secret.read().strip()
    run(args.api_key, args.git_commit, args.output_file)
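The timestamp handling inside run can be checked in isolation. The following is only a minimal sketch and not part of yt_crawl.py: split_description_line is a hypothetical helper name, and it assumes description lines look like "MM:SS - text" or "HH:MM:SS - text", which is what the code above expects.

```python
import re

# Same pattern as in run(), written as a raw string.
TIMESTAMP_RE = re.compile(r'^\w[\d]*:[\d]')


def split_description_line(line):
    """Hypothetical helper: return (minutes, seconds, text) for one line."""
    # Lines without a leading timestamp get the default 00:01.
    if not TIMESTAMP_RE.search(line):
        line = '00:01 - ' + line
    temp = line.split("-")
    timestamp = temp[0].strip().split(":")
    seconds = timestamp[-1]
    # Hours are optional; fold them into the minute count when present.
    hours = int(timestamp[-3]) if len(timestamp) >= 3 else 0
    minutes = int(timestamp[-2]) + hours * 60
    # Rejoin the remainder so hyphens inside the text survive the split.
    text = "-".join(temp[1:])
    return minutes, seconds, text


print(split_description_line("01:02:03 - Start - intro"))
# prints (62, '03', ' Start - intro')
```

As in run, seconds stays a string and the text keeps its leading space, so the sketch mirrors the existing behavior rather than cleaning it up.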
I recommend updating the README accordingly, since its example invocation is incorrect. The correct usage is:
python yt_crawl.py -a "API_KEY_HERE" -g
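To show how argparse interprets that command line, here is a small sketch that builds the same three options and parses an explicit argument list instead of sys.argv. build_parser is a hypothetical name used only for this illustration; the option names and defaults are copied from the code above.

```python
import argparse


def build_parser():
    """Hypothetical helper mirroring the options defined in parser()."""
    p = argparse.ArgumentParser(
        description="Generate the dataset for the web app")
    p.add_argument('-a', '--api_key', default=False,
                   help="Your API key from the Youtube API")
    p.add_argument('-o', '--output_file', default="dataset.json",
                   help="The output path")
    # store_true makes -g a flag: False when absent, True when present.
    p.add_argument('-g', '--git_commit', action='store_true',
                   help="Automatically commit the dataset file to git")
    return p


# Equivalent to: python yt_crawl.py -a "API_KEY_HERE" -g
args = build_parser().parse_args(['-a', 'API_KEY_HERE', '-g'])
print(args.api_key, args.output_file, args.git_commit)
# prints API_KEY_HERE dataset.json True
```

Running it with an empty list instead leaves api_key as False, which is the case where the script falls back to reading yt.secret.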