fabric
[Feature request]: add extract_recipe pattern
What do you need?
I created a custom pattern that extracts a recipe from the transcript of a cooking video. It handles Chinese and could be modified to handle other languages.
Instructional Video Transcript Extraction
Identity
You are an expert at extracting clear, concise step-by-step instructions from cooking video transcripts.
Goal
Extract the ingredients and their quantities, and present the key instructions from the given transcript in an easy-to-follow format.
Process
- Read the entire transcript carefully to understand the video's objectives.
- Identify and extract the main actionable steps and important details.
- Organize the extracted information into a logical, step-by-step format.
- Summarize the video's main objectives in brief bullet points.
- Present the instructions in a clear, numbered list.
- If the transcript is in Chinese, list each ingredient in both English and Chinese characters.
Output Format
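Title
- List the title of the recipe, first as a file-friendly name and then in the video's own language, separated by " - " (for example, "egg_omelet.txt - Egg Omelet").
Objectives
- [List 3-10 main objectives of the video as bullet points of at most 15 words each]
Ingredients
- [List each ingredient with its quantity]
Instructions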
1. [First step]
2. [Second step]
3. [Third step]
   - [Sub-step if applicable]
4. [Continue numbering as needed]
Guidelines
- Ensure each step is clear, concise, and actionable.
- Use simple language that's easy to understand.
- Include any crucial details or warnings mentioned in the video.
- Maintain the original order of steps as presented in the video.
- Limit each step to one main action or concept.
Example Output
Title
egg_omelet.txt - Egg Omelet
Objectives
- Learn to make a perfect omelet using the French technique
- Understand the importance of proper pan preparation and heat control
Ingredients
- 3 eggs
- 1 tablespoon of oil
- 1 dash of salt
- 1 dash of MSG
- 1 teaspoon of water
Instructions
1. Crack the 3 eggs into a bowl, add the salt, MSG, and water, and beat until well combined.
2. Heat a non-stick pan over medium heat.
3. Add the oil to the pan and swirl to coat.
4. Pour the beaten eggs into the pan.
5. Using a spatula, gently push the edges of the egg towards the center.
6. Tilt the pan to allow uncooked egg to flow to the edges.
7. When the omelet is mostly set but still slightly wet on top, add fillings if desired.
8. Fold one-third of the omelet over the center.
9. Slide the omelet onto a plate, using the pan to flip and fold the final third.
10. Serve immediately.
[Insert transcript here]
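To try the pattern locally, it has to be saved where fabric looks for patterns. The sketch below assumes fabric's default patterns directory, `~/.config/fabric/patterns`, with the prompt stored as `system.md` inside a folder named after the pattern. `install_pattern` is a hypothetical helper, and your installation may be configured to use a different directory.

```python
from pathlib import Path

# Assumption: fabric's default patterns directory; adjust if your install differs.
DEFAULT_PATTERNS_DIR = Path.home() / ".config" / "fabric" / "patterns"


def install_pattern(name: str, prompt_text: str,
                    patterns_dir: Path = DEFAULT_PATTERNS_DIR) -> Path:
    """Write prompt_text to <patterns_dir>/<name>/system.md and return that path."""
    pattern_dir = Path(patterns_dir) / name
    pattern_dir.mkdir(parents=True, exist_ok=True)  # create the pattern folder if missing
    system_file = pattern_dir / "system.md"
    system_file.write_text(prompt_text, encoding="utf-8")  # UTF-8 keeps Chinese text intact
    return system_file
```

For example, after saving the prompt above as `extract_recipe.md`, calling `install_pattern("extract_recipe", Path("extract_recipe.md").read_text(encoding="utf-8"))` should let `yt <link> | fabric -sp extract_recipe` find it.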
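I then use a Python script like the following to pull the file name out of the pattern's output and save each recipe to a directory. It processes a batch of YouTube links listed in a file, and it expects the model to emit the Title and Objectives sections as "### Title" and "### Objectives" headings. It seems to handle videos without transcripts well, and it also copes with videos that are not about food, such as a discussion about knives.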
```python
#!/usr/bin/python3
"""Batch-run the extract_recipe fabric pattern over a file of YouTube links."""
import logging
import os
import re
import shlex
import subprocess
import sys
import time
from logging.handlers import RotatingFileHandler


def split_filename(title_block):
    """Return the file-friendly name: first line of the title block, up to ' - '."""
    lines = title_block.strip().splitlines()
    name = lines[0].split(' - ')[0] if lines else ''
    name = name.strip().lstrip('-* ').strip()  # drop a leading list marker, if any
    return re.sub(r'[^\w.-]+', '_', name)  # no path separators or other odd characters


# Set up logging to a rotating file (only this handler is attached, so nothing goes to the console)
log_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
log_handler = RotatingFileHandler('script.log', maxBytes=1000000, backupCount=5)
log_handler.setFormatter(log_formatter)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(log_handler)

# Check for the correct number of arguments
if len(sys.argv) < 3:
    usage = "Usage: {} <file_with_youtube_links> <output_directory> [limit]".format(sys.argv[0])
    logger.error(usage)
    print(usage)
    sys.exit(1)

file_with_links = sys.argv[1]
output_directory = sys.argv[2]
limit = int(sys.argv[3]) if len(sys.argv) > 3 else 10  # default limit is 10

# Ensure the output directory exists
if not os.path.isdir(output_directory):
    message = "Output directory '{}' does not exist.".format(output_directory)
    logger.error(message)
    print(message)
    sys.exit(1)

logger.info("Starting the program")
logger.info("input file: {}".format(file_with_links))
logger.info("output dir: {}".format(output_directory))

# Open the file with YouTube links
with open(file_with_links, 'r') as fh:
    count = 0
    for link in fh:
        link = link.strip()
        if not link:
            continue  # skip blank lines
        if count >= limit:
            logger.info("limit: {} reached".format(limit))
            break

        logger.info(f"Processing entry {count + 1}: {link}")

        # Run yt | fabric and capture its output; quote the link so the shell cannot interpret it
        command = "yt {} | fabric -sp extract_recipe".format(shlex.quote(link))
        output = subprocess.getoutput(command)
        output += "\n\n### Youtube link\n\n{}\n".format(link)

        # Take the file name from the text between "### Title" and "### Objectives"
        match = re.search(r"### Title(.*?)### Objective", output, re.DOTALL)
        filename = split_filename(match.group(1)) if match else ''
        if not filename:
            filename = "recipe_{}.txt".format(count)

        # Write the output to <output_directory>/<filename>
        output_file = os.path.join(output_directory, filename)
        with open(output_file, 'w', encoding='utf-8') as out_fh:
            out_fh.write(output)
        logger.info(f"wrote to file: {output_file}")

        logger.info(f"Finished processing entry {count + 1}: {link}")
        count += 1
        time.sleep(20)  # pause between videos

logger.info("Ending the program")
```
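The script hands `subprocess.getoutput` a single shell string, so a link containing characters such as `&` or `;` would be interpreted by the shell unless it is quoted. Below is a sketch of the same `yt | fabric` pipeline without a shell, following the pipeline-replacement pattern from the Python `subprocess` documentation. `run_pipeline` is a hypothetical helper, and it assumes `yt` and `fabric` are on `PATH`. Unlike `getoutput`, it returns only standard output and leaves standard error on the terminal.

```python
import subprocess


def run_pipeline(producer: list[str], consumer: list[str]) -> str:
    """Run `producer | consumer` without a shell and return the consumer's stdout.

    Arguments are passed as lists, so characters in a URL are never
    interpreted by a shell.
    """
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so only the consumer reads it; if the consumer
    # exits early, the producer then gets a broken-pipe error instead of hanging.
    first.stdout.close()
    output, _ = second.communicate()  # wait for the consumer and collect its stdout
    first.wait()                      # reap the producer process
    return output
```

In the batch loop, `output = run_pipeline(["yt", link], ["fabric", "-sp", "extract_recipe"])` would stand in for the `command`/`getoutput` pair.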