Enhancements and Refactoring of Python Code Extraction Methods
PR Description: This pull request introduces enhancements and refactoring to the Code_search.ipynb notebook, which extracts Python functions and generates their text embeddings. The proposed changes make the code more efficient and user-friendly, and make future maintenance easier.
Included updates:
- Normalization of File Paths: Update code to use the relative_to() method from pathlib.Path. The previous string.replace() function produces inconsistent results, as it could potentially replace substrings not part of the root directory. To illustrate this, please consider the example below:

  ```python
  import pandas as pd
  from pathlib import Path

  data = {'file_path': [
      'repo/main/src/file1_copy/other/repo/main/src/file1',
      'repo/main/src/file1_copy/file1',
  ]}
  df = pd.DataFrame(data)

  # Approach 1: Path.relative_to()
  root_dir = Path('repo/main/src')
  df['Path.relative_to()'] = df['file_path'].map(lambda x: Path(x).relative_to(root_dir))

  # Approach 2: string.replace()
  root_dir = 'repo/main/src'
  df['str.replace()'] = df['file_path'].apply(lambda x: x.replace(root_dir, ''))
  ```

  |   | file_path | Path.relative_to() | string.replace() |
  |---|-----------|--------------------|------------------|
  | 0 | repo/main/src/file1_copy/other/repo/main/src/file1 | file1_copy/other/repo/main/src/file1 | /file1_copy/other//file1 |
  | 1 | repo/main/src/file1_copy/file1 | file1_copy/file1 | /file1_copy/file1 |

  As seen above, Path.relative_to() provides accurate relative path computation, considering the file structure and ensuring correct results, even in cases where the base directory appears elsewhere in the file path.
- Capture `async def`: Code file searching now extracts both `def` and `async def` methods.
- Refactor `get_functions`: Now handles files using a context manager for safer and more reliable file operations.
- Refactor `get_until_no_space`: Update logic to prevent potential index out of range errors.
- Improve Directory Search: Update code to use pathlib.Path.glob() to search for files instead of the original combination of os.walk() and glob(). The os.walk() method traverses the directory tree and yields a (dirpath, dirnames, filenames) tuple for each directory it encounters, which then has to be filtered for matching files. The pathlib.Path.glob() method, given a recursive pattern, yields the matching paths directly from a single call. This simplifies the search code and may reduce overhead.
- Implement `extract_functions_from_repo`: Encompasses the logic of code file function extraction and printing.
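For readers who want to see the per-file changes concretely, here is a minimal sketch of how the `def` / `async def` capture, the context manager, and the bounds-safe line scan could fit together. It is an illustration under assumptions, not the exact code in the notebook: the `DEF_PREFIXES` constant, the returned dictionary keys, and the rule that a function body continues while lines are blank or start with a space, tab, or `)` are all assumed here.

```python
DEF_PREFIXES = ("def ", "async def ")  # assumed: only top-level definitions are matched


def get_until_no_space(all_lines, i):
    """Collect line i plus the following blank or indented lines."""
    ret = [all_lines[i]]
    # range() ends at len(all_lines), so j can never index past the list.
    for j in range(i + 1, len(all_lines)):
        line = all_lines[j]
        if line == "" or line[0] in (" ", "\t", ")"):
            ret.append(line)
        else:
            break
    return "\n".join(ret)


def get_functions(filepath):
    """Yield one dict per top-level def / async def found in filepath."""
    # The context manager closes the file even if reading raises.
    with open(filepath, "r", encoding="utf-8") as f:
        all_lines = f.read().splitlines()
    for i, line in enumerate(all_lines):
        for prefix in DEF_PREFIXES:
            if line.startswith(prefix):
                code = get_until_no_space(all_lines, i)
                # partition() avoids an exception if "(" is missing on the line.
                name = line[len(prefix):].partition("(")[0].strip()
                yield {"function_name": name, "code": code, "filepath": filepath}
                break
```

Calling `list(get_functions("some_module.py"))` on a file (the file name is a placeholder) would return one entry per top-level function, whether it is declared with `def` or `async def`.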
These changes collectively enhance the functionality and maintainability of the script, providing better support for future development and analysis tasks involving the openai-cookbook repository.
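As a rough illustration of the repository-level pieces, the sketch below combines a recursive `Path.glob()` search with `relative_to()` path normalization. The helper names `find_python_files` and `summarize_repo` are hypothetical and exist only for this example; the notebook's own `extract_functions_from_repo` may be structured differently.

```python
from pathlib import Path


def find_python_files(code_root):
    """Return every .py file under code_root, sorted for stable output."""
    # A '**' pattern makes Path.glob() search subdirectories recursively.
    return sorted(Path(code_root).glob("**/*.py"))


def summarize_repo(code_root):
    """Print a file count and return (relative path, definition count) pairs."""
    root = Path(code_root)
    summary = []
    for path in find_python_files(root):
        with open(path, "r", encoding="utf-8") as f:
            # Count top-level lines that start a def or async def.
            count = sum(line.startswith(("def ", "async def ")) for line in f)
        # relative_to() raises ValueError if path is not under root,
        # instead of silently returning a mangled string.
        summary.append((path.relative_to(root).as_posix(), count))
    print(f"Total number of .py files: {len(summary)}")
    return summary
```

One design point worth noting: because `relative_to()` works on path components rather than substrings, a root directory name that reappears deeper in the tree is left untouched.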
Best Regards, Eli
I will try to review this week. Thanks for the detailed and high-quality contribution!
By the way, really appreciate you taking the time to describe and document your improvements. Always love to see it. :)