langchain
Adds OpenAI functions-powered document metadata tagger
Adds a new document transformer that automatically extracts metadata for a document based on an input schema. I also moved `document_transformers.py` to `document_transformers/__init__.py` to group it with this new transformer. It didn't seem to cause issues in the notebook, but let me know if I've done something wrong there.
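The idea of tagging documents with schema-driven metadata can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `Document` is a stand-in for LangChain's document class, `SchemaMetadataTagger` is not the PR's `MetadataTagger`, and the `extractor` callable takes the place of the OpenAI functions tagging chain so the example runs without an API key. It assumes the schema is a JSON-Schema-style dict with a `properties` key.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical extractor signature: (text, schema) -> extracted properties.
# In the PR this role is played by an OpenAI functions tagging chain.
Extractor = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class SchemaMetadataTagger:
    """Hypothetical transformer that tags documents using a JSON-Schema-like dict."""

    def __init__(self, schema: Dict[str, Any], extractor: Extractor) -> None:
        self.schema = schema
        self.extractor = extractor

    def _tag(self, doc: Document) -> Document:
        extracted = self.extractor(doc.page_content, self.schema)
        # Drop any keys the schema does not declare.
        allowed = self.schema.get("properties", {})
        extracted = {k: v for k, v in extracted.items() if k in allowed}
        # Existing metadata wins on key collisions; the input document is not mutated.
        return Document(doc.page_content, {**extracted, **doc.metadata})

    def transform_documents(self, documents: Sequence[Document]) -> List[Document]:
        return [self._tag(doc) for doc in documents]

    async def atransform_documents(self, documents: Sequence[Document]) -> List[Document]:
        # Run the synchronous path in a worker thread (Python 3.9+)
        # instead of raising NotImplementedError.
        return await asyncio.to_thread(self.transform_documents, documents)
```

Letting existing metadata win is a choice of this sketch, so that fields such as `source` set by a loader are never overwritten; the opposite precedence is just as easy to express by swapping the two dicts in the merge.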
Also had a linter issue I couldn't figure out:
```
MacBook-Pro:langchain jacoblee$ make lint
poetry run mypy .
docs/dist/conf.py: error: Duplicate module named "conf" (also at "./docs/api_reference/conf.py")
docs/dist/conf.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
docs/dist/conf.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint] Error 2
```
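Roughly, the error arises because neither `docs/dist/` nor `docs/api_reference/` contains an `__init__.py`, so mypy names each file by its stem and both `conf.py` files become the module `conf`. The sketch below is a simplified, hypothetical model of that mapping together with the first resolution the error message suggests, excluding one directory by regular expression. It assumes every file sits outside any package and approximates `--exclude` by searching the pattern in each relative path; mypy's real discovery logic covers more cases. With the real tool, the equivalent would be something like `poetry run mypy . --exclude '^docs/dist/'`, assuming it is run from the repository root.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional


def duplicate_module_names(
    paths: Iterable[str], exclude: Optional[str] = None
) -> Dict[str, List[str]]:
    """Simplified model: map stand-alone .py files to module names and report clashes.

    Assumes no file lives inside a package, so each module name is just the
    file stem (e.g. docs/dist/conf.py -> "conf").
    """
    by_name: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # Approximates --exclude: skip any path in which the regex is found.
        if exclude is not None and re.search(exclude, path):
            continue
        by_name[PurePosixPath(path).stem].append(path)
    # Only names claimed by more than one file are duplicates.
    return {name: files for name, files in by_name.items() if len(files) > 1}
```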
@rlancemartin @baskaryan
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| langchain | ⬜️ Ignored (Inspect) | | | Jul 13, 2023 4:50am |
PR Analysis
- Main theme: Adding a new document transformer that automatically extracts metadata for a document based on an input schema
- Description and title: Yes
- Type of PR: Enhancement
- Relevant tests added: No
- Minimal and focused: Yes, the PR is focused on adding a new feature of document metadata tagging and does not include unrelated changes.
- Security concerns: No, the PR does not introduce possible security concerns or issues. The changes are related to data processing and do not involve any security-sensitive operations.
PR Feedback
- General PR suggestions: The PR is well-structured and the code changes are clear. However, it lacks tests for the new functionality. It's important to add tests to ensure the new feature works as expected and to prevent regressions in the future. Additionally, the linter issue mentioned in the PR description should be resolved.
- Code suggestions:
  - relevant file: `langchain/document_transformers/__init__.py`
    suggestion content: Consider adding type hints to the functions `_filter_similar_embeddings` and `_filter_cluster_embeddings` for better code readability and maintainability. [important]
  - relevant file: `langchain/document_transformers/openai_functions.py`
    suggestion content: In the `MetadataTagger` class, the `atransform_documents` method raises a `NotImplementedError`. If this method is not intended to be used, consider removing it or adding a docstring to explain why it's not implemented. [medium]
  - relevant file: `langchain/document_transformers/openai_functions.py`
    suggestion content: In the `create_metadata_tagger` function, consider adding a docstring to explain the purpose of the function, its arguments, and its return value. This will improve code readability and maintainability. [medium]
  - relevant file: `langchain/chains/openai_functions/tagging.py`
    suggestion content: In the `create_tagging_chain` and `create_tagging_chain_pydantic` functions, consider adding a docstring to explain the purpose of the functions, their arguments, and their return values. This will improve code readability and maintainability. [medium]
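As an illustration of the type-hint suggestion, here is a hypothetical, fully annotated redundancy filter. It is not the PR's `_filter_similar_embeddings`: the name `filter_similar_embeddings`, the `threshold` default of 0.95, and the greedy keep-the-first-occurrence strategy are assumptions made for this sketch, which uses plain lists and cosine similarity so it runs without NumPy.

```python
import math
from typing import Callable, List, Sequence

# Named aliases make the expected shapes explicit at every call site.
Vector = Sequence[float]
SimilarityFn = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_similar_embeddings(
    embedded_documents: Sequence[Vector],
    similarity_fn: SimilarityFn = cosine_similarity,
    threshold: float = 0.95,
) -> List[int]:
    """Return indices of documents to keep, dropping later near-duplicates.

    A document is kept only if its similarity to every already-kept
    document is at most ``threshold``.
    """
    kept: List[int] = []
    for i, emb in enumerate(embedded_documents):
        if all(similarity_fn(emb, embedded_documents[j]) <= threshold for j in kept):
            kept.append(i)
    return kept
```

The greedy loop performs O(n²) similarity calls in the worst case, which is acceptable for the small batches a document transformer typically sees; the aliases `Vector` and `SimilarityFn` are where most of the readability gain comes from.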
How to use
Tag me in a comment '@CodiumAI-Agent' to ask for a new review after you update the PR. You can also tag me and ask any question, for example '@CodiumAI-Agent is the PR ready for merge?'
Nice. Yes, this is good. Consolidating `document_transformers` into a new directory will make additions easier going forward.
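Moving `document_transformers.py` to `document_transformers/__init__.py` should keep existing import paths working, because importing a name that is a directory package executes that package's `__init__.py`. A minimal sketch that builds a throwaway package to check this; `demo_transformers`, `legacy_helper`, and the module contents are hypothetical placeholders, not LangChain code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the new layout:
#   demo_transformers/__init__.py          <- former single-module contents
#   demo_transformers/openai_functions.py  <- new transformer lives here
root = Path(tempfile.mkdtemp())
pkg = root / "demo_transformers"
pkg.mkdir()
(pkg / "openai_functions.py").write_text("class MetadataTagger:\n    pass\n")
(pkg / "__init__.py").write_text(
    "from demo_transformers.openai_functions import MetadataTagger\n"
    "\n"
    "def legacy_helper():\n"
    "    return 'still importable'\n"
    "\n"
    "__all__ = ['MetadataTagger', 'legacy_helper']\n"
)

# Make the temporary directory importable and clear stale finder caches.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Old-style import of the package name, plus a direct import of the submodule.
pkg_mod = importlib.import_module("demo_transformers")
sub_mod = importlib.import_module("demo_transformers.openai_functions")
```

Re-exporting the new class from `__init__.py` means callers can use either the package-level name or the submodule path and get the same object.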
Like @baskaryan said, https://github.com/hwchase17/langchain/pull/7379 is going in soon.