pdf_paragraphs_extraction
pdf_features and a few other libraries are not imported
Even though pdf_features is present among the installed libraries in the venv (under site-packages), running 'pip list' does not show it.
As a result, when running the following command, the script errors out:
(venv) asleroid@Aslis-MBP pdf_paragraphs_extraction % python src/create_paragraph_extractor_model.py
/Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction
loading one_column_test from /Users/asleroid/Code/pdf-labeled-data/labeled_data/paragraph_extraction/one_column_test
Traceback (most recent call last):
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 25, in <module>
    train_model()
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/create_paragraph_extractor_model.py", line 12, in train_model
    pdf_paragraph_tokens_list = load_labeled_data(PDF_LABELED_DATA_ROOT_PATH)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/paragraph_extraction_trainer/load_labeled_data.py", line 34, in load_labeled_data
    pdf_paragraph_tokens = PdfParagraphTokens.from_labeled_data(pdf_labeled_data_root_path, dataset, pdf_name)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/src/paragraph_extraction_trainer/PdfParagraphTokens.py", line 29, in from_labeled_data
    pdf_features = PdfFeatures.from_labeled_data(pdf_labeled_data_root_path, dataset, pdf_name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asleroid/Code/pdf_paragraphs_extraction/venv/lib/python3.11/site-packages/pdf_features/PdfFeatures.py", line 126, in from_labeled_data
    pdf_features.set_token_types(token_type_labels)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'set_token_types'
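
To help narrow down the mismatch between what the venv can import and what 'pip list' reports, a small diagnostic along these lines might be useful. It is only a sketch: the helper name describe_package is made up for illustration, and it assumes the distribution is registered under the same name as the import (pdf_features), which may not be the case. It uses only the standard library (importlib.util.find_spec and importlib.metadata.version).

```python
import importlib.metadata
import importlib.util
import sys


def describe_package(import_name, dist_name=None):
    """Report which interpreter is running, where a module would be
    imported from, and whether installed-distribution metadata exists.

    describe_package is a hypothetical helper, not part of pdf_features.
    """
    # Assumption: the distribution name matches the import name unless given.
    dist_name = dist_name or import_name

    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(import_name)

    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # No metadata found: this is the situation in which pip would
        # not list the package for this interpreter.
        version = None

    return {
        "interpreter": sys.executable,
        "import_location": spec.origin if spec else None,
        "distribution_version": version,
    }


if __name__ == "__main__":
    print(describe_package("pdf_features"))
```

If import_location points inside venv/lib/python3.11/site-packages while distribution_version is None, the package files are importable but have no metadata visible to that interpreter. Running 'python -m pip list' instead of 'pip list' is another way to make sure pip is the one belonging to the active interpreter. Note that this only addresses the 'pip list' discrepancy; the AttributeError in the traceback is raised inside pdf_features itself, so the import there evidently succeeded.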