langflow icon indicating copy to clipboard operation
langflow copied to clipboard

fix: modified DirectoryDataComponent to support user defined custom file types

Open EDLLT opened this issue 1 year ago • 1 comments

Fixes https://github.com/langflow-ai/langflow/issues/4012

EDLLT avatar Oct 03 '24 20:10 EDLLT

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-4017.dmtpw4p5recq1.amplifyapp.com

Not sure about the CI error but I think it's unrelated

=========================== short test summary info ============================
SKIPPED [1] src/backend/tests/unit/graph/graph/test_base.py:152: Temporarily disabled
FAILED src/backend/tests/unit/test_data_components.py::test_directory_without_mocks - ValueError: Error loading file /home/runner/work/langflow/langflow/docs/docs/Components/938852908.png: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED src/backend/tests/unit/test_database.py::test_update_flow - assert 400 == 200
 +  where 400 = <Response [400 Bad Request]>.status_code
= 2 failed, 102 passed, 1 skipped, 439 deselected, 11 warnings in 80.86s (0:01:20) =

* A new version of composio is available, run `pip install 
composio-core==0.5.28` to update.
make: *** [Makefile:141: unit_tests] Error 1
Error: Final attempt failed. Child_process exited with error code 2

EDLLT avatar Oct 04 '24 14:10 EDLLT

This CI Error looks unrelated to changes made in this PR as well

=========================== short test summary info ============================
FAILED src/backend/tests/unit/test_data_components.py::test_directory_component_build_with_multithreading - AssertionError: expected call not found.
Expected: retrieve_file_paths('/home/runner/work/langflow/langflow/src/backend/tests/unit', False, True, 1)
  Actual: retrieve_file_paths('/home/runner/work/langflow/langflow/src/backend/tests/unit', False, True, 1, None)

Looking further into it, it's being caused by this

file_path = '/home/runner/work/langflow/langflow/docs/docs/Components/938852908.png'

    def read_text_file(file_path: str) -> str:
        with open(file_path, "rb") as f:
            raw_data = f.read()
            result = chardet.detect(raw_data)
            encoding = result["encoding"]
    
            if encoding in ["Windows-1252", "Windows-1254", "MacRoman"]:
                encoding = "utf-8"
    
        with open(file_path, encoding=encoding) as f:
>           return f.read()
E           UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Managed to reproduce the issue locally

uv run pytest src/backend/tests \
	--ignore=src/backend/tests/integration \
	--instafail -ra -m "not api_key_required" \
	--durations-path src/backend/tests/.test_durations \
	--splitting-algorithm least_duration \
	--splits 5 --group 5

I think I got to the root of the issue. It is indeed being caused by Directory.py 🤦

src/backend/base/langflow/components/data/Directory.py:92: in load_directory
    loaded_data = [parse_text_file_to_data(file_path, silent_errors) for file_path in file_paths]

EDLLT avatar Oct 04 '24 20:10 EDLLT

Oops, attempting to rename the branch to "directory-fix" instead resulted in it deleting the main branch....................

EDLLT avatar Oct 05 '24 12:10 EDLLT

It appears one of the uv failures occurs upstream as well

src/backend/tests/unit/test_database.py::test_update_flow FAILED                                                                [ 62%]
__________________________________________________________ test_update_flow ___________________________________________________________

client = <httpx.AsyncClient object at 0x70edc2954140>
json_flow = '{\n  "description": "",\n  "name": "BasicExample",\n  "id": "a53f9130-f2fa-4a3e-b22a-3856d946351a",\n  "data": {\n   ...    "viewport": {\n      "x": 1,\n      "y": 0,\n      "zoom": 0.5\n    }\n  },\n  "last_tested_version": "0.6.2"\n}\n'
active_user = User(id=UUID('e6761bca-2ba5-460b-8344-aa15645aac1c'), password='$2b$12$K5sBRC2m5aa0BHaWVIn5yunp.Qljj5o8Z31jUYDLEEcn0GN...ofile_image=None, is_superuser=False, updated_at=datetime.datetime(2024, 10, 5, 12, 51, 7, 502973), store_api_key=None)
logged_in_headers = {'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJlNjc2MWJjYS0yYmE1LTQ2MGItODM0NC1hYTE1NjQ1YWFjMWMiLCJ0eXBlIjoiYWNjZXNzIiwiZXhwIjoxNzI4MTM2MjcxfQ.N1wMUo7aQD4dy86BHY2ElJvMkgQjIPfDUQdnRt96JBo'}

    async def test_update_flow(client: TestClient, json_flow: str, active_user, logged_in_headers):
        flow = orjson.loads(json_flow)
        data = flow["data"]
    
        flow = FlowCreate(name="Test Flow", description="description", data=data)
        response = await client.post("api/v1/flows/", json=flow.model_dump(), headers=logged_in_headers)
    
        flow_id = response.json()["id"]
        updated_flow = FlowUpdate(
            name="Updated Flow",
            description="updated description",
            data=data,
        )
        response = await client.patch(f"api/v1/flows/{flow_id}", json=updated_flow.model_dump(), headers=logged_in_headers)
    
>       assert response.status_code == 200
E       assert 400 == 200
E        +  where 400 = <Response [400 Bad Request]>.status_code

src/backend/tests/unit/test_database.py:109: AssertionError

I'll ignore it for now and focus on fixing Directory.py uv's test failure instead

EDLLT avatar Oct 05 '24 13:10 EDLLT

I think I got to the root of the issue. It is indeed being caused by Directory.py 🤦

src/backend/base/langflow/components/data/Directory.py:92: in load_directory
    loaded_data = [parse_text_file_to_data(file_path, silent_errors) for file_path in file_paths]

Okay, I fixed it. I simply had to check that self.types is valid(isn't False, an empty list, ...) as well as check it it isn't a list with an empty string rather than only checking the latter

The only FAILED instance by UV should now be the one that also occurs upstream

Edit: Looks like I missed something.... Will run this without the split flag (noticed that I was only testing group 5 locally before pushing)

uv run pytest src/backend/tests \
	--ignore=src/backend/tests/integration \
	--instafail -ra -m "not api_key_required" \
	--durations-path src/backend/tests/.test_durations \
	--splitting-algorithm least_duration

Seems like I need to update the test to expect the new TEXT_FILE_TYPES argument that gets passed in by default

EDLLT avatar Oct 05 '24 13:10 EDLLT

Ok, seems like all the checks are passing. I'll squash the commits

EDLLT avatar Oct 05 '24 14:10 EDLLT