fix: modified DirectoryDataComponent to support user defined custom file types
Fixes https://github.com/langflow-ai/langflow/issues/4012
This pull request is automatically being deployed by Amplify Hosting (learn more).
Access this pull request here: https://pr-4017.dmtpw4p5recq1.amplifyapp.com
Not sure about the CI error but I think it's unrelated
=========================== short test summary info ============================
SKIPPED [1] src/backend/tests/unit/graph/graph/test_base.py:152: Temporarily disabled
FAILED src/backend/tests/unit/test_data_components.py::test_directory_without_mocks - ValueError: Error loading file /home/runner/work/langflow/langflow/docs/docs/Components/938852908.png: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
FAILED src/backend/tests/unit/test_database.py::test_update_flow - assert 400 == 200
+ where 400 = <Response [400 Bad Request]>.status_code
= 2 failed, 102 passed, 1 skipped, 439 deselected, 11 warnings in 80.86s (0:01:20) =
* A new version of composio is available, run `pip install
composio-core==0.5.28` to update.
make: *** [Makefile:141: unit_tests] Error 1
Error: Final attempt failed. Child_process exited with error code 2
This CI Error looks unrelated to changes made in this PR as well
=========================== short test summary info ============================
FAILED src/backend/tests/unit/test_data_components.py::test_directory_component_build_with_multithreading - AssertionError: expected call not found.
Expected: retrieve_file_paths('/home/runner/work/langflow/langflow/src/backend/tests/unit', False, True, 1)
Actual: retrieve_file_paths('/home/runner/work/langflow/langflow/src/backend/tests/unit', False, True, 1, None)
Looking further into it, it's being caused by this
file_path = '/home/runner/work/langflow/langflow/docs/docs/Components/938852908.png'
def read_text_file(file_path: str) -> str:
with open(file_path, "rb") as f:
raw_data = f.read()
result = chardet.detect(raw_data)
encoding = result["encoding"]
if encoding in ["Windows-1252", "Windows-1254", "MacRoman"]:
encoding = "utf-8"
with open(file_path, encoding=encoding) as f:
> return f.read()
E UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
Managed to reproduce the issue locally
uv run pytest src/backend/tests \
--ignore=src/backend/tests/integration \
--instafail -ra -m "not api_key_required" \
--durations-path src/backend/tests/.test_durations \
--splitting-algorithm least_duration \
--splits 5 --group 5
I think I got to the root of the issue. It is indeed being caused by Directory.py 🤦
src/backend/base/langflow/components/data/Directory.py:92: in load_directory
loaded_data = [parse_text_file_to_data(file_path, silent_errors) for file_path in file_paths]
Oops, attempting to rename the branch to "directory-fix" instead resulted in it deleting the main branch....................
It appears one of the uv failures occurs upstream as well
src/backend/tests/unit/test_database.py::test_update_flow FAILED [ 62%]
__________________________________________________________ test_update_flow ___________________________________________________________
client = <httpx.AsyncClient object at 0x70edc2954140>
json_flow = '{\n "description": "",\n "name": "BasicExample",\n "id": "a53f9130-f2fa-4a3e-b22a-3856d946351a",\n "data": {\n ... "viewport": {\n "x": 1,\n "y": 0,\n "zoom": 0.5\n }\n },\n "last_tested_version": "0.6.2"\n}\n'
active_user = User(id=UUID('e6761bca-2ba5-460b-8344-aa15645aac1c'), password='$2b$12$K5sBRC2m5aa0BHaWVIn5yunp.Qljj5o8Z31jUYDLEEcn0GN...ofile_image=None, is_superuser=False, updated_at=datetime.datetime(2024, 10, 5, 12, 51, 7, 502973), store_api_key=None)
logged_in_headers = {'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJlNjc2MWJjYS0yYmE1LTQ2MGItODM0NC1hYTE1NjQ1YWFjMWMiLCJ0eXBlIjoiYWNjZXNzIiwiZXhwIjoxNzI4MTM2MjcxfQ.N1wMUo7aQD4dy86BHY2ElJvMkgQjIPfDUQdnRt96JBo'}
async def test_update_flow(client: TestClient, json_flow: str, active_user, logged_in_headers):
flow = orjson.loads(json_flow)
data = flow["data"]
flow = FlowCreate(name="Test Flow", description="description", data=data)
response = await client.post("api/v1/flows/", json=flow.model_dump(), headers=logged_in_headers)
flow_id = response.json()["id"]
updated_flow = FlowUpdate(
name="Updated Flow",
description="updated description",
data=data,
)
response = await client.patch(f"api/v1/flows/{flow_id}", json=updated_flow.model_dump(), headers=logged_in_headers)
> assert response.status_code == 200
E assert 400 == 200
E + where 400 = <Response [400 Bad Request]>.status_code
src/backend/tests/unit/test_database.py:109: AssertionError
I'll ignore it for now and focus on fixing Directory.py uv's test failure instead
I think I got to the root of the issue. It is indeed being caused by Directory.py 🤦
src/backend/base/langflow/components/data/Directory.py:92: in load_directory loaded_data = [parse_text_file_to_data(file_path, silent_errors) for file_path in file_paths]
Okay, I fixed it. I simply had to check that self.types is valid(isn't False, an empty list, ...) as well as check it it isn't a list with an empty string rather than only checking the latter
The only FAILED instance by UV should now be the one that also occurs upstream
Edit: Looks like I missed something.... Will run this without the split flag (noticed that I was only testing group 5 locally before pushing)
uv run pytest src/backend/tests \
--ignore=src/backend/tests/integration \
--instafail -ra -m "not api_key_required" \
--durations-path src/backend/tests/.test_durations \
--splitting-algorithm least_duration
Seems like I need to update the test to expect the new TEXT_FILE_TYPES argument that gets passed in by default
Ok, seems like all the checks are passing. I'll squash the commits