
[Issue]: ValueError when specifying custom timestamp_column in CSV to YAML conversion

Open silenceliang opened this issue 1 year ago • 0 comments

Describe the issue

When attempting to convert CSV data into a YAML format, specifying a custom column for the timestamp results in a ValueError. The exception is raised within the pandas library, specifically at the following location:

.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py, on line 507, with the error message “No objects to concatenate”. 

This issue occurs during the data input process where the CSV data is expected to be formatted according to the settings in a YAML file.
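
For context, pandas raises this exact message whenever pd.concat receives an empty sequence, so the error suggests that no DataFrame was produced from the input file rather than a fault inside pandas itself. A minimal illustration, independent of GraphRAG (the helper name concat_error_message is made up for this sketch):

```python
import pandas as pd


def concat_error_message(frames):
    """Return the ValueError text from pd.concat, or None if it succeeds."""
    try:
        pd.concat(frames)
    except ValueError as exc:
        return str(exc)
    return None


# An empty list reproduces the message seen in the traceback.
print(concat_error_message([]))
# A list holding at least one DataFrame concatenates without error.
print(concat_error_message([pd.DataFrame({"a": [1]})]))
```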

Steps to reproduce

  1. Set the following in settings.yaml:

       input:
         type: file # or blob
         file_type: csv # or csv
         base_dir: "input"
         file_encoding: utf-8
         file_pattern: ".*\.csv"
         timestamp_column: "event_time"

  2. Run python -m graphrag.index --root ./myFolder
  3. The ValueError shown in the logs below is raised.

GraphRAG Config Used

input:
  type: file # or blob
  file_type: csv # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\.csv"
  text_column: "description"
  timestamp_column: "event_time"
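
Before indexing, it can help to confirm that the CSV header actually contains the columns named in this config. The sketch below is not part of GraphRAG: missing_columns is a hypothetical helper and the sample row is invented for illustration. It uses only the standard library and compares the header row against text_column and timestamp_column:

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> no header -> everything is missing
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Invented sample standing in for a file under input/.
sample = io.StringIO("description,event_time\nSomething happened,2024-07-16 03:07:00\n")
print(missing_columns(sample, ["description", "event_time"]))  # → []
```

For a real file, the same function could be called with open(path, newline="", encoding="utf-8"), matching the file_encoding in the config. A non-empty result would point at a column-name mismatch rather than at pandas.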

Logs and screenshots

python -m graphrag.index --root ./ragtest_event_csv
🚀 Reading settings from ragtest_event_csv/settings.yaml
Traceback (most recent call last):
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/brian_liang/graphrag/graphrag/index/__main__.py", line 76, in <module>
    index_cli(
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 161, in index_cli
    _run_workflow_async()
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 159, in _run_workflow_async
    asyncio.run(execute())
  File "/Users/brian_liang/.pyenv/versions/3.10.4/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/Users/brian_liang/graphrag/graphrag/index/cli.py", line 123, in execute
    async for output in run_pipeline_with_config(
  File "/Users/brian_liang/graphrag/graphrag/index/run.py", line 144, in run_pipeline_with_config
    dataset = dataset if dataset is not None else await _create_input(config.input)
  File "/Users/brian_liang/graphrag/graphrag/index/run.py", line 133, in _create_input
    return await load_input(config, progress_reporter, root_dir)
  File "/Users/brian_liang/graphrag/graphrag/index/input/load_input.py", line 81, in load_input
    results = await loader(config, progress, storage)
  File "/Users/brian_liang/graphrag/graphrag/index/input/csv.py", line 135, in load
    result = pd.concat(files_loaded)
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 382, in concat
    op = _Concatenator(
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "/Users/brian_liang/.pyenv/versions/graphrag/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 507, in _clean_keys_and_objs
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
⠋ GraphRAG Indexer
└──Loading Input (csv) - 1 files loaded (0 filtered) ━━━━ 100% 0:0… 0:0…
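
The progress line reports one file loaded, yet files_loaded was empty when pd.concat ran. One possible explanation, which the log alone does not confirm, is that a file is skipped when parsing it raises, leaving nothing to concatenate. The sketch below is a hypothetical loader, not GraphRAG's code: load_all and its parameters are invented here to show how per-file errors could be collected and reported instead of surfacing as the generic pandas message.

```python
import io

import pandas as pd


def load_all(sources, timestamp_column=None):
    """Parse each CSV source; collect per-file errors instead of hiding them."""
    frames, errors = [], {}
    for name, handle in sources.items():
        try:
            frame = pd.read_csv(handle)
            if timestamp_column is not None:
                # KeyError if the column is missing, ValueError if unparsable.
                frame["timestamp"] = pd.to_datetime(frame[timestamp_column])
            frames.append(frame)
        except (KeyError, ValueError) as exc:
            errors[name] = repr(exc)
    if not frames:
        # Report the underlying causes rather than "No objects to concatenate".
        raise ValueError(f"no CSV files could be loaded: {errors}")
    return pd.concat(frames, ignore_index=True)


# Invented sample: the configured column exists and parses as a datetime.
good = {"events.csv": io.StringIO("description,event_time\nx,2024-07-16 03:07:00\n")}
print(load_all(good, timestamp_column="event_time").columns.tolist())
```

With a source that lacks the event_time column, this version raises a ValueError that names the file and the missing key, which makes the configuration problem visible at the point of failure.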


Additional Information

  • GraphRAG Version: 0.1.1
  • Operating System: macOS 14.5
  • Python Version: 3.10.4
  • Related Issues:

silenceliang, Jul 16 '24 03:07