gqlalchemy icon indicating copy to clipboard operation
gqlalchemy copied to clipboard

[BUG] CSVLocalFileSystemImporter can't load datetime from csv

Open SuperBo opened this issue 2 years ago • 17 comments

Memgraph version Which version did you use? Memgraph v2.13.0

Environment Some information about the environment you are using Memgraph on: operating system, how do you connect, with or without docker, which driver etc.

Docker

Describe the bug

Cant load datetime string from csv to memgraph.

To Reproduce Steps to reproduce the behavior:

Use following scripts

importer = loaders.CSVLocalFileSystemImporter(
    data_configuration=yaml.safe_load(DATA_CONFIG),
    path = "tmp/airflow/data",
    memgraph=memgraph
)
importer.translate(drop_database_on_start=True)

Expected behavior

No error

Logs If applicable, add logs of Memgraph, CLI output or screenshots to help explain your problem.

Loading data from table tmp/airflow/data/a.csv...
Traceback (most recent call last):
  File "/Users/superbo/Workspace/airflow-pipelines/test_memgraph_load.py", line 36, in <module>
    load_data()
  File "/Users/superbo/Workspace/airflow-pipelines/test_memgraph_load.py", line 32, in load_data
    importer.translate(drop_database_on_start=True)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 472, in translate
    self._load_nodes()
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 480, in _load_nodes
    self._save_row_as_node(label=collection_name, row=row)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/transformations/importing/loaders.py", line 591, in _save_row_as_node
    QueryBuilder(connection=self._memgraph)
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/query_builders/declarative_base.py", line 748, in node
    properties_str = to_cypher_properties(kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/utilities.py", line 157, in to_cypher_properties
    value_str = to_cypher_value(value, config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/superbo/Workspace/airflow-pipelines/.venv/lib/python3.11/site-packages/gqlalchemy/utilities.py", line 109, in to_cypher_value
    return f"{datetimeKwMapping[value_type]}('{_format_timedelta(value) if isinstance(value, timedelta) else value.isoformat()}')"
              ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: <class 'pandas._libs.tslibs.timestamps.Timestamp'>

Additional context Add any other context about the problem here.

SuperBo avatar Jan 21 '24 07:01 SuperBo

Hi @SuperBo, thank you for reporting this. Would you be able to share your DATA_CONFIG file in order for us to replicate your issue? You don't have to share real data if it's sensitive, I'm just interested in the format of the file

matea16 avatar Jan 22 '24 08:01 matea16

@matea16, here is example data, i'm using pyarrow==13.0.0

account_id,phone_no,created_dt,updated_dt
abc1,+1212423,2019-08-28 12:19:18,
bcd2,+12123232,2021-06-09 09:05:41,2021-06-09 09:05:41
b11e,+94745734,2021-12-13 16:42:45,2021-12-13 16:42:45

SuperBo avatar Jan 22 '24 09:01 SuperBo

Thank you for providing the data, we'll take a look at your issue shortly. I'll let you know if we have any additional questions to help us with the debugging :)

matea16 avatar Jan 22 '24 13:01 matea16

Hi @SuperBo, I see that you shared your CSV data, but did you create a data configuration object, similar to the the documentation? If yes, can you share that too?

katarinasupe avatar Jan 23 '24 08:01 katarinasupe

@SuperBo, I believe the issue with your CSV file could be due to the format of your date field. In our docs, you can see supported formats of temporal types, including the local time format. Let me know if adjusting that format helps

matea16 avatar Jan 23 '24 09:01 matea16

Hi @matea16, I also tested with iso datetime format "yyyy-MM-ddThh:mm:ss", and still met the same error.

@katarinasupe : of course, I created a data configuration like this

indices:
    accounts:
        - account_id
    transactions:
        - transaction_number

name_mappings:
    accounts:
        label: Account
    transactions:
        label: INTERACT

one_to_many_relations:
    accounts: []
    transactions: []

SuperBo avatar Jan 23 '24 09:01 SuperBo

I tested it out with your configuration file and it works as expected for me, even with the previous format of your datetime objects. I will provide you below with the code snippet that worked, the only difference is in parsing the yaml file, let me know if that solves the issue:

with open("./data_config.yaml", "r") as stream:
   try:
       parsed_yaml = yaml.load(stream, Loader=SafeLoader)
   except yaml.YAMLError as exc:
       print(exc)

 importer = CSVLocalFileSystemImporter(
     data_configuration=parsed_yaml,
     path="./",
     memgraph=memgraph
 )

 importer.translate(drop_database_on_start=True)

matea16 avatar Jan 23 '24 13:01 matea16

hello @matea16 , new update, the problem comes from the remaining file. In that file, I put timestamp format as follwing:

2024-01-04T03:48:11.336Z

SuperBo avatar Jan 23 '24 14:01 SuperBo

that format is unfortunately not supported since timezones are not supported, does it work without the timezone at the end?

matea16 avatar Jan 23 '24 14:01 matea16

I removed the last "Z" and the same error still occurs. However, when I remove the microseconds part, it works.

SuperBo avatar Jan 23 '24 14:01 SuperBo

I'm glad you were able to find a workaround. The issue is indeed the format of the data field. Memgraph doesn't support timezones and milliseconds and that was causing a problem. In our docs you can see all of the supported temporal types. Thank you again for reporting this, we have two issues that I linked above related to the time zones and this helps us to prioritize them.

matea16 avatar Jan 24 '24 09:01 matea16

I will close this issue since it was related to how Memgraph handles time format.

katarinasupe avatar Feb 27 '24 15:02 katarinasupe

Hi @SuperBo, I'm happy to let you know that zoned datetime data type has been added to Memgraph to the latest release with the following PR: https://github.com/memgraph/memgraph/pull/1866

matea16 avatar May 27 '24 08:05 matea16

Thank you Memgraph team, you rocks 👍

SuperBo avatar May 28 '24 07:05 SuperBo

Zoned datetime has been added to Memgraph, but in order for it to work properly in GQLAlchemy, I think datetime keyword should still be added to the GQLAlchemy in order for OGM to work properly.

katarinasupe avatar May 28 '24 11:05 katarinasupe

@matea16 I set a new milestone for GQLAlchemy. Will you be able to implement this until July 3rd?

katarinasupe avatar Jun 05 '24 08:06 katarinasupe

Yes, I'll make sure to have it ready by July 3rd

matea16 avatar Jun 05 '24 09:06 matea16