
Add FiGraph dataset

Open XiaoguangWang23 opened this issue 1 year ago • 2 comments

Overview

This pull request (PR) adds FiGraph, a real-world dataset, to the PyTorch Geometric (PyG) library. FiGraph is a dynamic heterogeneous graph dataset that captures how relationships within financial networks evolve over nine years. It is intended for node classification tasks in which both the temporal dynamics and the heterogeneity of the graph matter.

Dataset Details

Dynamic Heterogeneous Graph

FiGraph is structured as a dynamic heterogeneous graph: it evolves over time and contains multiple types of nodes and edges. Each year from 2014 to 2022 is represented as a distinct graph snapshot.

  • Time Span: 2014 to 2022

  • Graph Snapshots: 9 snapshots, one for each year

  • Node Types: 5 distinct types of nodes, labeled as:

    • L: Listed companies
    • U: Unlisted companies
    • H: Holding companies
    • A: Auditors
    • R: Regulatory bodies
  • Edge Types: 4 types of edges, each representing a different kind of relationship:

    • Related-party transaction
    • Investment
    • Audit
    • Supply chain
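
As a rough illustration of the structure described above, the sketch below models the yearly snapshots in plain Python. The class and method names (`Snapshot`, `add_edge`), the snake_case edge type keys, the sample node IDs, and the edge direction are all hypothetical and not taken from the PR. In PyG itself, a snapshot like this would typically be held in a `torch_geometric.data.HeteroData` object.

```python
from dataclasses import dataclass, field

# Node and edge types as listed in the issue description.
NODE_TYPES = {
    "L": "Listed companies",
    "U": "Unlisted companies",
    "H": "Holding companies",
    "A": "Auditors",
    "R": "Regulatory bodies",
}
# Hypothetical snake_case keys for the four relationship types.
EDGE_TYPES = ("related_party_transaction", "investment", "audit", "supply_chain")
YEARS = range(2014, 2023)  # 2014..2022 inclusive, i.e. 9 snapshots


@dataclass
class Snapshot:
    """One yearly graph: typed node sets plus typed edge lists (assumed layout)."""

    year: int
    nodes: dict = field(default_factory=lambda: {t: set() for t in NODE_TYPES})
    edges: dict = field(default_factory=lambda: {t: [] for t in EDGE_TYPES})

    def add_edge(self, edge_type, src, dst):
        # src and dst are (node_type, node_id) pairs.
        if edge_type not in self.edges:
            raise ValueError(f"unknown edge type: {edge_type}")
        for node_type, node_id in (src, dst):
            if node_type not in self.nodes:
                raise ValueError(f"unknown node type: {node_type}")
            self.nodes[node_type].add(node_id)
        self.edges[edge_type].append((src, dst))


# The dynamic graph is the ordered collection of yearly snapshots.
snapshots = {year: Snapshot(year) for year in YEARS}

# Illustrative edge only: an auditor linked to a listed company in 2014.
snapshots[2014].add_edge("audit", ("A", "auditor_1"), ("L", "company_1"))
```

Keeping each year as its own object mirrors the one-snapshot-per-year description: an edge added to the 2014 snapshot does not appear in any other year.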

Yearly Snapshots

Each year's data is stored as a separate snapshot that captures the state of the financial network in that year. Node features, node labels, and the graph structure can all change from year to year, which makes the dataset suitable for studying temporal dynamics in graph-based learning tasks.

  • Node Features: Only nodes of type L (Listed companies) have features, which include financial attributes such as profit and liabilities. These features can vary annually.
  • Node Labels: Similarly, only nodes of type L have labels, which indicate whether a company's financial report for that year is fraudulent (Label = 1) or normal (Label = 0).
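
To make the feature and label convention concrete, here is a minimal sketch that gathers labeled examples for type L nodes by year and splits them chronologically. The record layout, the feature values, the function name `chronological_split`, and the 2020/2021 cut-off years are all invented for illustration. The PR does not specify a file schema or a train/validation/test split.

```python
# Hypothetical per-year records for type L nodes only: node_id -> (features, label).
# Nodes of type U, H, A, and R carry neither features nor labels.
# Label convention from the description: 1 = fraudulent report, 0 = normal.
records = {
    2019: {"company_1": ([1.2, 0.4], 0), "company_2": ([0.3, 0.9], 1)},
    2021: {"company_1": ([1.0, 0.5], 0)},
    2022: {"company_2": ([0.2, 1.1], 1)},
}


def chronological_split(records, last_train_year, last_val_year):
    """Assign each (year, node) example to train, val, or test by its year."""
    splits = {"train": [], "val": [], "test": []}
    for year in sorted(records):
        if year <= last_train_year:
            name = "train"
        elif year <= last_val_year:
            name = "val"
        else:
            name = "test"
        # The same company can appear in several years with different
        # features and labels, so each (year, node) pair is its own example.
        for node_id, (features, label) in sorted(records[year].items()):
            splits[name].append((year, node_id, features, label))
    return splits


# Assumed cut-offs: train up to 2020, validate on 2021, test on 2022.
splits = chronological_split(records, last_train_year=2020, last_val_year=2021)
```

Splitting by year rather than at random keeps later reports out of the training set, which matters when features and labels change annually.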

Code Structure

  • Dataset Code: Implemented in torch_geometric/datasets/figraph.py.
  • Data Files: The corresponding yearly CSV files are located in torch_geometric/datasets/figraph/data/.

Example Usage

Researchers can load the FiGraph dataset as follows:

from torch_geometric.datasets import FiGraphDataset

dataset = FiGraphDataset(root='path_to_dataset')

XiaoguangWang23 avatar Aug 29 '24 09:08 XiaoguangWang23

It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?

rusty1s avatar Sep 03 '24 08:09 rusty1s

Sorry, there may be some errors here. We will revise the PR to follow the conventions of the other built-in datasets.

On 2024-09-03 16:26:59, "Matthias Fey" @.***> wrote:

It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?

XiaoguangWang23 avatar Sep 04 '24 02:09 XiaoguangWang23

It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?

Hello @rusty1s,

Thank you very much for taking the time to review my Pull Request. Due to privacy concerns and the need to further improve the dataset, I would like to withdraw this Pull Request. I kindly request your assistance in deleting it along with all associated commits from the repository history.

I sincerely apologize for any inconvenience this may cause and greatly appreciate your understanding and support.

Best regards,

XiaoguangWang23

XiaoguangWang23 avatar Nov 13 '24 07:11 XiaoguangWang23