Add FiGraph dataset
Overview
This Pull Request (PR) adds FiGraph, a real-world dataset, to the PyTorch Geometric (PyG) library. FiGraph is a dynamic heterogeneous graph dataset that captures the evolving relationships within financial networks over nine years (2014 to 2022). It is particularly useful for node classification tasks in which both the temporal dynamics and the heterogeneous nature of the graph matter.
Dataset Details
Dynamic Heterogeneous Graph
FiGraph is structured as a dynamic heterogeneous graph, meaning it not only evolves over time but also contains multiple types of nodes and edges. Each year from 2014 to 2022 is represented as a distinct graph snapshot within the dataset.
- Time Span: 2014 to 2022
- Graph Snapshots: 9 snapshots, one for each year
- Node Types: 5 distinct types of nodes, labeled as:
  - L: Listed companies
  - U: Unlisted companies
  - H: Holding companies
  - A: Auditors
  - R: Regulatory bodies
- Edge Types: 4 types of edges, representing different types of relationships:
  - Related-party transaction
  - Investment
  - Audit
  - Supply chain
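The structure listed above can be sketched in plain Python. This is only an illustration of the schema, not the code in this PR: the `Snapshot` class, its fields, and `add_edge` are hypothetical names, and the sketch makes no assumption about which node types each edge type connects, since the description does not say.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Facts taken from the dataset description.
YEARS = list(range(2014, 2023))  # 2014..2022 inclusive, one snapshot per year
NODE_TYPES = {
    "L": "Listed companies",
    "U": "Unlisted companies",
    "H": "Holding companies",
    "A": "Auditors",
    "R": "Regulatory bodies",
}
EDGE_TYPES = ("Related-party transaction", "Investment", "Audit", "Supply chain")


@dataclass
class Snapshot:
    """Hypothetical container for one yearly graph (not the PR's actual class)."""
    year: int
    # node type -> set of node ids present in this year
    nodes: Dict[str, Set[str]] = field(
        default_factory=lambda: {t: set() for t in NODE_TYPES})
    # edge type -> list of (source id, target id) pairs
    edges: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: {t: [] for t in EDGE_TYPES})

    def add_edge(self, edge_type: str, src: str, dst: str) -> None:
        """Append an edge, rejecting edge types outside the four listed."""
        if edge_type not in self.edges:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges[edge_type].append((src, dst))


# One independent snapshot per year, so structure can differ between years.
snapshots = {year: Snapshot(year) for year in YEARS}
```

Keeping each year in its own container mirrors the "one snapshot per year" layout: adding an edge to one year leaves every other year untouched.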
Yearly Snapshots
Each year's data is stored as a separate snapshot, capturing the state of the financial network at that time. The nodes' features and labels, as well as the graph structure, are allowed to change from year to year, making this dataset particularly suitable for studying temporal dynamics in graph-based learning tasks.
- Node Features: Only nodes of type L (Listed companies) have features, which include financial attributes such as profit and liabilities. These features can vary annually.
- Node Labels: Similarly, only L-type nodes have labels, which indicate whether a company's financial report for that year is fraudulent (Label = 1) or normal (Label = 0).
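Because features and labels exist only for L-type nodes and change from year to year, a labeled example is naturally identified by a (year, node) pair. The sketch below shows one way to flatten such per-year data into rows for node classification. The node id `L_001`, the numeric values, and the `labeled_examples` helper are invented for illustration and are not taken from FiGraph or from this PR.

```python
# Toy values, invented purely for illustration (not real FiGraph data).
# Keyed by year, then by node id; only type-L nodes appear.
features = {
    2014: {"L_001": {"profit": 1.2, "liabilities": 0.4}},
    2015: {"L_001": {"profit": -0.3, "liabilities": 0.9}},
}
labels = {
    2014: {"L_001": 0},  # 0 = normal report
    2015: {"L_001": 1},  # 1 = fraudulent report
}


def labeled_examples(features, labels):
    """Return (year, node_id, feature_vector, label) rows, ordered by year."""
    rows = []
    for year in sorted(features):
        for node_id, attrs in sorted(features[year].items()):
            # Skip nodes that have features but no label for this year.
            if node_id not in labels.get(year, {}):
                continue
            vector = [attrs["profit"], attrs["liabilities"]]
            rows.append((year, node_id, vector, labels[year][node_id]))
    return rows


rows = labeled_examples(features, labels)
```

The same company contributes one row per year, so its label can differ between years, as in the toy data here.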
Code Structure
- Dataset Code: Implemented in torch_geometric/datasets/figraph.py.
- Data Files: The corresponding yearly CSV files are located in torch_geometric/datasets/figraph/data/.
Example Usage
Researchers can load the FiGraph dataset as follows:
from torch_geometric.datasets import FiGraphDataset
dataset = FiGraphDataset(root='path_to_dataset')
It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?
Sorry, there may be some errors here. We will revise the PR to follow the conventions of the other built-in datasets.
On 2024-09-03 16:26:59, "Matthias Fey" @.***> wrote:
It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?
Hello @rusty1s,
Thank you very much for taking the time to review my Pull Request. Due to privacy concerns and the need to further improve the dataset, I would like to withdraw this Pull Request. I kindly request your assistance in deleting it along with all associated commits from the repository history.
I sincerely apologize for any inconvenience this may cause and greatly appreciate your understanding and support.
Best regards,
XiaoguangWang23