Sponsored by the Canadian NSERC Discovery Grant RGPIN-2020-05665: Data Science on Blockchain and the National Science Foundation of USA under award number ECCS 2039701 Blockchain Graphs as Testbeds of...
Chartalist
Please visit https://www.chartalist.org for more information.
Overview
Chartalist is the first platform to provide machine-learning-ready datasets from both unspent transaction output (UTXO) and account-based blockchains.
The Chartalist package contains:
- Dataloaders which automate and handle the download of datasets from a single package import and a simple two-argument function call.
- Ability to use the downloaded dataset directly after download as a Pandas DataFrame from the same two-argument function call.
- Graph makers for convenient generation of a NetworkX digraph from the network datasets.
Installation
- Download this repository and extract the contents to a desired location.
- The chartalist_loader-main folder will serve as the working directory.
Requirements
Chartalist depends on the following:
- networkx>=2.8.3
- numpy>=1.22.3
- outdated>=0.2.1
- pandas>=1.4.2
- patool>=1.12
- requests>=2.27.1
- setuptools>=60.2.0
- torch>=1.11.0
- torch_scatter>=2.0.9
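The minimum versions listed above can be checked against the current environment before running Chartalist. The following is a minimal sketch using only the Python standard library; the helper names `version_tuple`, `is_older`, and `unmet_requirements` are hypothetical and not part of Chartalist, and the version comparison is a simplification that looks only at leading numeric components.

```python
import re
from importlib import metadata

# Minimum versions copied from the Requirements list above.
REQUIREMENTS = {
    "networkx": "2.8.3",
    "numpy": "1.22.3",
    "outdated": "0.2.1",
    "pandas": "1.4.2",
    "patool": "1.12",
    "requests": "2.27.1",
    "setuptools": "60.2.0",
    "torch": "1.11.0",
    "torch_scatter": "2.0.9",
}


def version_tuple(version):
    """Turn '1.11.0+cu113' into (1, 11, 0). Simplified; not full PEP 440."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def unmet_requirements(requirements):
    """Return {package: reason} for packages that are missing or too old."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if is_older(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, reason in unmet_requirements(REQUIREMENTS).items():
        print(f"{name}: {reason}")
```

An empty result means every listed package was found at or above its minimum version. Lookups go by installed distribution name, so a package published under a slightly different name may be reported as missing even when it is importable.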
Datasets
The following is a summary of the available datasets and their related tasks. Use the corresponding version argument when using Chartalist to retrieve the correct dataset of interest. Click on the dataset for more information.
Bitcoin ML-Ready Datasets
| Dataset | Features | Version Constant |
|---|---|---|
| Ransomware Family: Bitcoinheist | address, year, day, length, weight, count, looped, neighbors, income, label | TYPE_PREDICTION |
| Bitcoin Transaction Network Input | trans | TRANSACTION_NETWORK_INPUT_SAMPLE |
| Bitcoin Transaction Network Output | trans | TRANSACTION_NETWORK_OUTPUT_SAMPLE |
| Bitcoin Block Times | unix_time | BLOCK_TIME |
| Bitcoin Price Data | date, price, year, day, totaltx | PRICE_PREDICTION |
Ethereum ML-Ready Datasets
| Dataset | Features | Version Constant |
|---|---|---|
| Ethereum Token Networks | token_address, from_address, to_address, value, transaction_hash, log_index, block_number | TYPE_PREDICTION_TRANSACTIONS |
| Ethereum Token Network Labels | type, address, name | TYPE_PREDICTION_LABELS |
| EtherDelta Ether-to-Token Transactions | transaction_hash, block_number, timestamp, tokenGet, amountGet, tokenGive, amountGive, get, give | ANOMALY_DETECTION_ETHER_DELTA_TRADES |
| IDEX Ether-to-Token Transactions | transaction_hash, status, block_number, gas, gas_price, timestamp, amountBuy, amountSell, expires, nonce, amount, tradeNonce, feeMake, feeTake, tokenBuy, tokenSell, maker, taker | ANOMALY_DETECTION_IDEX |
| Ether-to-Token Ether-Dollar Price | Date(UTC), UnixTimeStamp, Value | ANOMALY_DETECTION_ETHER_DOLLAR_PRICE |
| Bytom Network | fromAddress, toAddress, time, amount | MULTILAYER_BYTOM |
| Cybermiles Network | fromAddress, toAddress, time, amount | MULTILAYER_CYBERMILES |
| Decentraland Network | fromAddress, toAddress, time, amount | MULTILAYER_DECENTRALAND |
| Tierion Network | fromAddress, toAddress, time, amount | MULTILAYER_TIERION |
| Vechain Network | fromAddress, toAddress, time, amount | MULTILAYER_VECHAIN |
| ZRX Network | fromAddress, toAddress, time, amount | MULTILAYER_ZRX |
| Ethereum VeChain Token Transactions | fromAddress, toAddress, time, amount | PRICE_PREDICTION_VECHAIN |
| Ethereum ZRX Token Transactions | fromAddress, toAddress, time, amount | PRICE_PREDICTION_ZRX |
| Stablecoin ERC20 Transactions | fromAddress, toAddress, time, amount | STABLECOIN_ERC20 |
Dashcoin ML-Ready Datasets
| Dataset | Features | Version Constant |
|---|---|---|
| Dashcoin Transaction Network Input | trans | TRANSACTION_NETWORK_INPUT_SAMPLE |
| Dashcoin Transaction Network Output | trans | TRANSACTION_NETWORK_OUTPUT_SAMPLE |
Using Chartalist
- Navigate to the chartalist_loader-main folder and create a new .py script, or add an existing one, to serve as the working environment.
- Add import chartalist at the top of the script:
import chartalist
- All datasets in Chartalist can be downloaded and referenced as a Pandas DataFrame in a single function call.
For example:
data = chartalist.get_dataset(dataset='dashcoin', version='chartalist.DashcoinLoader.TRANSACTION_NETWORK_OUTPUT_SAMPLE', download=True, data_frame=True)
There are currently three options for the dataset argument:
- ethereum
- bitcoin
- dashcoin
Depending on the choice of the dataset argument, the version argument will take the following format:
For ethereum:
version=chartalist.EthereumLoader.
For bitcoin:
version=chartalist.BitcoinLoaders.
For dashcoin:
version=chartalist.DashcoinLoader.
Append the appropriate constant from the Datasets section to the end of the version prefix above; the function call is then ready to use.
- When the script is executed, the corresponding dataset is downloaded to the data folder in the working directory if it is not already present, and the returned Pandas DataFrame can be used directly for processing.
NOTE: Because certain datasets are very large, the dataloader downloads only sample data for them. If the complete dataset is required, click on the link corresponding to the dataset of interest and manually download the data from our website. Then replace the contents of the sample dataset under the data folder with the contents of the complete dataset and proceed as normal.
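The mapping from the dataset argument to the version prefix described above can be sketched as a small lookup. This is an illustration only: `resolve_version` and `LOADER_CLASSES` are hypothetical helpers, the loader class names are copied exactly as spelled in this README, and a stand-in object replaces the real package so the sketch runs without Chartalist installed.

```python
from types import SimpleNamespace

# Loader class names exactly as spelled in this README.
LOADER_CLASSES = {
    "ethereum": "EthereumLoader",
    "bitcoin": "BitcoinLoaders",
    "dashcoin": "DashcoinLoader",
}


def resolve_version(chartalist_module, dataset, constant):
    """Look up <module>.<LoaderClass>.<CONSTANT> for the given dataset name."""
    if dataset not in LOADER_CLASSES:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(LOADER_CLASSES)}"
        )
    loader = getattr(chartalist_module, LOADER_CLASSES[dataset])
    return getattr(loader, constant)


# Stand-in for the real package. The constant's value is invented here;
# the real value is whatever Chartalist defines.
fake_chartalist = SimpleNamespace(
    DashcoinLoader=SimpleNamespace(
        TRANSACTION_NETWORK_OUTPUT_SAMPLE="placeholder-version"
    ),
)

version = resolve_version(
    fake_chartalist, "dashcoin", "TRANSACTION_NETWORK_OUTPUT_SAMPLE"
)
print(version)
```

With the real package, `chartalist` itself would be passed in place of `fake_chartalist`, and the result handed to `chartalist.get_dataset` as the version argument, assuming the function accepts the constant itself rather than its dotted name as a string.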
Generating Networks
The Bitcoin and Dashcoin Transaction Network Input and Output datasets require the use of a Chartalist graph maker to be converted into a usable NetworkX digraph. See bitcoin_network_example.py or dashcoin_network_example.py for instructions.
For other network datasets that have fromAddress, toAddress, and value columns, such as the Ethereum Token Network dataset, a NetworkX digraph can be generated directly. See ethereum_network_example.py for instructions.
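Direct generation of a digraph from an edge-list DataFrame might look like the sketch below. It does not reproduce ethereum_network_example.py; it uses a toy DataFrame with made-up rows in place of a downloaded dataset, and the column names follow the Ethereum Token Networks row of the Datasets table (from_address, to_address, value).

```python
import networkx as nx
import pandas as pd

# Toy rows standing in for a loaded dataset (made-up addresses and values).
df = pd.DataFrame(
    {
        "from_address": ["0xa", "0xa", "0xb"],
        "to_address": ["0xb", "0xc", "0xc"],
        "value": [10, 5, 7],
    }
)

# Each row becomes one directed edge carrying its transferred value.
graph = nx.from_pandas_edgelist(
    df,
    source="from_address",
    target="to_address",
    edge_attr="value",
    create_using=nx.DiGraph,
)

print(graph.number_of_nodes(), graph.number_of_edges())
print(graph["0xa"]["0xb"]["value"])
```

A plain `DiGraph` keeps one edge per ordered pair of addresses, so repeated transfers between the same two addresses overwrite one another. Passing `nx.MultiDiGraph` as `create_using` keeps every transfer as a separate edge.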
Parsing Datasets
Basic statistical information can be extracted from any dataset using the Pandas DataFrame returned by the dataloader. See stablecoin_erc20_example.py for reference.
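As an illustration of the kind of summary that can be computed, the sketch below builds a toy DataFrame with the fromAddress, toAddress, time, and amount columns listed for the stablecoin dataset and derives a few basic statistics. The rows are made up, and the code is not taken from stablecoin_erc20_example.py.

```python
import pandas as pd

# Toy rows (made-up addresses, timestamps, and amounts).
df = pd.DataFrame(
    {
        "fromAddress": ["0xa", "0xa", "0xb", "0xc"],
        "toAddress": ["0xb", "0xc", "0xc", "0xa"],
        "time": [1600000000, 1600000060, 1600000120, 1600003600],
        "amount": [10.0, 5.0, 7.5, 2.5],
    }
)

summary = {
    "transactions": len(df),
    "unique_senders": df["fromAddress"].nunique(),
    "unique_receivers": df["toAddress"].nunique(),
    "total_amount": df["amount"].sum(),
    "mean_amount": df["amount"].mean(),
}

# Total amount sent per address, largest first.
sent_per_address = (
    df.groupby("fromAddress")["amount"].sum().sort_values(ascending=False)
)

print(summary)
print(sent_per_address)
```

The same expressions apply unchanged to a DataFrame returned by the dataloader, provided it has these column names.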
Address Exclusion
Please use our online tool to request the removal of an address from our datasets due to security or privacy concerns.
BibTeX Citation
If you use Chartalist in a scientific publication, please cite us with the following BibTeX entry:
@inproceedings{chartalistNeurips2022,
author = {Kiarash Shamsi and Yulia R. Gel and Murat Kantarcioglu and Cuneyt G. Akcora},
title = {Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains},
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference
on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December
1, 2022, New Orleans, LA, USA},
pages = {1--14},
year = {2022},
url = {https://openreview.net/pdf?id=10iA3OowAV3}
}