paperswithcode-data
Generated code interfaces from paperswithcode data
Sharing this in case it's helpful to anyone. 🤗
Using the JSON Type Definition (JTD) tools (https://jsontypedef.com/), I inferred the following data interfaces from the Papers with Code data dumps listed in this repo.
More specifically, I used a combination of:
- https://jsontypedef.com/docs/jtd-infer/ to infer a JTD schema from the JSON files (files containing a JSON array of items first need to be converted into a .jsonlines file with one item per line, which you can do with a jq one-liner)
- https://jsontypedef.com/docs/jtd-codegen/ to generate code interfaces from the JTD schema
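The array-to-JSON-Lines conversion mentioned in the first step can also be sketched in TypeScript, in case jq is not at hand. This is only an illustration and not the command I actually ran: `jsonArrayToJsonLines` is a hypothetical helper name, and reading and writing the files is left out so the snippet stays self-contained.

```typescript
// Minimal sketch (hypothetical helper, not part of any JTD tool):
// turn the text of a JSON file holding one top-level array into JSON Lines,
// i.e. one compact JSON value per line.
function jsonArrayToJsonLines(jsonText: string): string {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a top-level JSON array");
  }
  // JSON.stringify without an indent argument emits no raw newlines,
  // so every item ends up on exactly one line.
  return parsed.map((item) => JSON.stringify(item) + "\n").join("");
}

// Made-up sample input, only to show the shape of the result.
const sampleArray = '[{"name": "MNIST"}, {"name": "Example"}]';
console.log(jsonArrayToJsonLines(sampleArray));
```

The resulting text, saved to a file, has the one-item-per-line shape described above.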
Here are the types in TypeScript notation, but it would be trivial to generate the interfaces in any other language supported by jtd-codegen (including Python, @pierrci @elishowk).
For datasets.json
// Code generated by jtd-codegen for TypeScript v0.2.0

export interface DatasetDataLoader {
  url: string;
  repo?: string;
  frameworks: string[];
}

export interface DatasetPaper {
  title: string;
  /**
   * Sometimes on PWC BUT NOT ALWAYS
   */
  url: string;
}

export interface DatasetTask {
  /**
   * Pretty name
   *
   * e.g. Image Classification
   */
  task: string;
  /**
   * on PWC
   *
   * e.g. https://paperswithcode.com/task/image-classification
   */
  url: string;
}

export interface Dataset {
  /**
   * e.g. https://paperswithcode.com/dataset/mnist
   */
  url: string;
  /**
   * e.g. MNIST
   */
  name: string;
  full_name?: string;
  /**
   * external
   */
  homepage?: string;
  description: string;
  paper?: DatasetPaper;
  introduced_date?: string;
  /**
   * Always null, it seems
   */
  warning: any;
  data_loaders: DatasetDataLoader[];
  modalities: string[];
  /**
   * list of tasks linked from this dataset
   */
  tasks: DatasetTask[];
  languages: string[];
  num_papers: number;
  variants: string[];
}
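As a rough illustration of how these interfaces might be used once datasets.json has been parsed, here is a small sketch. It redeclares a trimmed copy of the interfaces (only the fields it reads) so that it runs on its own; `datasetNamesForTask` is a hypothetical helper, and the second sample record is made up.

```typescript
// Trimmed copies of the generated interfaces: only the fields this sketch reads.
interface DatasetTask {
  task: string;
  url: string;
}

interface Dataset {
  url: string;
  name: string;
  tasks: DatasetTask[];
}

// Hypothetical helper: names of the datasets linked to a given task,
// compared case-insensitively on the task's pretty name.
function datasetNamesForTask(datasets: Dataset[], taskName: string): string[] {
  const wanted = taskName.toLowerCase();
  return datasets
    .filter((d) => d.tasks.some((t) => t.task.toLowerCase() === wanted))
    .map((d) => d.name);
}

// Sample records: the first reuses the example values from the doc comments,
// the second is invented to show a dataset without tasks.
const sampleDatasets: Dataset[] = [
  {
    url: "https://paperswithcode.com/dataset/mnist",
    name: "MNIST",
    tasks: [
      {
        task: "Image Classification",
        url: "https://paperswithcode.com/task/image-classification",
      },
    ],
  },
  { url: "https://example.com/dataset/no-tasks", name: "Example", tasks: [] },
];

console.log(datasetNamesForTask(sampleDatasets, "image classification"));
```

Keep in mind that the interfaces are erased at compile time, so a cast such as `JSON.parse(text) as Dataset[]` does not check anything at runtime.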
For evaluation-tables.json
// Code generated by jtd-codegen for TypeScript v0.2.0

export interface DatasetLink {
  title: string;
  url: string;
}

export interface EvaluationTableDatasetSotaRowCodeLink {
  /**
   * e.g. tensorflow/models
   */
  title: string;
  /**
   * e.g. https://github.com/tensorflow/models
   */
  url: string;
}

export interface EvaluationTableDatasetSotaRow {
  code_links: EvaluationTableDatasetSotaRowCodeLink[];
  metrics: { [key: string]: string };
  model_links: any[];
  model_name: string;
  paper_date?: string;
  paper_title: string;
  paper_url: string;
  uses_additional_data: boolean;
}

export interface EvaluationTableDatasetSota {
  metrics: string[];
  rows: EvaluationTableDatasetSotaRow[];
}

export interface EvaluationTableDataset {
  /**
   * Pretty name
   *
   * e.g. FSNS - Test
   */
  dataset: string;
  dataset_citations: any[];
  dataset_links: DatasetLink[];
  description: string;
  sota: EvaluationTableDatasetSota;
  subdatasets: any[];
}

export interface EvaluationTable {
  categories: string[];
  datasets: EvaluationTableDataset[];
  description: string;
  source_link: any;
  subtasks: EvaluationTable[];
  synonyms: any[];
  /**
   * Pretty name
   *
   * e.g. Optical Character Recognition
   */
  task: string;
}
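Since `EvaluationTable` is recursive through `subtasks`, reading every SOTA row means walking a tree. A minimal sketch of that walk follows, again using trimmed copies of the interfaces so it runs on its own. `flattenSotaRows` and `FlatSotaRow` are hypothetical names, and the model names, metric values, and subtask in the sample are made up.

```typescript
// Trimmed copies of the generated interfaces: only the fields this sketch reads.
interface SotaRow {
  model_name: string;
  metrics: { [key: string]: string }; // metric values are strings in the dump
}

interface EvalDataset {
  dataset: string;
  sota: { metrics: string[]; rows: SotaRow[] };
}

interface EvalTable {
  task: string;
  datasets: EvalDataset[];
  subtasks: EvalTable[];
}

// Hypothetical flat record: one entry per SOTA row, tagged with its task and dataset.
interface FlatSotaRow {
  task: string;
  dataset: string;
  model: string;
  metrics: { [key: string]: string };
}

// Depth-first walk over the tables and their nested subtasks.
function flattenSotaRows(tables: EvalTable[]): FlatSotaRow[] {
  const out: FlatSotaRow[] = [];
  const visit = (table: EvalTable): void => {
    for (const ds of table.datasets) {
      for (const row of ds.sota.rows) {
        out.push({
          task: table.task,
          dataset: ds.dataset,
          model: row.model_name,
          metrics: row.metrics,
        });
      }
    }
    table.subtasks.forEach(visit);
  };
  tables.forEach(visit);
  return out;
}

// Made-up sample: task and dataset names come from the doc comments,
// everything else is invented for illustration.
const sampleTables: EvalTable[] = [
  {
    task: "Optical Character Recognition",
    datasets: [
      {
        dataset: "FSNS - Test",
        sota: {
          metrics: ["Accuracy"],
          rows: [{ model_name: "model-a", metrics: { Accuracy: "1.0" } }],
        },
      },
    ],
    subtasks: [
      {
        task: "Hypothetical Subtask",
        datasets: [
          {
            dataset: "Hypothetical Dataset",
            sota: {
              metrics: ["Accuracy"],
              rows: [{ model_name: "model-b", metrics: { Accuracy: "2.0" } }],
            },
          },
        ],
        subtasks: [],
      },
    ],
  },
];

console.log(flattenSotaRows(sampleTables).map((r) => `${r.task} / ${r.dataset} / ${r.model}`));
```

Metric values are typed as strings, so any numeric comparison would need an explicit parse first.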
I did not do the other files; let me know if you want me to generate interfaces for them as well.