
Generated code interfaces from paperswithcode data

Open julien-c opened this issue 4 years ago • 0 comments

Sharing this in case it's helpful to anyone. 🤗

Using JTD (JSON Type Definition) tools (https://jsontypedef.com/), I inferred the following data interfaces from the Papers with Code data dumps listed in this repo.

More specifically, I used a combination of:

  • https://jsontypedef.com/docs/jtd-infer/ to infer a JTD schema from the JSON files (files containing a JSON array of items must first be converted into a .jsonlines file with one item per line, which you can do with a jq one-liner)
  • https://jsontypedef.com/docs/jtd-codegen/ to generate code interfaces from the JTD
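The array-to-lines conversion mentioned in the first bullet could look like the sketch below. With jq, something like `jq -c '.[]' datasets.json` would do the same job, though the exact one-liner used is not given in this post. `arrayToJsonLines` is a hypothetical helper name, and the inline sample is made up to stand in for a real dump.

```typescript
// Hypothetical helper (not part of any JTD tool): turn the text of a JSON file
// holding one top-level array into JSON Lines text, one compact value per line.
function arrayToJsonLines(jsonText: string): string {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a top-level JSON array");
  }
  // Each item is serialized without pretty-printing so it stays on one line.
  return parsed.map((item) => JSON.stringify(item) + "\n").join("");
}

// Made-up inline sample standing in for a dump such as datasets.json.
const sampleArrayText = '[{"name": "MNIST"}, {"name": "CIFAR-10"}]';
console.log(arrayToJsonLines(sampleArrayText));
```

The resulting text can be written to a file and passed to jtd-infer, which reads one JSON value per line.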

Here are the types in TypeScript notation, but it would be trivial to generate the interfaces in any other supported language (including Python, @pierrci @elishowk).

For datasets.json

// Code generated by jtd-codegen for TypeScript v0.2.0

export interface DatasetDataLoader {
	url:        string;
	repo?:      string;
	frameworks: string[];
}

export interface DatasetPaper {
	title: string;
	/**
	 * Sometimes on PWC BUT NOT ALWAYS
	 */
	url:   string;
}

export interface DatasetTask {
	/**
	 * Pretty name
	 * 
	 * e.g. Image Classification
	 */
	task: string;
	/**
	 * on PWC
	 * 
	 * e.g. https://paperswithcode.com/task/image-classification
	 */
	url:  string;
}


export interface Dataset {
	/**
	 * e.g. https://paperswithcode.com/dataset/mnist
	 */
	url: string;
	/**
	 * e.g. MNIST
	 */
	name: string;
	full_name?: string;
	/**
	 * external
	 */
	homepage?: string;
	description: string;
	paper?: DatasetPaper;
	introduced_date?: string;
	/**
	 * Always null, it seems
	 */
	warning: any;
	data_loaders: DatasetDataLoader[];
	modalities: string[];
	/**
	 * list of tasks linked from this dataset
	 */
	tasks: DatasetTask[];
	languages: string[];
	num_papers: number;
	variants: string[];
}
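As a rough illustration of how the Dataset interface above might be consumed, here is a minimal sketch that groups dataset names by modality. `DatasetSummary` is a trimmed-down subset of `Dataset` declared only so the snippet is self-contained, `namesByModality` is a hypothetical helper, and the sample records are invented for the example rather than taken from the dump.

```typescript
// Subset of the Dataset interface above; only the fields this sketch reads.
interface DatasetSummary {
  name: string;
  modalities: string[];
  num_papers: number;
}

// Hypothetical helper: parse the dump text and list dataset names per modality.
function namesByModality(datasetsJson: string): Map<string, string[]> {
  // JSON.parse performs no validation; the cast simply trusts the inferred schema.
  const datasets = JSON.parse(datasetsJson) as DatasetSummary[];
  const result = new Map<string, string[]>();
  for (const dataset of datasets) {
    for (const modality of dataset.modalities) {
      const names = result.get(modality) ?? [];
      names.push(dataset.name);
      result.set(modality, names);
    }
  }
  return result;
}

// Invented sample records, shaped like entries of datasets.json.
const sampleDatasetsText = JSON.stringify([
  { name: "A", modalities: ["Images"], num_papers: 3 },
  { name: "B", modalities: ["Images", "Texts"], num_papers: 1 },
]);
namesByModality(sampleDatasetsText).forEach((names, modality) => {
  console.log(modality, names);
});
```

Because the schema was inferred from the data rather than published by the maintainers, a runtime check before the cast may be worth adding if the dump format changes.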

For evaluation-tables.json

// Code generated by jtd-codegen for TypeScript v0.2.0

export interface DatasetLink {
	title: string;
	url: string;
}


export interface EvaluationTableDatasetSotaRowCodeLink {
	/**
	 * e.g. tensorflow/models
	 */
	title: string;
	/**
	 * e.g. https://github.com/tensorflow/models
	 */
	url: string;
}


export interface EvaluationTableDatasetSotaRow {
	code_links: EvaluationTableDatasetSotaRowCodeLink[];
	metrics: { [key: string]: string };
	model_links: any[];
	model_name: string;
	paper_date?: string;
	paper_title: string;
	paper_url: string;
	uses_additional_data: boolean;
}

export interface EvaluationTableDatasetSota {
	metrics: string[];
	rows: EvaluationTableDatasetSotaRow[];
}

export interface EvaluationTableDataset {
	/**
	 * Pretty name
	 * 
	 * e.g. FSNS - Test
	 */
	dataset: string;
	dataset_citations: any[];
	dataset_links: DatasetLink[];
	description: string;
	sota: EvaluationTableDatasetSota;
	subdatasets: any[];
}


export interface EvaluationTable {
	categories: string[];
	datasets: EvaluationTableDataset[];
	description: string;
	source_link: any;
	subtasks: EvaluationTable[];
	synonyms: any[];
	/**
	 * Pretty name
	 * 
	 * e.g. Optical Character Recognition
	 */
	task: string;
}
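Since EvaluationTable nests itself through subtasks, reading the leaderboard rows needs a recursive walk. The sketch below flattens every SOTA row into one list; `TableLite`, `FlatRow`, and `flattenSotaRows` are hypothetical names, the interfaces are cut-down copies of the ones above, and the sample data is invented.

```typescript
// Cut-down copies of the interfaces above; only the fields this sketch reads.
interface SotaRowLite {
  model_name: string;
  metrics: { [key: string]: string };
}
interface TableLite {
  task: string;
  datasets: { dataset: string; sota: { rows: SotaRowLite[] } }[];
  subtasks: TableLite[];
}

// One leaderboard row annotated with the task and dataset it belongs to.
interface FlatRow extends SotaRowLite {
  task: string;
  dataset: string;
}

// Hypothetical helper: depth-first walk, a task's own rows before its subtasks'.
function flattenSotaRows(tables: TableLite[]): FlatRow[] {
  const out: FlatRow[] = [];
  for (const table of tables) {
    for (const ds of table.datasets) {
      for (const row of ds.sota.rows) {
        out.push({ task: table.task, dataset: ds.dataset, ...row });
      }
    }
    out.push(...flattenSotaRows(table.subtasks));
  }
  return out;
}

// Invented sample shaped like evaluation-tables.json.
const sampleTables: TableLite[] = [{
  task: "Parent task",
  datasets: [{ dataset: "D1", sota: { rows: [{ model_name: "m1", metrics: { Accuracy: "90.0" } }] } }],
  subtasks: [{
    task: "Child task",
    datasets: [{ dataset: "D2", sota: { rows: [{ model_name: "m2", metrics: { F1: "0.8" } }] } }],
    subtasks: [],
  }],
}];
console.log(flattenSotaRows(sampleTables).map((r) => `${r.task} / ${r.dataset} / ${r.model_name}`));
```

Note that metric values are typed as strings in the inferred schema, so any numeric comparison has to parse them first.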

I did not do the other files; let me know if you want me to generate interfaces for them as well.

julien-c, Aug 07 '21 17:08