data_tooling

Tools for managing datasets for governance and training.

Results 100 data_tooling issues

Subsets of The Pile:
- pubmed
- ubuntu_irc
- europarl
- hacker_news
- nih_exporter

data catalog

Subset of The Pile. FreeLaw: Good as-is; I have acquired permission to use this from the org that owns the data (reported by @StellaAthena).

data catalog

- uid: theses_on_line
- type: primary
- description:
  - name: Theses on Line
  - description: Created in 2001, TEL (Theses-on-Line) is dedicated to the self-archiving of theses and HDRs (accreditations...

data catalog

- uid: libre_commons
- type: primary
- description:
  - name: LibreCommons
  - description: LibreCommons hosts curated Open Educational Resources from all 14 LibreTexts libraries in one convenient location. LibreCommons, the...

data catalog

Source: [Masader Project](https://arbml.github.io/masader/)
- uid: talaa
- entry: https://arbml.github.io/masader/card.html?54
- Link: https://github.com/saidziani/Arabic-News-Article-Classification
- License: unknown
- Year: 2015
- Language: ar
- Dialect: ar-MSA: (Arabic (Modern Standard Arabic))
- ...

data catalog
need custodian permission

Source: [Masader Project](https://arbml.github.io/masader/)
- uid: arabic_online_commentary
- entry: https://arbml.github.io/masader/card.html?39
- Link: https://github.com/sjeblee/AOC
- License: unknown
- Year: 2011
- Language: ar
- Dialect: other
- Domain: news articles
- ...

data catalog
need data sourcing feedback

Source: [Masader Project](https://arbml.github.io/masader/)
- uid: osian
- entry: https://arbml.github.io/masader/card.html?25
- Link: http://oujda-nlp-team.net/en/corpora/osian-corpus/
- License: CC BY-NC 4.0
- Year: 2019
- Language: ar
- Dialect: other
- Domain: news...

data catalog
need data sourcing feedback

- uid: wikihow_vietnamese_human_instructions
- type: processed
- description:
  - name: wikiHow Vietnamese Human Instructions
  - description: Step-by-step instructions in Vietnamese extracted from wikiHow and decomposed into a formal graph representation...

data catalog

- uid: vicon_visim400
- type: processed
- description:
  - name: Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness (ViCon and ViSim-400)
  - description: This dataset consists of two...

data catalog
need data sourcing feedback

- uid: ahotsak
- type: primary
- description:
  - name: ahotsak
  - description: Catalogue of Basque Oral Heritage, interviews with elderly people about their experiences.
  - homepage: https://ahotsak.eus/
  - validated:...

data catalog
need custodian permission
need data sourcing feedback