data-measurements-tool
Developing tools to automatically analyze datasets
Hello fellow maintainer, I have added some standardized Markdown formatting to README.md to improve the docs. I also fixed the unnecessary **`** signs in the document. Please review and approve.
I am looking for a library that can help measure dataset quality. This project is very useful for me, but I find that the latest commit was submitted 5...
Hi, I am trying to use _HuggingFaceM4/OBELICS_ with the data-measurements-tool. The dataset is loaded, but due to its huge size (approx. 378 GB), I am unable to get results....
Hello! I'm a Cybersecurity researcher developing Packj [1]. Our tool has detected a supply-chain vulnerability in this repository. In order for me to disclose it, kindly enable GitHub Private vulnerability...
[EDIT: May 12] Forgot I had done this.
The current nPMI class needs to be refactored to become a generic "associations" module, which exposes nPMI along with other association measurements.
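To make the proposed refactor concrete, here is a minimal sketch of what a generic "associations" module could look like. It is not the repository's code: the names `cooccurrence_counts`, `ASSOCIATION_MEASURES`, and `association` are hypothetical, and it assumes document-level co-occurrence over pre-tokenized text. PMI is `log(p(x,y) / (p(x) p(y)))`, and nPMI divides that by `-log p(x,y)` so the score lies in [-1, 1].

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count the documents containing each word and each unordered word pair."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc))  # sorted so pair keys are order-independent
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))
    return word_counts, pair_counts


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log(p(x,y) / (p(x) * p(y)))."""
    return math.log(p_xy / (p_x * p_y))


def npmi(p_xy, p_x, p_y):
    """Normalized PMI: PMI divided by -log p(x,y), bounded to [-1, 1]."""
    if p_xy == 1.0:
        return 1.0  # -log(1) is 0, so handle the always-co-occurring case directly
    return pmi(p_xy, p_x, p_y) / -math.log(p_xy)


# Hypothetical registry: new association measures would be added here.
ASSOCIATION_MEASURES = {"pmi": pmi, "npmi": npmi}


def association(docs, x, y, measure="npmi"):
    """Score the association between words x and y with the chosen measure."""
    word_counts, pair_counts = cooccurrence_counts(docs)
    n = len(docs)
    joint = pair_counts[tuple(sorted((x, y)))]
    if joint == 0:
        # The words never co-occur: nPMI's lower bound, or -inf for plain PMI.
        return -1.0 if measure == "npmi" else float("-inf")
    score = ASSOCIATION_MEASURES[measure]
    return score(joint / n, word_counts[x] / n, word_counts[y] / n)


docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
npmi_ab = association(docs, "a", "b")          # "a" and "b" always co-occur
pmi_ab = association(docs, "a", "b", "pmi")
npmi_ac = association(docs, "a", "c")          # "a" and "c" never co-occur
```

Keeping each measure as a plain function over the same three probabilities means further measures can be registered without touching the counting code, which is one possible reading of "generic" here.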
When I run

```
python3 run_data_measurements.py --dataset="hate_speech_offensive" --config="default" --split="train" --label_field="label" --feature="tweet"
```

the `dset_peek.json` file is not cached, which prevents me from running the UI in `live` mode. Snapshot of...
Updated files:
- app.py
- dataset_util.py

New files:
- styles.css and index.html: stylisation code for streamlit's component
Apologies for the cached data 🙈. The approach we discussed had some additional problems with necessary files and updates that were part of the previous commit with the...
UI updates (see [figma](https://www.figma.com/file/xvBWSyzuURvtIaMKAjAPZY/Data-measurements-tool)):
- sidebar
- tabs instead of drop downs
- colouring

To-dos to come:
- fixing bold text
- slight adjusting of colours