sciwing
CLI for exploring different datasets
Several classification datasets are now part of the repo. A CLI to explore them would be a nice feature to have. Getting stats is already part of the interface.
The CLI should:
- Ask which dataset to explore
- Show basic vocab stats for the dataset (number of distinct words, most popular words)
- Show other information about the dataset, such as the number of training examples, the number of validation examples, and the maximum length of instances
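Here is a minimal sketch of the requested behavior, using only the standard library. It does not use sciwing's real dataset classes. The `DATASETS` registry, the `toy_sectlabel` name, and the helpers `vocab_stats` and `dataset_info` are all hypothetical stand-ins. It also assumes whitespace tokenization and that each dataset is a mapping from split name to a list of text instances.

```python
import argparse
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical toy registry standing in for the repo's classification datasets.
# Each dataset maps a split name ("train", "valid") to a list of text instances.
DATASETS: Dict[str, Dict[str, List[str]]] = {
    "toy_sectlabel": {
        "train": ["deep learning for citations", "learning to parse citations"],
        "valid": ["parsing scientific documents"],
    },
}


def vocab_stats(instances: List[str], top_k: int = 5) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the number of distinct whitespace tokens and the top_k most frequent ones."""
    counts = Counter(token for text in instances for token in text.split())
    return len(counts), counts.most_common(top_k)


def dataset_info(splits: Dict[str, List[str]]) -> Dict[str, int]:
    """Count instances per split and find the maximum instance length (in tokens)."""
    info = {f"num_{name}": len(instances) for name, instances in splits.items()}
    all_instances = [text for instances in splits.values() for text in instances]
    info["max_length"] = max((len(text.split()) for text in all_instances), default=0)
    return info


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Explore a classification dataset.")
    # "Ask which dataset to explore": restrict the choice to known datasets.
    parser.add_argument("dataset", choices=sorted(DATASETS))
    parser.add_argument("--top-k", type=int, default=5)
    args = parser.parse_args(argv)

    splits = DATASETS[args.dataset]
    # Vocab stats are computed on the training split only in this sketch.
    num_distinct, popular = vocab_stats(splits["train"], args.top_k)
    print(f"Distinct words (train): {num_distinct}")
    print(f"Most popular words: {popular}")
    for key, value in dataset_info(splits).items():
        print(f"{key}: {value}")


if __name__ == "__main__":
    main()
```

This sketch takes the dataset as a positional argument with `choices`, so argparse rejects unknown names. An interactive prompt, such as a numbered menu read with `input()`, would match "ask which dataset to explore" more literally, but it is harder to script and test.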