Drug Bank Clean-up
I'm very suspicious of this dataset, not least because I don't really understand it.
- Check that the values are in consistent units.
- Check the actual possible values of the groups.
- Can we restructure the data so that no field holds a list?
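As a starting point for the group-value and list questions above, here is a rough sketch. It assumes each record has already been loaded as a plain Python dict and that the group labels sit in a field called `groups`; both are guesses about the layout, and the sample records are made up. The unit check is left out because it depends on how units are stored.

```python
from collections import Counter


def audit_records(records, group_field="groups"):
    """Tally the distinct group values and count how often each field holds a list.

    `group_field` is an assumed field name, not confirmed against the real data.
    """
    group_values = Counter()
    list_fields = Counter()
    for record in records:
        for key, value in record.items():
            if isinstance(value, list):
                list_fields[key] += 1
        groups = record.get(group_field, [])
        if not isinstance(groups, list):
            groups = [groups]
        group_values.update(groups)
    return group_values, list_fields


def explode(records, field):
    """Return one row per element of a list-valued field (long format).

    Records whose list is empty or missing keep a single row with None.
    """
    rows = []
    for record in records:
        values = record.get(field)
        if not isinstance(values, list):
            values = [values]
        for value in values or [None]:
            row = dict(record)
            row[field] = value
            rows.append(row)
    return rows


# Made-up records, only to show the shape the functions expect.
sample = [
    {"name": "drug-a", "groups": ["approved", "investigational"]},
    {"name": "drug-b", "groups": ["approved"]},
    {"name": "drug-c", "groups": []},
]

group_values, list_fields = audit_records(sample)
long_rows = explode(sample, "groups")
```

`audit_records` answers "what values actually occur" and "which fields are lists", while `explode` shows one possible way to get rid of a list field by repeating the record once per element. Whether long format is the right target is a design decision that still needs someone who understands the data.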
Seriously, someone needs to go through every field (and its documentation) and tell me what it makes sense to do with this dataset. Cleaning it is one thing, but I'm starting to suspect that some of the fields are redundant (e.g., properties.logs and properties.water solubility).
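One rough way to test the redundancy suspicion is to pull out the records where both fields are present and see how tightly they track each other. The sketch below assumes the two fields live in a nested `properties` dict and hold plain numbers; neither assumption has been checked, and the values in the example are invented.

```python
import math


def paired_values(records, field_a, field_b, container="properties"):
    """Collect (a, b) pairs from records where both fields are numeric.

    `container` is an assumed name for the nested dict holding the fields.
    """
    pairs = []
    for record in records:
        props = record.get(container) or {}
        a, b = props.get(field_a), props.get(field_b)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            pairs.append((float(a), float(b)))
    return pairs


def pearson(pairs):
    """Pearson correlation of the pairs, or None if it is undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Invented values, purely illustrative; the field names come from the note above.
sample = [
    {"properties": {"logs": -1.0, "water solubility": 2.0}},
    {"properties": {"logs": -2.0, "water solubility": 4.0}},
    {"properties": {"logs": -3.0, "water solubility": 6.0}},
    {"properties": {"logs": -4.0}},  # skipped: second field missing
]

pairs = paired_values(sample, "logs", "water solubility")
r = pearson(pairs)
```

A correlation near 1 or -1 would suggest one field is a linear function of the other. A value near zero does not rule out redundancy, since one field could still be a nonlinear transform (a logarithm, say) of the other, so the pairs are worth plotting as well.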
Looking at this through the visualizer, it really hits home that this is an awful dataset. We probably need to look at some genomic or genotype data instead. I'm way out of my depth here, but we really do need a solid biochem dataset.