datapusher-plus
Roadmap Tracking Issue - EPIC
OVERALL VISION: To increase the utility and performance of the CKAN Datastore:
- by enriching resources: right after DP+ pushes a file, it performs many data-wrangling tasks that are typically done manually:
  - metadata is inferred, so the Data Publisher does not have to enter it laboriously
  - descriptive statistics are computed, helping the Data Publisher and the end user better understand the resource
  - location information is automatically normalized and geocoded
  - related datasets/resources are automatically inferred
  - resources are auto-tagged
- by taking advantage of PostgreSQL native features
  - also use it as a Document Database leveraging JSONB?
  - partitioning/sharding?
- by tapping into the rich PostgreSQL extension ecosystem (in particular PostGIS, Timescale, Citus, CartoDB, Apache AGE and ZomboDB)
  - give it "Data Lake"-like capabilities
  - enable Datastore API users to issue performant, reliable SQL queries
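
To make the "metadata is inferred" and "descriptive statistics are computed" goals concrete, here is a minimal sketch using only the Python standard library. It is not DP+'s implementation; the function names `infer_type` and `summarize` and the three-way type split (integer, float, text) are assumptions made for illustration.

```python
import csv
import io
import statistics


def infer_type(values):
    """Return 'integer', 'float', or 'text' for a list of non-empty strings."""
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "text"


def summarize(csv_text):
    """Infer a type per column and compute basic statistics for numeric columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        # Treat empty strings and missing trailing fields as nulls.
        values = [r[column] for r in rows if r[column] not in ("", None)]
        kind = infer_type(values) if values else "text"
        info = {"type": kind, "count": len(values), "nulls": len(rows) - len(values)}
        if kind != "text":
            numbers = [float(v) for v in values]
            info.update(
                min=min(numbers),
                max=max(numbers),
                mean=statistics.fmean(numbers),
            )
        summary[column] = info
    return summary


# Hypothetical sample data, for illustration only.
sample = "id,amount,status\n1,10.5,open\n2,,closed\n3,4.5,open\n"
for name, info in summarize(sample).items():
    print(name, info)
```

A summary like this could be stored alongside the resource so that publishers do not have to enter field types by hand, which is the intent described in the list above.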
- [ ] #98
- [ ] #18
- [x] #11
- [ ] Auto-tagging
- [ ] Automatic spatial extent calculation
- [ ] Automatic processing/recognition of whitelisted common column names (e.g. latitude, longitude, status, open date, closed date, etc.)
- [x] #53
- [x] #47
- [x] #27
- [x] #9
- [ ] Auto partitioning
- [ ] #60
- [ ] Deferred datapush on initial package creation, to allow per-package Datapusher+ configuration
- [ ] #87
- [x] #17
- [ ] Enabling record-level search
- [x] #8
- [ ] #13
- [ ] #54
- [x] #10
- [x] #19
- [x] #30
- [ ] Native PostGIS support
- [ ] Native time-series support with Timescale
- [ ] #34
- [ ] #35
- [x] #46
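
As a rough illustration of how the "whitelisted common column names" and "Automatic spatial extent calculation" items could fit together, the sketch below looks for latitude/longitude columns by name and derives a bounding box from them. The whitelists, the function names, and the `(min_lon, min_lat, max_lon, max_lat)` return shape are all assumptions for this example, not a description of how DP+ does or will do it.

```python
# Hypothetical whitelists; a real deployment would make these configurable.
LAT_NAMES = {"latitude", "lat", "y"}
LON_NAMES = {"longitude", "lon", "long", "lng", "x"}


def find_column(headers, candidates):
    """Return the first header whose normalized name is in the whitelist."""
    for header in headers:
        if header.strip().lower() in candidates:
            return header
    return None


def spatial_extent(rows):
    """Compute (min_lon, min_lat, max_lon, max_lat) from a list of row dicts.

    Returns None when no coordinate columns or no valid points are found.
    """
    if not rows:
        return None
    headers = list(rows[0].keys())
    lat_col = find_column(headers, LAT_NAMES)
    lon_col = find_column(headers, LON_NAMES)
    if lat_col is None or lon_col is None:
        return None

    points = []
    for row in rows:
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (TypeError, ValueError):
            continue  # skip blank or non-numeric coordinates
        # Keep only coordinates inside the valid WGS 84 ranges.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lon, lat))
    if not points:
        return None

    lons, lats = zip(*points)
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical rows, for illustration only.
rows = [
    {"Name": "a", "Latitude": "40.7", "Longitude": "-74.0"},
    {"Name": "b", "Latitude": "34.1", "Longitude": "-118.2"},
    {"Name": "c", "Latitude": "", "Longitude": "-100"},
]
print(spatial_extent(rows))
```

Rows with blank or out-of-range coordinates are skipped rather than failing the whole computation, since pushed files often contain partial location data.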