datapusher-plus

Roadmap Tracking Issue - EPIC

Open jqnatividad opened this issue 2 years ago • 0 comments

OVERALL VISION: To increase the utility and performance of the CKAN Datastore:

  • by enriching resources, so that right after DP+ pushes a file, it performs many of the data-wrangling tasks that are typically done manually:
    • metadata is inferred, so the Data Publisher does not have to enter it by hand
    • descriptive statistics are computed, helping both the Data Publisher and the end-user better understand the resource
    • location information is automatically normalized and geocoded
    • related datasets/resources are automatically inferred
    • resources are automatically tagged
  • by taking advantage of PostgreSQL native features
    • also use it as a Document Database leveraging JSONB?
    • partitioning/sharding?
  • by tapping into the rich PostgreSQL extensions ecosystem (in particular PostGIS, Timescale, Citus, CartoDB, Apache AGE and ZomboDB)
  • by giving it "Data Lake"-like capabilities
  • by enabling Datastore API users to issue performant, reliable SQL queries
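
To make the "metadata is inferred" and "descriptive statistics are computed" goals above more concrete, here is a minimal sketch using only the Python standard library. The names `infer_type` and `summarize_csv` are hypothetical and the type rules are deliberately simplistic; this is an illustration of the idea, not how DP+ actually does it.

```python
import csv
import io
import statistics


def infer_type(values):
    """Guess a column type from its non-empty string values (assumed rules)."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "text"
    # Try the narrowest type first, then widen.
    for caster, name in ((int, "integer"), (float, "numeric")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def summarize_csv(text):
    """Return per-column inferred type, null count and basic statistics."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat them as empty.
        values = [(row[column] or "") for row in rows]
        col_type = infer_type(values)
        info = {
            "type": col_type,
            "nulls": sum(1 for v in values if v.strip() == ""),
        }
        if col_type in ("integer", "numeric"):
            numbers = [float(v) for v in values if v.strip() != ""]
            info["min"] = min(numbers)
            info["max"] = max(numbers)
            info["mean"] = statistics.mean(numbers)
        summary[column] = info
    return summary


sample = "id,name,score\n1,a,2.5\n2,b,\n3,c,3.5\n"
summary = summarize_csv(sample)
```

A summary like this could pre-fill a data dictionary, so the Data Publisher only has to review the inferred types instead of typing them in.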

  • [ ] #98
  • [ ] #18
  • [x] #11
  • [ ] Auto-tagging
  • [ ] Automatic spatial extent calculation
  • [ ] Automatic processing/recognition of whitelisted common column names (e.g. latitude, longitude, status, open date, closed date, etc.)
  • [x] #53
  • [x] #47
  • [x] #27
  • [x] #9
  • [ ] Auto partitioning
  • [ ] #60
  • [ ] Deferred datapush on initial package creation to allow per-package Datapusher+ configuration
  • [ ] #87
  • [x] #17
  • [ ] Enabling record-level search
  • [x] #8
  • [ ] #13
  • [ ] #54
  • [x] #10
  • [x] #19
  • [x] #30
  • [ ] Native PostGIS support
  • [ ] Native time-series support with Timescale
  • [ ] #34
  • [ ] #35
  • [x] #46
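
Two of the open items above, recognition of common column names and automatic spatial extent calculation, fit together naturally. The sketch below shows one possible approach; the name sets, `find_column` and `spatial_extent` are all assumptions made for illustration and are not taken from DP+. The bounding box is a naive min/max and does not handle data that crosses the antimeridian.

```python
# Hypothetical whitelists of common column names (compared case-insensitively).
LATITUDE_NAMES = {"lat", "latitude", "y"}
LONGITUDE_NAMES = {"lon", "long", "lng", "longitude", "x"}


def find_column(fieldnames, candidates):
    """Return the first field whose normalized name is in the whitelist."""
    for name in fieldnames:
        if name.strip().lower() in candidates:
            return name
    return None


def spatial_extent(rows):
    """Compute (min_lon, min_lat, max_lon, max_lat) from a list of dict rows.

    Returns None when no latitude/longitude pair of columns is recognized
    or when no row holds a valid coordinate.
    """
    if not rows:
        return None
    fieldnames = list(rows[0].keys())
    lat_col = find_column(fieldnames, LATITUDE_NAMES)
    lon_col = find_column(fieldnames, LONGITUDE_NAMES)
    if lat_col is None or lon_col is None:
        return None

    points = []
    for row in rows:
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lon, lat))
    if not points:
        return None

    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


rows = [
    {"Latitude": "40.7", "Longitude": "-74.0"},
    {"Latitude": "34.0", "Longitude": "-118.2"},
    {"Latitude": "bad", "Longitude": "0"},
]
extent = spatial_extent(rows)
```

The resulting tuple could be stored as resource or package metadata to support spatial search.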

jqnatividad, Apr 27 '22 04:04