sddi-ckan-k8s

Replace DataPusher

Open BWibo opened this issue 3 months ago • 0 comments

DataPusher should be replaced

CKAN DataPusher is not a good choice for pushing data into the CKAN DataStore. The core reasons to replace it are that it is complicated to set up and extremely slow. Some more arguments are listed here. I identified two candidates to replace DataPusher.

ckanext-xloader

Pros

  • Comes as a CKAN extension and is easy to set up
  • Up to 10x faster than DataPusher

Cons

  • Needs to be included in the CKAN-SDDI image
  • Can only be autoscaled by scaling CKAN instances
  • All columns are loaded as text, so the data publisher has to change the data types manually in the Data Dictionary and then reload the data.
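
To make the last point concrete, here is a minimal Python sketch of what "all columns as text" means in practice and of the kind of type decision that is then left to the publisher. It is not ckanext-xloader code. The sample data, the function names, and the type labels (`text`, `int`, `float`) are assumptions chosen for illustration only.

```python
import csv
import io


def load_as_text(csv_text):
    """Parse CSV the way a text-only loader would: every value stays a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def guess_type(values):
    """Return 'int', 'float', or 'text' for one column of string values.

    Hypothetical helper: it stands in for the manual choice a publisher
    would otherwise make in the Data Dictionary.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def guess_column_types(rows):
    """Map each column name to a guessed type label."""
    if not rows:
        return {}
    return {col: guess_type([row[col] for row in rows]) for col in rows[0]}


# Illustrative sample data, not taken from any real dataset.
SAMPLE = "station,temperature,readings\nA,21.5,3\nB,19.0,4\n"

rows = load_as_text(SAMPLE)
types = guess_column_types(rows)
print(rows[0])
print(types)
```

After a text-only load, numeric columns such as `temperature` are still strings, so sorting and range filters on them behave lexically until the types are changed and the data is reloaded.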

DataPusher+

Pros

  • Built on qsv, an ultra-fast data processing tool written in Rust.
  • Lives in a separate container and can be scaled individually

Cons

  • Complicated setup

BWibo, Mar 12 '24 18:03