datapackage-pipelines

Rename this library

Open pwalsh opened this issue 7 years ago • 14 comments

Context

Data Packages and the Frictionless Data specifications are essentially part of the protocol and inner workings of this package, but the package itself does not really require knowledge of these specs. Branding it as "Data Packages" is misleading, and could lead users to think that knowledge of Data Packages is required to use it.

I think we should rename the package to simply pipelines or pipeline.

What are your thoughts, @akariv?

Any opinion on this @brew @roll @danfowler @vitorbaptista @amercader @rufuspollock ?

Jun 27 '17 13:06 pwalsh

Is pipeline(s) recognizable enough? It's kind of a common word. Maybe data-pipeline(s)?

Jun 27 '17 14:06 roll

I found I needed a passing understanding of data packages to understand how to write pipelines and processors.

Jun 27 '17 15:06 brew

I agree that pipeline(s) is probably too generic.

  • packaging-pipeline(s)
  • out there suggestion: conveyor-belt (conveyor belts can move packages from A to B)

Jun 28 '17 04:06 danfowler

Thinking it over: while it is true that @brew (and anyone else who wants to write their own processors) needs to know how Data Packages work in order to work with this tool, the most common user should be someone who knows (1) where her data is, (2) what it looks like, (3) where she wants to put it, and (4) how to write YAML. Thinking back to the first sentence of the README:

"datapackage-pipelines is a framework for declarative stream-processing of tabular data. It is built upon the concepts and tooling of the Frictionless Data project."

Maybe the name should reference the most important type of data it ingests and spews (tables): table-pipelines, tabular-pipelines, tabular-data-pipelines, table-factory, table-streams, tabulator-streams (probably worth emphasizing in the README the connection to tabulator)...

Jul 04 '17 03:07 danfowler

I've thought about this more, and I more and more like the factory concept (pipelines just suggest moving water around - not processing it). Conveyor belts may be a bit passive, but they have the factory sense. Maybe more of an assembly line ...

Jul 05 '17 17:07 rufuspollock

But "pipeline" is an established concept in programming.

Jul 06 '17 08:07 roll

@roll good point, i.e. Unix pipes. This is a bit more heavy duty than classic pipes, but I think you're right.

Jul 06 '17 16:07 rufuspollock

@akariv @rufuspollock @jobarratt

I've come to think this is a crucial step to take, sooner rather than later.

Candidates:

  • Data Workflows: CLI dw
  • Data Pipelines: CLI dp
  • Data Factory: CLI df
  • Data Flows: CLI df
  • Others?

Any of these will be better than the current name, and all address the concern raised above that a name without "Data" (e.g. pipelines) is confusing.

I'm happy to take a decision if needed, but I prefer to have @akariv take the call on this if he desires, as the author of the framework :).

Sep 05 '17 05:09 pwalsh

Names for plugins are superlong (cc @brew). I was against pipelines because it's not specific enough in my mind. But are there other one-word alternatives? datapipe(s)?

PS. E.g. datapackage-pipelines-sourcespec-registry

Sep 05 '17 05:09 roll

I added another option to https://github.com/frictionlessdata/datapackage-pipelines/issues/69#issuecomment-327076707 after chatting with @rufuspollock yesterday: shortening "Data Workflows" to "Data Flows"

Sep 06 '17 05:09 pwalsh

I also had in mind dataflow(s) as something short enough: dataflow-aws, dataflow-goodtables, etc.

Sep 06 '17 06:09 roll

I like Data Flows or Data Pipelines. Pipelines marginally more, because it's more reminiscent of big infrastructure.

Sep 06 '17 07:09 jobarratt

DPP isn't just moving data from one point to another; it also transforms, changes and filters it. Not sure how that helps, but perhaps it's more of an assembly line than a pipeline? Having said that, I like Data Flows and Data Pipelines. If I have to describe the package to someone, I say it's a data pipeline framework.

edit: Ha! @rufuspollock said exactly this about assembly lines already.

Sep 06 '17 11:09 brew

@brew if you like assembly lines, we're probably a factory, and our current pipelines are some combination of machines connected by conveyors ;-)

Sep 06 '17 20:09 rufuspollock