datapackage-pipelines
Rename this library
Context
Data Packages and the Frictionless Data specifications are essentially part of the protocol and inner workings of this package, but the package itself does not really require knowledge of these specs. Branding it as "Data Packages" is misleading and could confuse users into thinking that knowledge of Data Packages is required to use it.
I think we should rename the package to simply `pipelines` or `pipeline`.
What are your thoughts, @akariv?
Any opinion on this, @brew @roll @danfowler @vitorbaptista @amercader @rufuspollock?
Is it recognizable enough as `pipeline(s)`? Because it's just a rather common word. Maybe `data-pipeline(s)`?
I found I needed a passing understanding of data packages to understand how to write pipelines and processors.
I agree that `pipeline(s)` is probably too generic.
- `packaging-pipeline(s)`
- out-there suggestion: `conveyor-belt` (conveyor belts can move packages from A to B)
Thinking it over: while it is true that @brew (and anyone else who wants to write their own processors) needs to understand Data Packages in order to work with this tool, the most common user should be someone who (1) knows where her data is, (2) knows what it looks like, (3) knows where she wants to put it, and (4) knows how to write YAML. Thinking back to the first sentence of the README:
datapackage-pipelines is a framework for declarative stream-processing of tabular data. It is built upon the concepts and tooling of the Frictionless Data project.
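To illustrate what that declarative, YAML-driven usage looks like, here is a minimal sketch of a pipeline-spec.yaml. The processor names (add_metadata, add_resource, stream_remote_resources, dump.to_path) and the source URL are illustrative assumptions about the standard processor library, not canonical documentation:

```yaml
# pipeline-spec.yaml -- minimal illustrative sketch (processor names assumed)
my-dataset:
  pipeline:
    - run: add_metadata              # name/title of the output data package
      parameters:
        name: my-dataset
    - run: add_resource              # declare where the source table lives
      parameters:
        name: source-table
        url: http://example.com/source.csv
    - run: stream_remote_resources   # stream the rows of the declared resource
    - run: dump.to_path              # write the resulting data package to disk
      parameters:
        out-path: output
```

The point being: the user only declares sources, transforms, and sinks in YAML, while the Data Package machinery stays under the hood.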
Maybe the name should reference the most important type of data it ingests and spews (tables): `table-pipelines`, `tabular-pipelines`, `tabular-data-pipelines`, `table-factory`, `table-streams`, `tabulator-streams` (probably worth emphasizing in the README the connection to tabulator)...
I've thought it over more and more, and I like the factory concept (pipelines suggest just moving water around, not processing it). Conveyor belts may be a bit passive, but they have the factory sense. Maybe more of an assembly line...
But `pipeline` is an established concept in programming.
@roll good point, i.e. Unix pipes. This is a bit more heavy-duty than classic pipes, but I think you're right.
@akariv @rufuspollock @jobarratt
I've come to think this is a crucial step to take, sooner rather than later.
Candidates:
- Data Workflows: CLI `dw`
- Data Pipelines: CLI `dp`
- Data Factory: CLI `df`
- Data Flows: CLI `df`
- Others?
Any of these would be better than the current name, and all address the above concern that excluding "Data" (e.g. just `pipelines`) is confusing.
I'm happy to take a decision if needed, but I prefer to have @akariv take the call on this if he desires, as the author of the framework :).
Names for plugins are super long (cc @brew). I was against `pipelines` because it's not specific enough in my mind. But are there other one-word alternatives? `datapipe(s)`?
PS. E.g. `datapackage-pipelines-sourcespec-registry`
I added another option to https://github.com/frictionlessdata/datapackage-pipelines/issues/69#issuecomment-327076707 after chatting with @rufuspollock yesterday: shortening "Data Workflows" to "Data Flows"
I also had in mind `dataflow(s)` as something short enough: `dataflow-aws`, `dataflow-goodtables`, etc.
I like Data Flows or Data Pipelines; Pipelines marginally more, because it's more reminiscent of infrastructure.
DPP isn't just moving data from one point to another, but also transforming, changing and filtering it. Not sure how that helps, but perhaps it's more of an assembly line than a pipeline? Having said that, I like Data Flows and Data Pipelines. If I have to describe the package to someone, I say it's a data pipeline framework.
edit: Ha! @rufuspollock said exactly this about assembly lines already.
@brew if you like assembly lines, we're probably a factory, and our current pipelines are some combination of machines connected by conveyors ;-)