peppy icon indicating copy to clipboard operation
peppy copied to clipboard

Referencing python package files as paths in PEP

Open afrendeiro opened this issue 4 years ago • 5 comments

I propose a new type of string substitution in a PEP that is capable of referencing files existing inside a Python package.

Take for example a PEP produced with geofetch. This is a package that has both a pipeline and interfaces for it. But (correct me if I'm wrong) in order for looper to use the amendment adding the interface, the user would need to additionally, clone the repository and either change the PEP manually to point to their newly cloned interface file, or set up the CODE environment variable:

...
project_modifiers:
  amend:
    sra_convert:
      looper:
        results_subdir: sra_convert_results
      sample_modifiers:
        append:
          SRR_files: SRA
          pipeline_interfaces: ${CODE}/geofetch/pipeline_interface_convert.yaml
...

The thing is, if the interface file is distributed with the pipeline, it would be great to do something like this to reference it (nomenclature/syntax to be discussed):

...
          pipeline_interfaces: {pypi:geofetch/pipeline_interface_convert.yaml}
...

The only thing the geofetch package has to change is add pipeline_interfec_convert.yaml to the Manifest.in. Now pipeline packages are pip installable and the user doesn't have to either change the PEP or point

Internally, this seems straightforward to implement too:

import pkg_resources
pkg_resources.resource_filename("geofetch", 'pipeline_interface_convert.yaml')

afrendeiro avatar May 27 '20 14:05 afrendeiro

I've experienced this inconvenience as well.. That's an interesting concept, but I have 2 questions:

  1. It seems like geofetch is one of a kind as pipelines go, and we do not distribute pipelines as software packages. These are kind of ever changing codebases. Are there any other scenarios that would benefit from this feature?

  2. PEP is a generic concept, currently implemented in Python and R, but possibly in more languages. It would not be trivial to implement a PyPi interface in every PEP implementation. Likewise, peppy would need to implement other ones too, e.g. to R package data? So we could not introduce this as a generic PEP feature, but rather an implementation-specific one? Is that desirable? Maybe looper would be a better place to implement a feature that would solve this kind of problems?

stolarczyk avatar May 27 '20 15:05 stolarczyk

You are right about geofetch and this is a good idea. I have also been thinking about something similar. I would like to make it so you could install geofetch, and you wouldn't also have to clone it to get the interface, as you point out.

As a side point -- pipeline interfaces is a looper key, so I think this would be handled by looper, not peppy. But anyway, the point still stands.

I think the python-focused seems reasonable, but we can do better and make it more generic, such that the PEP is not tightly coupled to python. For looper sections it's less relevant since looper is in python, but for generic PEPs there are other implementations.

Anyway, the alternative idea I had was that the pipeline_interfaces key could be a command template. Maybe it would be a new key, like pipeline_interfaces_command, which would be a shell command that returns a pipeline interface. Then, I'd write a new geofetch command, something like geofetch iface, which would print the interface to stdout. Now, I'd put in my PEP:

sample_modifiers:
  append:
    pipeline_interface_command: geofetch iface

This is a more universal solution because I can use any command -- so it doesn't restrict us to python.

Edit: Just saw @stolarczyk's comment. We are on the same lines. You are exactly right -- this would be a looper thing (at least as suggested). The generic solution I propose, though, is universal enough to be integrated into peppy. Still, I'd probably put it in looper.

nsheff avatar May 27 '20 15:05 nsheff

This would actually use almost the same functionality we already wrote for dynamic_variables_command_template

nsheff avatar May 27 '20 15:05 nsheff

It seems like geofetch is one of a kind as pipelines go, and we do not distribute pipelines as software packages. These are kind of ever changing codebases. Are there any other scenarios that would benefit from this feature?

@stolarczyk Maybe not currently, but initially the epigen/open_pipelines repo was pip installable, only later we switched in order for them to be cloned by the user.

Referencing language-specific packages and in the language-agnostic PEP is indeed not ideal,.

One option would be to let the PEP parser tool handle that. E.g. if looper finds a reference to an R package pipeline_interfaces: {bioc:XYZ/pipeline_interface.yaml} it warns looper cannot use bioc:XYZ interface because it's an R package. Hmm.. probably still not great.

A pipeline shell command that returns an interface sounds interesting though.

afrendeiro avatar May 27 '20 16:05 afrendeiro

A pipeline shell command that returns an interface sounds interesting though.

I don't see any disadvantages. it's exactly what you're suggesting, but more universal -- now it's up to the developer to produce a command that can return what you want -- but anything can do this. so it's much easier to code on the looper side, and much more flexible, as it can accommodate anything.

nsheff avatar May 27 '20 16:05 nsheff