galaxy icon indicating copy to clipboard operation
galaxy copied to clipboard

Password Parameter Implementation

Open katbeaulieu opened this issue 8 years ago • 32 comments

This pull request addresses the issue of third party authentication within Galaxy tools by providing a password parameter. Here is what has been done to handle some the security concerns regarding this parameter:

  • Removed the password parameter from the command line and passed it to the tool wrapper as an environment variable.
  • Set the password to a blank in the database once the environment variable has been set.

We have also ensured that tools with password parameters are able to be used in workflows by adding a parameter to indicate a password parameter is present ('JPCN...')

This has been tested with two tools I have created, repositories found here: https://github.com/AAFC-MBB/Galaxy/tree/dev/wrappers/irods_push https://github.com/AAFC-MBB/Galaxy/tree/dev/wrappers/irods_pull

There has been much talk about having this implemented and I believe this may be a simple solution for users looking to use Galaxy with third party software.

katbeaulieu avatar Nov 03 '16 14:11 katbeaulieu

@katbeaulieu please cleanup lib/galaxy/jobs/__init__.py.old.backup and lib/galaxy/jobs/__init__.py.old (preferably by squashing)

martenson avatar Nov 03 '16 14:11 martenson

If we want to integrate this feature, we will have to remove all password specific if-else blocks, such that all parameters are treated consistently again throughout the paramater handling.

guerler avatar Nov 03 '16 16:11 guerler

Ok so instead of if-else blocks, add functions which check all parameters for something relating to the password parameter, instead of doing checks inside of other existing functions, would this be a better solution?

katbeaulieu avatar Nov 03 '16 16:11 katbeaulieu

Its about special case handling for a particular parameter which we need to avoid. We have unified the process recently and removed a lot of custom handling and it is better to keep it that way i.e. all parameters should be treated in the same fashion. It is possible that we will be able to see how to do that once we moved forward since there are fortunately not a lot of special cases.

guerler avatar Nov 03 '16 16:11 guerler

Do you think we could think of a solution, potentially putting just one check at the end in command_factory.py? I really think this is something the rest of the community will benefit from. Do you have other examples of the kind of cleanup you did to make other parameters uniform in your removal of custom code? Maybe I could look into doing something similar.

katbeaulieu avatar Nov 03 '16 17:11 katbeaulieu

I think if we can break it down to one special case block it would be easier to discuss it if that is possible.

guerler avatar Nov 03 '16 18:11 guerler

Yes I think I can get something working. I will work on it tomorrow, and hopefully have it committed before the end of the day.

katbeaulieu avatar Nov 03 '16 20:11 katbeaulieu

I've added some changes to lib/galaxy/util/template.py to remove one of the conditionals, as well as removed the input_def.options and input_def.data from form_parameters.js as I realized they were not necessary. Please advise if this is acceptable.

katbeaulieu avatar Nov 04 '16 15:11 katbeaulieu

Sorry for the delay, I will look into it this week.

guerler avatar Nov 09 '16 17:11 guerler

Thank you for getting back to me Aysam.

katbeaulieu avatar Nov 09 '16 19:11 katbeaulieu

Ok, so the goal seems to be to submit a password which is then stripped from the command-line and/or the database and passed into the environment instead if I understand this right. Please correct me if I am wrong. I remember that there was another thread stating relarted concerns. Can you point me towards it? I am wondering how this effects reproducibility, the ability to re-run jobs when datasets have been shared and/or if this might pollute the environment. Finally you mentioned that it would be possible to reduce the changes to the command line factory handler but this does not seem to have happened yet, are there any unexpected challenges?

guerler avatar Nov 12 '16 16:11 guerler

Hi Aysam, O think the thread you are referring to is this one: https://github.com/galaxyproject/galaxy/pull/393. It mentions some of the security risks associated with having a password parameter but I think we have handled most of these. You are correct in your interpretation of the changes, it removes the password from the command line as well as the database and is passed in the environment. As for re-running jobs, the tool form will load up normally but the user will have to re-enter their password. As for your last point, yes, I had thought I would be able to reduce the changes to just command_factory but it seems that in lib/galaxy/tools/init.py, the identifying string gets dropped so I have to add back in to all_params. There is also the issue with workflows. I needed to make sure that the identifying string was a parameter associated with the tools with passwords which is why it had to be changed in tool-form-composite and the workflows.py in the api directory. I couldn't see a way around those issues.

katbeaulieu avatar Nov 12 '16 16:11 katbeaulieu

Can you took a look at: https://github.com/galaxyproject/galaxy/compare/dev...guerler:password? Its a work in progress of a slightly different implementation which seems to be less invasive. ping @jmchilton, @dannon. Comments are welcome. Thanks @dannon for your advise so far. If we want to follow this route, we could merge those changes into this PR.

guerler avatar Nov 14 '16 18:11 guerler

Hi Aysam, I took a look at your commits and it seems to work the same way as mine. I wish I had known I could check the type of the parameter application side as it would have saved me much work involving following an extra parameter around. I also like that you named the environment variable the same thing the user named it in their tool config, it makes it a lot easier.

katbeaulieu avatar Nov 15 '16 11:11 katbeaulieu

Hi Aysam, So should we move forward with your implementation? If yes I would have some changes to add to it myself, let me know how you would like to go forward with this:)

katbeaulieu avatar Nov 15 '16 14:11 katbeaulieu

Cool. Its up to you, if you want to you can merge these changes into your PR or close yours and we open another one, whichever you prefer.

guerler avatar Nov 15 '16 15:11 guerler

I will merge these changes into my PR, add the ones to obfuscate the password in one area we missed and we can review it all again.

katbeaulieu avatar Nov 15 '16 15:11 katbeaulieu

Ok just give me a few I will issue a PR against your branch.

guerler avatar Nov 15 '16 15:11 guerler

@guerler About the changes to the security module - I wish they were a bit different. As outlined here:

Can you handle the encoding a bit differently? There are a lot of places where that decode_id exception should actually be thrown. There is a kind flag on decode_id I'd feel more comfortable if you used as well - so that id encoding doesn't correspond directly to password encoding.

So maybe refactor the security piece to:

     def decode_id( self, obj_id, kind=None ):
         return int( self.decode( object_id, kind=kind ) )

    def decode(self, obj, kind=None ):
         id_cipher = self.__id_cipher( kind )
         return id_cipher.decrypt( obj_id.decode( 'hex' ) ).lstrip( "!" ) )

and use .decode(password, kind="tool_password") instead of .decode_id(password).

Still though after all of that - we are still...

  • Putting encoded values (with id_secret) in the database when we shouldn't.
  • We are not tying the password to the user in anyway - so sharing a history shares the password.
  • We are not preventing the password from being returned to the tool form, tool info, etc....

Or am I wrong about any of these three?

jmchilton avatar Nov 15 '16 16:11 jmchilton

@jmchilton, for your first point, the password is in the database as an encoded value only until the start of execution the tools. I think this is a risk we will have to weigh as to whether or not we want to accept. For your second point, I have not tested what happens when you share a history but will find a way so that the password does not get shared. For your third point, I think you are referring to when you click on the 'info' after tool execution and would see the value of the password. This shows up as a blank value. I'm not quite sure what you mean for the tool-form but in both the individual tool form and the workflow tool form, the password is obfuscated from the user. Let me know if you have any more concerns and I would be happy to work on them.

katbeaulieu avatar Nov 15 '16 16:11 katbeaulieu

I agree with @katbeaulieu and I'd be also willing to assist further of course. If you prefer a separate decode caller we can do that. I have issued a PR at https://github.com/AAFC-MBB/galaxy-1/pull/1 against this branch with the changes so far. Feel free to merge current dev first and then review and merge that PR. Thanks for looking into it @jmchilton.

guerler avatar Nov 15 '16 16:11 guerler

I want to throw in a total different idea which I promised month ago to @katbeaulieu! Sorry @katbeaulieu we have worked on https://github.com/galaxyproject/galaxy/pull/3118 to enhance the user-preference setting before.

Background

Tools with passwords, personalized URLs and all other types of credentials are kind of different and are a potential security issue. The only way we can pass passwords to tools via a command line is via ENV vars or via a file. ENV vars are also tricky if you capture the entire ENV and put into as metric into metadata.

This and some usability arguments like "I want to use this tool in a workflow" or "I don't want to type the password X times" was the reason I was aiming for a different approach, involving the admin - I belief the admin needs to be involved by these type of tools.

The idea

Create a config file in which the admin can specify key-value pairs, lists etc. These keys and values are available in the user-preferences for every user and every user can enter credentials/passwords or URLs for various Apollo instances, as requested by @erasche.

This store is a central password storage and we can choose if we want to put it into a database or somewhere else - but the values will end up in the $__user__ object which can be accessed by any tool that is able to run by a specific user.

A tool author can than choose to put the credentials into a file or into a ENV to pass it to the tool. This would enable entering a password only once and using it in workflows etc.

The "disadvantage" is that an admin needs to configure one config file to enable the user to put in one key-value pair which can be picked up by a tool. I personally see this as not a big deal as these tools are very special and admins should now about such cases.

An initial implementation is here: https://github.com/bgruening/galaxy/compare/remove_mako_user_pref...bgruening:extra_user_preferences?expand=1

bgruening avatar Nov 15 '16 17:11 bgruening

So the tool would be useless after going through the toolshed installation unless the admin makes additional adjustments to the user preferences? I think if we consider this route the toolshed should be enabled to add those preferences automatically. Additionally this PR already accesses the password through the environment so users could either provide the password on submission or store it in the user preferences. I am not sure if these two options are necessarily conflicting.

guerler avatar Nov 15 '16 17:11 guerler

@guerler yes useless, or not functional. I'm not sure these tool will end up in the TS, if so not as many I guess. My argument is that the admin should be aware of these tools to protect them, for example disable them for certain users etc ... also this approach would enable things like setting the password via job_conf.xml and so on. They are not conflicting I guess if the password field can be optional - but both approaches maybe confusing for a user and I'm not sure if we want to offer/support such an password field in the long run. My gut feeling is telling me that we have a lot more flexibility in the long run if we keep this out of tools.

bgruening avatar Nov 15 '16 18:11 bgruening

@guerler @bgruening just quick comments before I go to lunch

So the tool would be useless after going through the toolshed installation unless the admin makes additional adjustments to the user preferences?

In my perfect world, a tool declares the structure of the credentials it requires, and a namespace for said credentials. Upon first-run it is offered to the user to configure credentials for the tool which are stored in their user-prefs.

E.g. apollo tool would declare (apollo_url, username, password) and a basespace importer might declare (api_key).

I recognise that this goal may not line up with this PR, etc etc.

hexylena avatar Nov 15 '16 18:11 hexylena

@katbeaulieu please still merge the current changes into this PR, so we can review and discuss it on that basis, even if the consensus will be to follow a different route.

guerler avatar Nov 16 '16 16:11 guerler

Yes sorry for not doing it sooner!

katbeaulieu avatar Nov 16 '16 17:11 katbeaulieu

Just going back to test this again, I realized that this new version now does not support using the password parameter in code files. Would this be a simple fix? I can't seem to find the spot to change to bring back that functionality.

katbeaulieu avatar Nov 16 '16 18:11 katbeaulieu

@katbeaulieu I am not sure how that behavior might have changed, unless the password was not stripped from the database before and/or accessed through the to_param_dict_string call which now only returns some information and previously possibly returned the unencoded password.

guerler avatar Nov 16 '16 19:11 guerler

@guerler, after looking at this for a while, I realized that the code file was receiving the encrypted value of the password. If this gets merged, this will be something to note in the documentation.

katbeaulieu avatar Nov 21 '16 15:11 katbeaulieu