
GPU enabled tasks

Open ds2268 opened this issue 2 years ago • 16 comments

I would like to have GPU-enabled Slicer CLI tasks and would like to know how to pass additional flags to the task's Docker container. Per the official Docker guidelines (or nvidia-docker), you need to specify either the --gpus flag (native Docker) or --runtime (nvidia-docker). I saw that there is a CNNCellDetection example CLI which uses GPUs, but the DSA side of it - how to enable GPU access - is not explained there. Can you elaborate on that? When I run a GPU-enabled task, torch reports:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver
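
For context, a minimal docker-py sketch of the two mechanisms referred to above (a sketch only; the CUDA image name is just an example, and the device_requests form needs Docker 19.03+):

    import docker

    client = docker.from_env()

    # Native GPU support (Docker 19.03+), the SDK equivalent of `docker run --gpus all`:
    client.containers.run(
        'nvidia/cuda:11.0-base', 'nvidia-smi',
        device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])],
    )

    # Legacy nvidia-docker2, the SDK equivalent of `docker run --runtime=nvidia`:
    client.containers.run('nvidia/cuda:11.0-base', 'nvidia-smi', runtime='nvidia')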

Additionally, I would like to know how to pass environment variables to the CLI worker Docker container. I would like certain ENV variables set on the DSA deployment to be passed along to running tasks (e.g. cloud access credentials, API keys, etc.) so that they do not have to be hardcoded.

ds2268 avatar Feb 11 '22 07:02 ds2268

I think the problem is in the Girder Worker Docker container's nvidia/GPU support check:

https://github.com/girder/girder_worker/blob/master/girder_worker/docker/tasks/init.py#L74 https://github.com/girder/girder_worker/blob/e635acedb50d78d48de227bad7cc2bf7376810cf/girder_worker/docker/nvidia.py#L1

It seems that you need LABEL com.nvidia.volumes.needed=nvidia_driver present in the image for Girder Worker to add the nvidia runtime to the container. It also seems that Nvidia stopped using this label in recent images, so the nvidia runtime is not assigned. I don't know how Nvidia detects GPU requirements now, but the label is absent.

I also saw that you added support in Girder Worker for the new (19.03+) native Docker GPU support, but according to the implementation (see the link below), the LABEL above still needs to be present for it to even be considered. With the new native support there is no need for runtime="nvidia", just the --gpus flag (all, or an ID). I therefore think the current GPU support implementation in Girder Worker is broken. Can you elaborate on how one should use the new native GPU support (--gpus)? I don't think it's currently possible, despite the comments.

https://github.com/girder/girder_worker/blob/master/girder_worker/docker/tasks/init.py#L88
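
For reference, a hedged paraphrase of the check being discussed - not the literal girder_worker code - expressed with the docker-py image API:

    NVIDIA_LABEL = 'com.nvidia.volumes.needed'

    def image_requests_gpu(image):
        """Return True if the image carries the legacy nvidia label that gates GPU setup."""
        labels = image.attrs.get('Config', {}).get('Labels') or {}
        return labels.get(NVIDIA_LABEL) == 'nvidia_driver'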

ds2268 avatar Feb 11 '22 20:02 ds2268

Adding the LABEL com.nvidia.volumes.needed=nvidia_driver should make the GPU device request get invoked, even if that label doesn't do any good otherwise. Alternatively, you can pass device_requests as part of the job arguments. I think this is a list of dictionaries where each entry has values such as those listed in the DeviceRequest class in https://github.com/docker/docker-py/blob/master/docker/types/containers.py#L155 .
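
As an illustration only (the field names follow docker-py's DeviceRequest dict type; whether girder_worker accepts exactly this shape in the job arguments should be verified), such an entry might look like:

    # One entry per GPU request; Count=-1 means "all available GPUs".
    device_requests = [{
        'Driver': 'nvidia',
        'Count': -1,
        'Capabilities': [['gpu']],
    }]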

manthey avatar Feb 11 '22 20:02 manthey

Yes, I will do that now -> add the LABEL. You need to be careful, though: nvidia-docker2 is required in order to have support for the runtime=nvidia flag. I think the GPU usage needs a bit more documentation and refactoring to be up to date :)

For the new API (--gpus), I will try it, but you will still need to have the LABEL present in your image just to enter the if branch needed to run the container:

https://github.com/girder/girder_worker/blob/master/girder_worker/docker/tasks/init.py#L88

I will be using GPUs a lot together with Girder, so I am more than happy to test things out when needed. We can keep this issue open until then.

ds2268 avatar Feb 11 '22 20:02 ds2268

We'd welcome improvements, either in the code or the docs.

I know that some others have run GPU code (for instance https://github.com/SarderLab/Histo-cloud), though I don't know the specifics of what they did to get GPU access.

manthey avatar Feb 11 '22 20:02 manthey

Thanks @manthey, I will do a PR on the docs when I figure everything out. Thank you also for the Histo-cloud reference. They solved it in exactly the same manner - by adding the LABEL:

https://github.com/SarderLab/Histo-cloud/blob/main/Dockerfile#L11

ds2268 avatar Feb 11 '22 21:02 ds2268

@manthey do you have any suggestions for the 2nd part of the question, about how to pass "secrets" into the CLI task container (e.g. API keys), given that the task container is executed through the Docker SDK? It would be good enough if certain ENV variables from the local environment were passed to the executed CLI, or if a file mounted somehow at DSA deploy time were also visible in the Docker CLI task. I saw that extra mounts are mounted at /opt/digital_slide_archive/mounts in girder_worker; one option would be to have such a mount present in the slicer_cli_web task container as well?

One alternative (also used in Histo-cloud) is to use the Slicer UI and default values, but I don't like this approach as it exposes API keys as hardcoded values or forces users to always enter them. A cleaner approach is to store the API keys as metadata on the task in Girder and then only provide a Girder API key through the UI, but I would still like to avoid that if possible.

ds2268 avatar Feb 12 '22 09:02 ds2268

I'll have to think about how to do this generally. Currently, there is only support for passing through a girderApiUrl and girderToken.

I think passing through the local ENV is the way to go -- it would be the ENV in the girder_worker environment, and we'd have to add some code to pass that through. We could do this sort of like tox, where there could be a list of regexes for env variables, and any matching variables would be passed through; see the sketch below.
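
A minimal sketch of that idea, assuming a hypothetical PASSENV_PATTERNS setting (none of these names exist in girder_worker today):

    import os
    import re

    # Hypothetical configuration: regexes naming which girder_worker env variables
    # should be forwarded to the task container.
    PASSENV_PATTERNS = [r'^AWS_', r'^MY_SERVICE_API_KEY$']

    def passthrough_environment():
        """Collect the girder_worker environment variables matching any pattern."""
        patterns = [re.compile(p) for p in PASSENV_PATTERNS]
        return {k: v for k, v in os.environ.items()
                if any(p.search(k) for p in patterns)}

    # The resulting dict could then be handed to docker-py's containers.run(environment=...).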

manthey avatar Feb 14 '22 14:02 manthey

I had a very similar idea. For now I have resorted to using key-value metadata on Girder collections and items. Basically, I have added metadata key-value pairs to the task. I then only need to pass the API URL and a Girder API token through the task UI, and the task gets the ENV variables programmatically through API calls (see the sketch below). This is the current solution, as I have plenty of ENV variables and need to expose only the Girder credentials this way. It is still not optimal, since I added a default value for the token so that the pathologists do not always have to copy their API key.
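
A rough sketch of that workaround (load_env_from_item_metadata and its arguments are hypothetical; the task item's metadata is assumed to hold the key-value pairs):

    import os

    import girder_client

    def load_env_from_item_metadata(api_url, token, task_item_id):
        """Copy a Girder item's key-value metadata into the task's environment."""
        gc = girder_client.GirderClient(apiUrl=api_url)
        gc.setToken(token)
        item = gc.getItem(task_item_id)
        for key, value in (item.get('meta') or {}).items():
            os.environ[key] = str(value)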

I will test it out when available and give you feedback! Thanks!

ds2268 avatar Feb 14 '22 16:02 ds2268

@manthey You mentioned that the API URL (girderApiUrl) and token (girderToken) are auto-populated and passed to the task. I saw this described in the Slicer CLI Web repo readme as well. It doesn't work for me. The input boxes are empty - I have also tried leaving out the default element.

<label>Girder API URL and Key</label>
<description>A Girder API URL and token for Girder client</description>
<string>
  <name>girderApiUrl</name>
  <longflag>api-url</longflag>
  <label>Girder API URL</label>
  <description>A Girder API URL (e.g., https://girder.example.com:443/api/v1)</description>
  <default></default>
</string>
<string>
  <name>girderToken</name>
  <longflag>token</longflag>
  <label>Girder API Token</label>
  <description>A Girder token</description>
  <default></default>
</string>

Update: I saw that they are not populated in the UI, but they are actually passed at run time? Given the code, I assume that an API token is generated specifically for each job?

'http://girder:8080/api/v1/', '--token', 'epU2xRXcQlFXshUGcLu6c5Ss2Ul4G9pSyJpSySQagQrZJKVTCRfcxv1BlwVaLVuQ'

girder_client.HttpError: HTTP error 400: POST **URL**/api/v1/api_key/token?key=rwE15nnuGb1JbY28nvNG6iW9udQB5qobOR9ZNZZsw8GLPpaNLihX0he8ghuZAhrz Response text: {"message": "Invalid API key.", "type": "validation"}

The auto-generated API URL and token above are not working for me - I have deployed mine on a different port, and even if I specify the API URL and leave the token to be auto-generated, it says that the token is not valid.

An additional question regarding the XML schema: let's say that I want to pass an image AND an annotation file that was just created on that image (i.e. the user has created a new annotation file, drawn a polygon, and then runs an analysis). How can I have a UI where the user can select the annotation file (if there are multiple for the item) on which they want to perform the analysis?

An alternative is to also have the itemID being processed available in the task, so one can use an API call to get all the annotations. Can I obtain the itemID of the processed image in the task? I have currently extended the slicer_cli_web repo to add a girderItemID in a similar manner as the token/API URL, since I saw that the itemID is stored in a "reference" object in the prepare_task method. But this is an ugly solution.

ds2268 avatar Feb 15 '22 11:02 ds2268

@manthey any comments on the problem of:

  • token not being valid in the task?
  • passing the current annotation file as input along with the image (I have currently extended slicer_cli_web to have an auto-populated Girder item ID for the item being processed, but maybe that is not the nicest solution)

Thanks!

ds2268 avatar Feb 17 '22 14:02 ds2268

The girderApiUrl and girderToken fields are auto-populated if left blank. You should be able to use them inside the CLI like so (here there is another string input, imageId):

    import girder_client
    import pprint

    # Connect with the auto-populated URL and token, then list the annotations on the image item.
    gc = girder_client.GirderClient(apiUrl=args.girderApiUrl)
    gc.setToken(args.girderToken)

    annotations = gc.get('annotation', parameters=dict(limit=100, offset=0, itemId=args.imageId))
    pprint.pprint(annotations)

This, for instance, is using an xml spec that looks like

<?xml version="1.0" encoding="utf-8"?>
<executable>
  <category>Sample</category>
  <title>Girder Client - Fetch Annotation</title>
  <description>Summarize annotations associated with a whole slide image.</description>
  <version>0.1.0</version>
  <license>Apache 2.0</license>
  <contributor>David Manthey (Kitware)</contributor>
  <parameters>
    <description>General parameters</description>
    <label>IO</label>
    <string>
      <name>imageId</name>
      <index>0</index>
      <label>imageId</label>
      <description>An image ID</description>
    </string>
  </parameters>
  <parameters advanced="true">
    <label>Girder API URL and Key</label>
    <description>A Girder API URL and token for Girder client</description>
    <string>
      <name>girderApiUrl</name>
      <longflag>api-url</longflag>
      <label>Girder API URL</label>
      <description>A Girder API URL (e.g., https://girder.example.com:443/api/v1)</description>
      <default></default>
    </string>
    <string>
      <name>girderToken</name>
      <longflag>girder-token</longflag>
      <label>Girder Token</label>
      <description>A Girder token</description>
      <default></default>
    </string>
  </parameters>
</executable>

manthey avatar Feb 17 '22 15:02 manthey

There needs to be a convenient way to pass a Girder ID for an item, file, or other resource, and/or a convenient way to pass a specific annotation or set of annotations as a JSON file. This would require some work in the slicer_cli_web library. One idea would be to pass a resource ID rather than the file itself: we'd add a flag to the item/file/folder fields with a boolean of "as Girder ID" or something similar. For annotations, we could easily pass ALL annotations (since there is no UI to pick which ones), or specify an annotation by name (pass any annotations whose name matches a regex, for instance) or recency (pass the last modified annotation), or a combination of these (the most recent annotation whose name matches a regex); see the sketch below.
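
As an illustration of the last option only (this assumes the annotation listing endpoint returns records with an 'annotation.name' and an 'updated' field; it is not existing slicer_cli_web behaviour):

    import re

    def pick_annotation(gc, item_id, name_pattern=r'.*'):
        """Return the most recently updated annotation whose name matches the regex."""
        annotations = gc.get('annotation', parameters=dict(itemId=item_id, limit=0))
        matching = [a for a in annotations
                    if re.search(name_pattern, a.get('annotation', {}).get('name', ''))]
        return max(matching, key=lambda a: a['updated'], default=None)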

Feel free to create issues in the slicer_cli_web repo for these and comment on what you think would work well for your use case.

manthey avatar Feb 17 '22 15:02 manthey

Great @manthey, the problem was probably that I was using:

from histomicstk.utils.girder_convenience_utils import connect_to_api

With girder_client it works now, though I have a new problem:

annotations = gc.get('annotation', parameters=dict(limit=100, offset=0, itemId=args.imageId))

returns:

girder_client.HttpError: HTTP error 401: GET http://girder:8080/api/v1/annotation?itemId=6188cb40c7173b56342476a2 Response text: {"message": "Read access denied for item 6188cb40c7173b56342476a2 (user None).", "type": "access"}

This probably means that the auto-generated token does not have the proper rights to get the annotations. I suspect there is some bug in the REST call for the annotations; it should list only the annotations the caller has read rights on (I don't know what the default permissions for auto-generated tokens are). The auto-generated token works fine for the default bbox ROI, though.

ds2268 avatar Feb 17 '22 19:02 ds2268

The above problem with read access (and sometimes also "you need to be logged in") is that the majority of the API calls use @access.user and thus do not work with only the token available in the CLI task. I have resorted to adding @access.token to some of the @access.user API calls that I need from the CLI (e.g. the API call for listing CLIs requires user permissions, which I don't think is needed; the token alone should be enough).

TODO: @manthey maybe a revision of the access rights is needed for the API calls (e.g. listing CLIs, etc.) that currently require a logged-in user. Some of them might be changed to @access.token or even @access.public; a hedged illustration follows below.
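
For illustration, the kind of change being discussed looks roughly like this (listClis is a hypothetical route handler, not the actual slicer_cli_web code):

    from girder.api import access
    from girder.api.describe import Description, autoDescribeRoute

    @access.token  # previously @access.user; a job token is now sufficient
    @autoDescribeRoute(Description('List the available CLI tasks.'))
    def listClis(self):
        ...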

ds2268 avatar Feb 21 '22 16:02 ds2268

I think you are right that some of the access levels need to be changed to @access.token. Have you had to modify any access points in the core of Girder? Or only in large_image and histomicsui?

manthey avatar Feb 22 '22 15:02 manthey

I have only modified it in slicer_cli_web, to be able to list tasks - the use case being, for example, that you store pre-trained models with the task as an additional file or as metadata and then need to access them from the task. There are probably more endpoints that would be useful to have available under token-only access. I also had some problems obtaining existing annotations for a particular slide, even just the annotations from the same user. But I don't know if those are in Girder core.

ds2268 avatar Feb 22 '22 15:02 ds2268

This may have been taken care of in https://github.com/girder/large_image/pull/999. I'll close this issue; please reopen if it is still a problem.

manthey avatar Nov 22 '22 21:11 manthey