
Do not fail if user-specified Docker image is non-root

Open · jvstme opened this issue 3 months ago · 1 comment

Steps to reproduce

Try running a configuration that uses a non-root image, i.e. an image whose default user is not root.

> cat prometheus.dstack.yml 
type: task

image: bitnami/prometheus
ports:
  - 9090

resources:
  memory: 0.5GB..
  cpu: 1..
> dstack run . -f prometheus.dstack.yml
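To confirm beforehand that an image runs as non-root, one option is to read the User field of its config, for example with docker image inspect --format '{{.Config.User}}' bitnami/prometheus. An empty value means the container runs as root. The helper below is a hypothetical sketch (the name is_non_root_user is made up for illustration and is not part of dstack) that classifies that value; it does not resolve user names to UIDs.

```shell
#!/bin/sh
# Hypothetical helper: decide from an image's configured user whether the
# container will run as non-root. Docker leaves Config.User empty when the
# Dockerfile has no USER instruction, which means the container runs as root.
is_non_root_user() {
  user="${1%%:*}"   # drop an optional ":group" suffix, e.g. "1001:0" -> "1001"
  case "$user" in
    ""|root|0) return 1 ;;  # runs as root
    *) return 0 ;;          # any other name or UID is treated as non-root
  esac                      # (a named user mapped to UID 0 is not detected)
}

# Usage (requires Docker and the image to be present locally):
#   user=$(docker image inspect --format '{{.Config.User}}' bitnami/prometheus)
#   if is_non_root_user "$user"; then echo "non-root image (user: $user)"; fi
```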

Actual behaviour

The run fails. CLI output:

 Configuration          prometheus.dstack.yml 
 Project                main                  
 User                   admin                 
 Pool name              default-pool          
 Min resources          1..xCPU, 0.5GB..      
 Max price              -                     
 Max duration           72h                   
 Spot policy            auto                  
 Retry policy           no                    
 Creation policy        reuse-or-create       
 Termination policy     destroy-after-idle    
 Termination idle time  300s                  

 #  BACKEND  REGION          INSTANCE  RESOURCES                 SPOT  PRICE     
 1  aws      us-west-2       t2.small  1xCPU, 2GB, 100GB (disk)  yes   $0.004    
 2  aws      ap-southeast-1  t2.small  1xCPU, 2GB, 100GB (disk)  yes   $0.0062   
 3  aws      eu-central-1    t2.small  1xCPU, 2GB, 100GB (disk)  yes   $0.0068   
    ...                                                                          
 Shown 3 of 761 offers, $49.159 max

Continue? [y/n]: y
spotty-monkey-1 provisioning completed (failed)
Run failed with error code JobTerminationReason.INTERRUPTED_BY_NO_CAPACITY. Check CLI and server logs for more details.

Server logs:

ERROR 2024-04-04T11:41:36.084 dstack._internal.server.background.tasks.process_running_jobs The docker container of the job 'spotty-monkey-1-0-0' is not working: exit code: 127, error 
DEBUG 2024-04-04T11:41:36.085 dstack._internal.server.background.tasks.process_running_jobs runner healthcheck: {'state': 'pending', 'container_name': 'spotty-monkey-1-0-0', 'status': 'exited', 'running': False, 'oom_killed': False, 'dead': False, 'exit_code': 127, 'error': ''}

shim.log on the cloud instance:

Reading package lists...
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (2: No such file or directory)
/bin/sh: 1: yum: not found
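The exit code 127 in the server logs is the POSIX shell status for "command not found", which matches the yum: not found line. A minimal sketch that reproduces the same status, assuming only that /bin/sh exists, runs yum with a PATH under which it cannot be found:

```shell
#!/bin/sh
# Run yum with a PATH that contains no executables, mimicking an image that
# has no yum binary. The inner shell reports "yum: not found" on stderr.
env PATH=/nonexistent /bin/sh -c 'yum --version' 2>/dev/null
status=$?
# POSIX shells use exit status 127 for "command not found".
echo "exit status: $status"
```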

Expected behaviour

The configuration runs successfully.

dstack version

0.17.0

Server logs

No response

Additional information

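The main error here is E: List directory /var/lib/apt/lists/partial is missing. - Acquire (2: No such file or directory). It happens because the bitnami/prometheus image runs as a non-root user, so apt-get cannot prepare its package lists. The following /bin/sh: 1: yum: not found line appears to come from a fallback to yum, which the image does not contain; exit code 127 is the shell's status for a command that was not found. See https://stackoverflow.com/a/57930100 and https://docs.bitnami.com/tutorials/work-with-non-root-containers/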
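One possible way to avoid failing in this case, sketched here under assumptions, is to make any package installation step tolerant of a non-root user: install directly as root, go through passwordless sudo if it is available, and otherwise skip with a warning. This is a hypothetical illustration, not dstack's actual shim code; the function names choose_install_mode, has_passwordless_sudo and install_packages are invented for this example.

```shell
#!/bin/sh
# Hypothetical sketch (not dstack's actual code): install packages inside a
# user-specified image without failing when the image runs as non-root.

# choose_install_mode UID HAS_SUDO
#   UID       numeric user ID the container runs as
#   HAS_SUDO  "yes" if passwordless sudo is usable, anything else otherwise
# Prints "root" (install directly), "sudo" (install via sudo) or "skip".
choose_install_mode() {
  if [ "$1" -eq 0 ]; then
    echo root
  elif [ "$2" = yes ]; then
    echo sudo
  else
    echo skip
  fi
}

# Prints "yes" if sudo exists and works without a password, "no" otherwise.
has_passwordless_sudo() {
  if command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then
    echo yes
  else
    echo no
  fi
}

# install_packages PKG...: installs with apt-get or yum when possible,
# otherwise prints a warning and returns 0 so the job can still start.
install_packages() {
  mode=$(choose_install_mode "$(id -u)" "$(has_passwordless_sudo)")
  case "$mode" in
    root) prefix="" ;;
    sudo) prefix="sudo -n" ;;
    *)
      echo "warning: running as non-root user, skipping install of: $*" >&2
      return 0
      ;;
  esac
  # $prefix is intentionally unquoted so "sudo -n" splits into two words
  # and an empty prefix disappears.
  if command -v apt-get >/dev/null 2>&1; then
    $prefix apt-get update && $prefix apt-get install -y "$@"
  elif command -v yum >/dev/null 2>&1; then
    $prefix yum install -y "$@"
  else
    echo "warning: no apt-get or yum found, skipping install of: $*" >&2
  fi
}
```

Sourcing the script and calling, for example, install_packages curl would install curl when running as root, while in a non-root image without passwordless sudo it would print a warning and return 0 instead of failing the job. Keeping the decision in choose_install_mode, separate from the side effects, makes the policy easy to check without touching the system.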

jvstme, Apr 04 '24 10:04