
[Pollux, Reproducibility, Inquiry] Are dataset-fetching mechanisms broken?

Open stet-stet opened this issue 3 years ago • 3 comments

Hi, I am trying to run the Pollux benchmark with a custom workload on a different cluster (one that is not on AWS), to evaluate how Pollux does in a variety of situations. However, I cannot pull from your Docker registry at registry.petuum.com, which is needed to assemble the containers for each of the six models. (See this directory, for example.)

Below is part of the output of kubectl describe pods for the dataset pod, after I successfully launched the three kinds of sched pods.

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m2s                default-scheduler  Successfully assigned default/datasets-jxz86 to elsa-05
  Normal   Pulling    53s (x3 over 2m)    kubelet            Pulling image "registry.petuum.com/dev/esper-datasets:latest"
  Warning  Failed     38s (x3 over 104s)  kubelet            Failed to pull image "registry.petuum.com/dev/esper-datasets:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://registry.petuum.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     38s (x3 over 104s)  kubelet            Error: ErrImagePull
  Normal   BackOff    9s (x4 over 104s)   kubelet            Back-off pulling image "registry.petuum.com/dev/esper-datasets:latest"
  Warning  Failed     9s (x4 over 104s)   kubelet            Error: ImagePullBackOff

I also tried pulling the image directly, and got the output below. I am starting to think that some undocumented procedure (e.g., registration) may be required to access registry.petuum.com.

> ping registry.petuum.com
PING ec2-54-245-165-47.us-west-2.compute.amazonaws.com (54.245.165.47) 56(84) bytes of data.

^C
> sudo docker pull registry.petuum.com/dev/esper-datasets:latest

Error response from daemon: Get https://registry.petuum.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
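
A timeout while awaiting headers, together with a ping that never returns, may indicate that the host is unreachable at the network level rather than that credentials are missing. One way to narrow this down is to test whether a TCP connection to the registry's HTTPS port can be opened at all. The following is a minimal sketch, not part of AdaptDL or Pollux; the helper name `probe_tcp` and the result labels are hypothetical, and port 443 is assumed because the failing URL uses https.

```python
import socket


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try to open a TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration only. Returns one of:
    "open", "dns_error", "timeout", "refused", "other_error".
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns_error"
    except socket.timeout:
        # No reply within the timeout, e.g. packets silently dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Any other network-level failure (unreachable network, etc.).
        return "other_error"
```

Calling `probe_tcp("registry.petuum.com")` from the cluster node would return one of the labels above. A result of "timeout" would suggest a firewall, routing, or server-side availability problem that registration alone is unlikely to fix, whereas "open" would suggest looking further up the stack (TLS, proxy settings, or registry authentication).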

I googled a bit, and tested some of the more common solutions:

Regrettably, the former did not work, and it turns out the latter is not an option given my circumstances.

How can I proceed if I want to pull images from your server, and/or download the datasets you used in the evaluations in the paper?

Thank you in advance!

stet-stet avatar Jan 17 '22 16:01 stet-stet

sudo docker pull registry.petuum.com/dev/esper-datasets:latest

Hi @stet-stet , I am encountering a similar problem. Have you solved it?

gudiandian avatar May 25 '22 09:05 gudiandian

No, unfortunately...

stet-stet avatar May 25 '22 10:05 stet-stet

Hi, unfortunately we're not able to host the datasets for public access, for cost reasons and (for certain datasets like ImageNet) licensing reasons. However, all the datasets we used are public ones, with citations provided in the Pollux paper, so you should be able to obtain them to reproduce the experiments.

aurickq avatar May 25 '22 19:05 aurickq