Persistent Storage on GCS not working on Imports

Open nhabbash opened this issue 1 year ago • 8 comments

Describe the bug After successfully importing an image through the UI (Import button), LS is unable to fetch the image from GCS in the task view.

  1. Importing through UI (I guess API would have the same result) [Screenshot from 2023-06-05 16-04-40]
  2. Task appears in the Project's list [Screenshot from 2023-06-05 16-04-50]
  3. Issue loading URL - image won't load from GCS to UI [Screenshot from 2023-06-05 16-05-02]
  4. Double checking on GCS - the image was successfully uploaded to the right bucket [Screenshot from 2023-06-05 16-06-28]
  5. Src and Dst Storage from Settings [Screenshot from 2023-06-05 16-03-52] [Screenshot from 2023-06-05 16-03-35]

To Reproduce Reproduced this on Google Cloud Run and also locally. Local steps are:

  1. Clone repository
  2. Create a Service Account on GCP with the following roles and save service-account-file.json locally (these were the ones I needed for GCR, might reproduce it with just Storage Object Admin):
Cloud SQL Client
Logs Writer
Secret Manager Secret Accessor
Service Account Token Creator
Storage Object Admin
Storage Object and Bucket Viewer 
  3. Create a .env file under the root with the following:
STORAGE_TYPE=gcs
STORAGE_GCS_BUCKET_NAME="bucket_name"
STORAGE_GCS_PROJECT_ID="project_name"
STORAGE_GCS_FOLDER=""
GOOGLE_APPLICATION_CREDENTIALS="/opt/heartex/secrets/key.json"
  4. Change docker-compose.yml references of heartexlabs/label-studio:latest to heartexlabs/label-studio:1.7.3 (testing on this version)
  5. Under the app: definition, add:
app:
   ...
    env_file: .env
    volumes:
      - ./mydata:/label-studio/data:rw
      - ./service-account-file.json:/opt/heartex/secrets/key.json:ro
  6. Run docker-compose up and try to import an image
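
Before starting the container, it can help to confirm that the .env file actually defines the storage settings listed in the steps above. The following is a minimal sketch, not part of Label Studio: `parse_env`, `missing_keys`, and the `REQUIRED_KEYS` list are hypothetical helpers, and the choice of which keys count as required is an assumption based only on the .env shown in this issue.

```python
# Hypothetical helper, not part of Label Studio: parse a simple KEY=VALUE
# .env file and report which storage settings are missing or empty.

# Assumption: these are the keys that must be non-empty for the GCS setup
# described in this issue. STORAGE_GCS_FOLDER is allowed to be empty.
REQUIRED_KEYS = [
    "STORAGE_TYPE",
    "STORAGE_GCS_BUCKET_NAME",
    "STORAGE_GCS_PROJECT_ID",
    "GOOGLE_APPLICATION_CREDENTIALS",
]


def parse_env(text):
    """Return a dict of KEY -> value, with surrounding double quotes removed."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


# The .env content from the reproduction steps.
EXAMPLE_ENV = '''STORAGE_TYPE=gcs
STORAGE_GCS_BUCKET_NAME="bucket_name"
STORAGE_GCS_PROJECT_ID="project_name"
STORAGE_GCS_FOLDER=""
GOOGLE_APPLICATION_CREDENTIALS="/opt/heartex/secrets/key.json"
'''

if __name__ == "__main__":
    env = parse_env(EXAMPLE_ENV)
    print("missing:", missing_keys(env))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env`.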

Expected behavior Imports working with GCS as a backend

Environment (please complete the following information):

  • OS: Ubuntu 22.04.2 LTS
  • Label Studio Version: 1.7.3

Additional context I also made a thread on Slack here with some additional context (i.e. I found the request that was failing in Label Studio, but didn't find a solution)

nhabbash avatar Jun 05 '23 14:06 nhabbash

I've got the same problem with text datasets. The JSON file was located in the root of the bucket; I wonder why it didn't get imported correctly.

HRNPH avatar Jun 12 '23 02:06 HRNPH

Hello nhabbash and HRNPH,

Could you try version 1.8.0 of LS and also update your GCS CORS policy? We did have an update for that.
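
For anyone unsure what updating the GCS CORS policy involves, here is a minimal sketch that builds a bucket CORS configuration as JSON. The helper name `build_cors_config`, the example origin, and the specific methods and headers are assumptions for illustration, not values confirmed in this thread; adjust them to your deployment. The printed JSON can be saved to a file and applied with `gsutil cors set cors.json gs://<bucket_name>`.

```python
import json


def build_cors_config(origins, max_age_seconds=3600):
    """Build a GCS bucket CORS configuration as a list with one rule.

    Assumption: read-only GET access from the Label Studio origin is enough
    for the UI to load task data. The header list is illustrative.
    """
    return [
        {
            "origin": list(origins),
            "method": ["GET"],
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


if __name__ == "__main__":
    # Hypothetical origin: replace with the URL your Label Studio is served from.
    config = build_cors_config(["https://label-studio.example.com"])
    print(json.dumps(config, indent=2))
```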

AbubakarSaad avatar Jun 24 '23 21:06 AbubakarSaad

I fixed this by changing the prefix, setting the option that treats all files as source files to false, and using the correct regex for my own files (.json).

My bucket structure is as follows:

- bucket_name
-- File1.json
-- File2.json

HRNPH avatar Jun 25 '23 10:06 HRNPH

Back on this: I tried with 1.8.0 and there is no issue with CORS. When importing into a project with a Source Storage connection, the app does upload the task (in my case, an image) to the bucket. The issue is that afterwards it looks at the wrong path to find the image. The image is uploaded under gs://<bucket_name>/uploads/4/c15c56e0-meme.jpeg, but then the task errors out when loading the image:

There was an issue loading URL from $image value

Things to look out for:

URL is valid
URL scheme matches the service scheme, i.e. https and https
The static server has wide-open CORS, [more on that here](https://labelstud.io/guide/storage.html#Troubleshoot-CORS-and-access-problems)
Technical description:
URL: https://<ls_url>/storage-data/uploaded/?filepath=upload/4/c15c56e0-meme.jpeg

So this is the same error I was getting when I opened the ticket; not much has changed.
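
To make the comparison between the two paths mechanical, here is a small diagnostic sketch. It extracts the `filepath` query parameter from the failing task URL and the object key from the `gs://` URI so they can be compared side by side. The helper names and the example hostname are hypothetical; the paths are the ones quoted above, and the `upload` versus `uploads` difference may simply be a typo in this post rather than the root cause.

```python
from urllib.parse import parse_qs, urlparse


def filepath_from_task_url(url):
    """Return the value of the 'filepath' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("filepath", [])
    return values[0] if values else None


def object_key_from_gs_uri(uri):
    """Return the object key of a gs://bucket/key URI, without the bucket name."""
    return urlparse(uri).path.lstrip("/")


if __name__ == "__main__":
    # Hypothetical hostname and bucket name standing in for <ls_url> and <bucket_name>.
    task_url = "https://ls.example.com/storage-data/uploaded/?filepath=upload/4/c15c56e0-meme.jpeg"
    gs_uri = "gs://bucket_name/uploads/4/c15c56e0-meme.jpeg"

    requested = filepath_from_task_url(task_url)
    stored = object_key_from_gs_uri(gs_uri)
    print("requested by UI :", requested)
    print("stored in bucket:", stored)
    print("paths match     :", requested == stored)
```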

nhabbash avatar Sep 26 '23 10:09 nhabbash

I am having the exact same issue. I set up persistent storage, but when I go to view the image to annotate it, nothing works.

dtock89 avatar Oct 19 '23 06:10 dtock89

I had a similar issue when setting up persistent storage with AWS S3. Exactly as described in the issue, the file is successfully uploaded to the persistent storage when using the UI, but it is not shown afterwards in Label Studio. The task description says that the data is stored somewhere under /storage-data/uploaded/?filepath=. To resolve the data path to the correct path within the persistent storage, I needed to change the logic in here to use the super().url() function instead of returning f'{settings.HOSTNAME}/storage-data/uploaded/?filepath={name}'.

The easiest way to work around this issue was to simply set the global feature flag ff_back_dev_2915_storage_nginx_proxy_26092022_short to false. There is probably a cleaner fix: create your own custom storage class (e.g. CustomS3Boto3Storage for AWS S3) that handles the URL creation correctly, and reference it here: DEFAULT_FILE_STORAGE = 'core.storage.CustomS3Boto3Storage'
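
The difference between the two URL strategies can be sketched in isolation. Everything below is a stand-in written for illustration: `FakeBucketStorage`, `ProxiedUrlMixin`, `ProxiedStorage`, `DirectUrlStorage`, `HOSTNAME`, and `NGINX_PROXY_FLAG_ENABLED` are hypothetical names, not Label Studio's or django-storages' actual code. A real custom class would subclass the actual storage backend and take the flag and hostname from the application's settings.

```python
# Stand-ins only: none of these classes or constants are Label Studio's real code.

HOSTNAME = "https://ls.example.com"   # stands in for settings.HOSTNAME
NGINX_PROXY_FLAG_ENABLED = True       # stands in for the feature flag's value


class FakeBucketStorage:
    """Stand-in for a storage backend whose url() points at the bucket itself."""

    def url(self, name):
        return f"https://bucket.example.com/{name}"


class ProxiedUrlMixin:
    """Models the behavior described above: with the flag on, url() returns
    a Label Studio proxy path instead of the backend's own URL."""

    def url(self, name):
        if NGINX_PROXY_FLAG_ENABLED:
            return f"{HOSTNAME}/storage-data/uploaded/?filepath={name}"
        return super().url(name)


class ProxiedStorage(ProxiedUrlMixin, FakeBucketStorage):
    """Storage whose URLs go through the proxy path while the flag is on."""


class DirectUrlStorage(FakeBucketStorage):
    """Custom storage that always defers to the backend via super().url()."""

    def url(self, name):
        return super().url(name)


if __name__ == "__main__":
    name = "upload/4/c15c56e0-meme.jpeg"
    print(ProxiedStorage().url(name))
    print(DirectUrlStorage().url(name))
```

In this model, disabling the flag and switching to a class like `DirectUrlStorage` lead to the same result: the task URL points at the storage backend rather than at the `/storage-data/uploaded/` proxy path.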

ippen avatar Jan 21 '24 15:01 ippen

I'm facing the exact same problem with 1.12.1.

CristianCristanchoT avatar Jun 01 '24 00:06 CristianCristanchoT

I was able to resolve this with @ippen's suggestion to disable ff_back_dev_2915_storage_nginx_proxy_26092022_short (thanks!).

sam-simprints avatar Jun 11 '24 13:06 sam-simprints