
Local File Serving Not Working in Docker Container Despite Correct Environment Variables

Status: Open · bsc001 opened this issue 1 year ago · 3 comments

Describe the bug

I'm running Label Studio inside a Docker container using docker-compose. I've set environment variables so that Label Studio serves data from local files (mounted into the container via a volume). The files exist when I check from inside the container, but I cannot access them through URLs, either from the browser or from within the container.

To Reproduce

Steps to reproduce the behavior:

  1. Create a docker-compose.yml with the following Label Studio service configuration:
labelstudio:
  image: heartexlabs/label-studio:latest
  ports:
    - "4999:8000"
  depends_on:
    - database
  environment:
    - LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
    - LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data
    - LABEL_STUDIO_BASE_DATA_DIR=/label-studio/data/
    - LABEL_STUDIO_CORS_ORIGIN=*
    - LOG_LEVEL=DEBUG
  volumes:
    - label_studio_mydata:/label-studio/data:rw
    - documents_dataset:/label-studio/data/raw_datasets/documents_dataset:rw
  command: label-studio-uwsgi
  2. Start the Label Studio container.

  3. Inside the container, create a test file:

mkdir -p /label-studio/data/raw_datasets/documents_dataset/document1
echo "this is fake image" > /label-studio/data/raw_datasets/documents_dataset/document1/document1_Page_01.jpg

  4. Attempt to access the file via the browser: http://localhost:4999/data/local-files/?d=raw_datasets/documents_dataset/document1/document1_Page_01.jpg

  5. Attempt to access the file from within the container:

curl -v 'http://localhost:8000/data/local-files/?d=raw_datasets/documents_dataset/document1/document1_Page_01.jpg'

Expected behavior

The file should be accessible via the provided URLs.

Actual behavior

  • Browser access fails.
  • The curl command run from within the container fails to retrieve the file.

Environment information:

Label Studio version: 1.12.1

Additional context

Environment variables are correctly set within the container:

  • LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
  • LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data
  • LABEL_STUDIO_BASE_DATA_DIR=/label-studio/data/

The files are present and accessible inside the container when checked directly. I was unable to find the Label Studio configuration file (/label-studio/data/label_studio_config.json) inside the container, and I could not locate or access any Label Studio log files there.

Attempted troubleshooting:

  • Verified file permissions within the container
  • Checked environment variables
  • Attempted to access a simple text file using curl within the container (failed)

Any assistance in resolving this issue would be greatly appreciated.

Posted by bsc001 on Jul 12 '24 at 06:07

Issue Summary and Temporary Solution (root cause not yet well understood)

Update:

The issue originates in the localfiles_data function in label_studio/core/views.py.

Original Code:

def localfiles_data(request):
    """Serving files for LocalFilesImportStorage"""
    user = request.user
    path = request.GET.get('d')
    if settings.LOCAL_FILES_SERVING_ENABLED is False:
        return HttpResponseForbidden(
            "Serving local files can be dangerous, so it's disabled by default. "
            'You can enable it with LOCAL_FILES_SERVING_ENABLED environment variable, '
            'please check docs: https://labelstud.io/guide/storage.html#Local-storage'
        )
    local_serving_document_root = settings.LOCAL_FILES_DOCUMENT_ROOT
    if path and request.user.is_authenticated:
        path = posixpath.normpath(path).lstrip('/')
        full_path = Path(safe_join(local_serving_document_root, path))
        user_has_permissions = False

        # Try to find Local File Storage connection based prefix:
        # storage.path=/home/user, full_path=/home/user/a/b/c/1.jpg =>
        # full_path.startswith(path) => True
        
        localfiles_storage = LocalFilesImportStorage.objects.annotate(
            _full_path=Value(os.path.dirname(full_path), output_field=CharField())
        ).filter(_full_path__startswith=F('path'))

        if localfiles_storage.exists():
            user_has_permissions = any(storage.project.has_permission(user) for storage in localfiles_storage)

        if user_has_permissions and os.path.exists(full_path):
            content_type, encoding = mimetypes.guess_type(str(full_path))
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, open(full_path, mode='rb'), content_type)
        else:
            return HttpResponseNotFound()

    return HttpResponseForbidden()
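To make the branches above easier to reason about, here is a minimal, Django-free model of how localfiles_data maps its conditions to HTTP status codes. The function name and boolean parameters are hypothetical, not part of Label Studio; it is a sketch of the control flow shown above, not the real view. It highlights that a 403 means serving is disabled or the request is unauthenticated (or lacks the d parameter), while a 404 means the request got past authentication but either no permitted storage matched or the file is missing.

```python
def expected_status(serving_enabled: bool, has_path: bool, authenticated: bool,
                    user_has_permissions: bool, file_exists: bool) -> int:
    """Hypothetical model of the localfiles_data control flow."""
    if not serving_enabled:
        # LOCAL_FILES_SERVING_ENABLED is False -> HttpResponseForbidden
        return 403
    if not (has_path and authenticated):
        # Missing ?d= parameter or anonymous user -> HttpResponseForbidden
        return 403
    if user_has_permissions and file_exists:
        # RangedFileResponse streams the file
        return 200
    # No permitted storage matched, or the file is missing -> HttpResponseNotFound
    return 404


# Authenticated, file on disk, but no matching storage: the 404 seen in this issue
print(expected_status(True, True, True, False, True))
# Anonymous request: falls into the second 403 branch
print(expected_status(True, True, False, False, True))
```

In other words, a 404 from this view does not by itself mean the file is absent; it can equally mean the permission lookup found nothing.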

Problem:

On the remote host, localfiles_storage.exists() returns False, so user_has_permissions stays False and the function returns a 404 (HttpResponseNotFound). In my local environment, by contrast, localfiles_storage.exists() returns True.
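The storage lookup annotates each LocalFilesImportStorage row with os.path.dirname(full_path) and keeps the rows where that string starts with the storage's path field. The sketch below reproduces that string-prefix test in plain Python so you can check a storage path against a requested file. It is a simplification: it uses posixpath.join instead of Django's safe_join, skips the ORM entirely, and the storage paths in the example are hypothetical. If the path saved on the storage is not a string prefix of the served file's directory (for example, a relative path), nothing matches and exists() returns False.

```python
import posixpath


def matching_storage_paths(document_root: str, requested: str,
                           storage_paths: list[str]) -> list[str]:
    """Simplified model of the LocalFilesImportStorage prefix filter.

    Mirrors dirname(full_path).startswith(storage.path). Uses posixpath.join
    instead of Django's safe_join, so it does not reject paths that escape
    document_root.
    """
    # Same normalisation as the view: normpath, then strip leading slashes
    rel = posixpath.normpath(requested).lstrip('/')
    full_path = posixpath.join(document_root, rel)
    file_dir = posixpath.dirname(full_path)
    # Plain string-prefix comparison, like _full_path__startswith=F('path')
    return [p for p in storage_paths if file_dir.startswith(p)]


root = '/label-studio/data'
requested = 'raw_datasets/documents_dataset/document1/document1_Page_01.jpg'

# Hypothetical storage paths, for illustration only
print(matching_storage_paths(root, requested, ['/label-studio/data/raw_datasets/documents_dataset']))
# ['/label-studio/data/raw_datasets/documents_dataset']
print(matching_storage_paths(root, requested, ['raw_datasets/documents_dataset']))
# []
```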

Resolution:

As a temporary workaround, I changed the default value of user_has_permissions to True. When no matching LocalFilesImportStorage is found, the function then goes on to check whether the file exists and, if it does, serves it.

Updated Code:

def localfiles_data(request):
    """Serving files for LocalFilesImportStorage"""
    user = request.user
    path = request.GET.get('d')
    if settings.LOCAL_FILES_SERVING_ENABLED is False:
        return HttpResponseForbidden(
            "Serving local files can be dangerous, so it's disabled by default. "
            'You can enable it with LOCAL_FILES_SERVING_ENABLED environment variable, '
            'please check docs: https://labelstud.io/guide/storage.html#Local-storage'
        )
    local_serving_document_root = settings.LOCAL_FILES_DOCUMENT_ROOT
    if path and request.user.is_authenticated:
        path = posixpath.normpath(path).lstrip('/')
        full_path = Path(safe_join(local_serving_document_root, path))
        user_has_permissions = True  # Temporary solution

        # Try to find Local File Storage connection based prefix:
        localfiles_storage = LocalFilesImportStorage.objects.annotate(
            _full_path=Value(os.path.dirname(full_path), output_field=CharField())
        ).filter(_full_path__startswith=F('path'))

        if localfiles_storage.exists():
            user_has_permissions = any(storage.project.has_permission(user) for storage in localfiles_storage)

        if user_has_permissions and os.path.exists(full_path):
            content_type, encoding = mimetypes.guess_type(str(full_path))
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, open(full_path, mode='rb'), content_type)
        else:
            return HttpResponseNotFound()

    return HttpResponseForbidden()
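The workaround only changes what happens when no storage matches. The hypothetical helper below models how user_has_permissions ends up being computed in both versions of the view: the list holds one has_permission(user) result per storage that survived the prefix filter, and default is the initial value (False upstream, True in the workaround). When some storage matches but the user has no permission on any of its projects, the result is still False and the view still returns 404. Conversely, in the no-match case the workaround lets any authenticated user read any existing file under LOCAL_FILES_DOCUMENT_ROOT.

```python
def resolve_permission(matching_project_perms: list[bool], default: bool) -> bool:
    """Hypothetical model of how user_has_permissions is computed."""
    if matching_project_perms:  # corresponds to localfiles_storage.exists()
        return any(matching_project_perms)
    # No storage matched: the initial value is used unchanged
    return default


print(resolve_permission([], default=False))      # upstream, no match -> 404
print(resolve_permission([], default=True))       # workaround, no match -> served
print(resolve_permission([False], default=True))  # match without permission -> still 404
```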

Notes:

  • This change makes user_has_permissions default to True, so when no matching LocalFilesImportStorage is found, the function still serves the file if it exists.
  • This is a temporary fix. The underlying issue causing localfiles_storage.exists() to return False on the remote host should be further investigated. Potential causes might include differences in database initialization, file paths, or environment configurations between local and remote environments.

Posted by bsc001 on Jul 12 '24 at 14:07

Hi @bsc001 - did you connect an import storage of the Local Files type to a project you're working on, as shown in the screenshot below? That connection is what will make localfiles_storage.exists() return True; really it's just looking for a LocalFilesImportStorage that your user has access to, and with the appropriate path field on the local storage object.

[Screenshot from 2024-08-01 (image not available)]
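Besides the UI, such a storage can presumably also be created over the REST API. The sketch below only builds the request with the standard library and does not send it. The endpoint path (/api/storages/localfiles/), the Token authorization header, and the payload field names (project, path, title, use_blob_urls) are assumptions based on Label Studio's public API docs and should be checked against the API reference for your version; the token and project id are placeholders. Whatever route you use, the path value should be an absolute directory whose string form is a prefix of the served file's directory, so that the lookup in localfiles_data finds it.

```python
import json
import urllib.request


def build_localfiles_storage_request(base_url: str, token: str, project_id: int,
                                     storage_path: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Local Files import storage.

    Assumes the endpoint POST /api/storages/localfiles/ and token auth;
    verify both against your Label Studio version.
    """
    payload = {
        "project": project_id,
        "path": storage_path,   # absolute path under LOCAL_FILES_DOCUMENT_ROOT
        "title": title,
        "use_blob_urls": True,  # assumed: each file becomes its own task
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/storages/localfiles/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder token and project id; urllib.request.urlopen(req) would send it.
req = build_localfiles_storage_request(
    "http://localhost:4999", "YOUR_API_TOKEN", 1,
    "/label-studio/data/raw_datasets/documents_dataset", "documents_dataset",
)
```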

Posted by jombooth on Aug 01 '24 at 04:08

Yes, I attached a folder there, but why is it not returning the list of files in it?

Posted by bsc001 on Aug 02 '24 at 08:08