Error when importing CSV with Jetty

Open FancyBanana opened this issue 2 years ago • 3 comments

Describe the bug When my Spring Boot application tries to POST a CSV file to /api/projects/{id}/import/, I get the following error:

{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}

To Reproduce Steps to reproduce the behavior:

  • Request:
COMM REQUEST LOG
POST http://localhost:3000/api/projects/3/import
Headers:
Accept-Encoding: gzip
User-Agent: Jetty/9.4.45.v20220203
Accept: application/json
Content-Type: multipart/form-data;boundary=J3uPyGWi69NLJPEXwEIKTBbLe6iu-tZyQe3rp
Authorization: Token 0a80fe412bca294bb0663105670283cee38f4961
Host: localhost:3000

********** content start **********
--J3uPyGWi69NLJPEXwEIKTBbLe6iu-tZyQe3rp
Content-Disposition: form-data; name="file"; filename="data.csv"
Content-Type: application/octet-stream
Content-Length: 1644

"Id métier","Id entrant","MIME Type","Path"
"1","0","text/plain","élément 1 : alten sud-ouest
élément 2 : alten sud ouest"
"2","1","text/plain","élément 1 : airbus operations s.a.s
élément 2 : airbus operations sas"
"3","2","text/plain","élément 1 : airbus eydt
élément 2 : airbus - eydt"
"4","3","text/plain","élément 1 : psa peugeot citroen
élément 2 : psa peugeot-citroen"
"5","4","text/plain","élément 1 : psa peugeot citroen
élément 2 : psa - peugeot citroen"
"6","5","text/plain","élément 1 : psa peugeot citroen
élément 2 : p.s.a. peugeot citroen"
"7","6","text/plain","élément 1 : dassault aviation
élément 2 : dassault-aviation"
"8","7","text/plain","élément 1 : cea leti
élément 2 : cea-leti"


*********** content end ***********
  • Response:
COMM RESPONSE LOG
Status HTTP/1.1 400 Bad Request
Headers: 
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET,POST,PUT,PATCH,DELETE,HEAD,OPTIONS
Access-Control-Allow-Headers: Content-Type, Origin, Accept, Authorization, Content-Length, X-Requested-With
Date: Fri, 03 Jun 2022 09:33:33 GMT
Server: WSGIServer/0.2 CPython/3.8.10
Content-Type: application/json
Allow: POST, OPTIONS
Content-Length: 221
x-frame-options: DENY
Vary: Accept-Language, Cookie, Origin
Content-Language: en-us
x-content-type-options: nosniff
referrer-policy: same-origin
Set-Cookie: sessionid=eyJ1aWQiOiI1ZTg1ODE4MS05NWU0LTQzYTItOGRlOC1iNDU5Y2RmZjk1OTkiLCJvcmdhbml6YXRpb25fcGsiOjF9:1nx3gT:nZaWF1Ppv0a6bMqjkvUt_ipV-ZLSEs-n5vvS721R2BM; expires=Fri, 17 Jun 2022 09:33:33 GMT; HttpOnly; Max-Age=1209600; Path=/; SameSite=Lax
Connection: keep-alive

********** content start **********
{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}
*********** content end ***********

Expected behavior File gets imported as tasks. "POST /api/projects/3/import HTTP/1.1" 201 229

Screenshots Diff between Failed Jetty request body (left) and working Curl request body (right) image

Curl request: image

Jetty Request: image

Curl Request: image

Environment (please complete the following information):

  • OS: Windows Host, label-studio running in Docker container in WSL2
  • Label Studio Version: 1.4.1post1
{
  "release": "1.4.1post1",
  "label-studio-os-package": {
    "version": "1.4.1post1",
    "short_version": "1.4",
    "latest_version_from_pypi": "1.4.1.post1",
    "latest_version_upload_time": "2022-02-12T00:44:06",
    "current_version_is_outdated": false
  },

  "label-studio-os-backend": {
    "message": "Merge Develop + LSE hotfix/2.2.7-hotfix.1: Return 404 for api/project/ ...",
    "commit": "3239a3d04e65c2cd0091568aa0439d103be2970c",
    "date": "2022-05-11 16:05:40 +0300",
    "branch": "master",
    "version": "3239a3d"
  },

  "label-studio-frontend": {
    "message": "fix: DEV-2100: Fix preselected choices (#584)  - working with empty an ...",
    "commit": "ee38e771760e1ce57ac62dc1556ddd0718f62487",
    "branch": "master",
    "date": "2022-04-27T11:55:50Z"
  },

  "dm2": {
    "message": "Fix tasks selection (#44)",
    "commit": "97e33ac0a9b0ea09398b00d7916671ca76cf2a71",
    "branch": "master",
    "date": "2022-04-13T13:55:17Z"
  },

  "label-studio-converter": {
    "version": "0.0.40"
  }
}

Additional Context

Label Studio console output from the Docker container:

[2022-06-03 09:33:33,002] [core.utils.common::custom_exception_handler::82] [ERROR] f004a96d-aaab-4313-85c8-0d8de6864dea [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 176, in post
    return super(ImportAPI, self).post(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/generics.py", line 190, in post
    return self.create(request, *args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 208, in create
    parsed_data, file_upload_ids, could_be_tasks_lists, found_formats, data_columns = load_tasks(request, project)
  File "/label-studio/label_studio/data_import/uploader.py", line 148, in load_tasks
    raise ValidationError('load_tasks: No data found in DATA or in FILES')
rest_framework.exceptions.ValidationError: [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]
[2022-06-03 09:33:33,009] [django.request::log_response::224] [WARNING] Bad Request: /api/projects/3/import
[03/Jun/2022 09:33:33] "POST /api/projects/3/import HTTP/1.1" 400 221

FancyBanana avatar Jun 03 '22 09:06 FancyBanana

After simulating the request through Insomnia, I narrowed the error down to the Transfer-Encoding: chunked header. If the header is set, the server always responds with an error.
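
For readers who want to see where a chunked request typically comes from, here is a rough sketch. It does not use Jetty or Spring. It uses the JDK's own java.net.http API (Java 11+) as a stand-in, and the class and method names are made up for illustration. The idea: when a client streams a body whose length it cannot report up front, an HTTP/1.1 request generally has to fall back to Transfer-Encoding: chunked, whereas a fully buffered body has a known length and can be sent with Content-Length.

```java
import java.io.ByteArrayInputStream;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Label Studio, Jetty, or Spring.
class ChunkedVsFixedLength {

    /** Returns true when the publisher cannot report its body length up front. */
    static boolean lengthUnknown(HttpRequest.BodyPublisher publisher) {
        // A negative contentLength() means "unknown" in the JDK API.
        return publisher.contentLength() < 0;
    }

    public static void main(String[] args) {
        // Made-up sample payload, only here to have some bytes to publish.
        byte[] payload = "\"Id\",\"Path\"\n\"1\",\"a\"\n".getBytes(StandardCharsets.UTF_8);

        // Streamed body: length is not known in advance.
        HttpRequest.BodyPublisher streamed =
                HttpRequest.BodyPublishers.ofInputStream(() -> new ByteArrayInputStream(payload));

        // Buffered body: length is known, so Content-Length can be used.
        HttpRequest.BodyPublisher buffered = HttpRequest.BodyPublishers.ofByteArray(payload);

        System.out.println("streamed length unknown: " + lengthUnknown(streamed));
        System.out.println("buffered length: " + buffered.contentLength());
    }
}
```

Whether a given client library buffers or streams multipart bodies is specific to that library, so treat this only as an illustration of the general mechanism.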

FancyBanana avatar Jun 03 '22 13:06 FancyBanana

Thanks for the report!

After simulating the request through Insomnia, I narrowed the error down to the Transfer-Encoding: chunked header. If the header is set, the server always responds with an error.

I'm not familiar with Jetty. Can you control the headers and exclude this one from the request as a workaround?

triklozoid avatar Jun 08 '22 10:06 triklozoid

Transfer-Encoding is set automatically by the WebClient class from Spring Framework, and as far as I know there isn't any way to remove it other than setting the Content-Length header manually, but that would require manually serializing the multipart form before sending the request. My workaround was to regenerate the OpenAPI client using RestTemplate instead of WebClient.
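
For anyone who wants to try the "serialize the multipart form manually" route, a minimal sketch follows. It is not what I ended up using, and it swaps in the JDK's java.net.http API (Java 11+) rather than WebClient. The class name, method names, boundary string, and the text/csv part type are all assumptions for illustration; the endpoint path and the "file" field name are taken from the request log above. Because the whole body is built as a byte array, its length is known before sending, so the client can send Content-Length instead of chunked encoding.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Label Studio or Spring.
class MultipartImport {

    /** Serializes a single file part into a complete multipart/form-data body. */
    static byte[] multipartBody(String boundary, String fieldName, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: text/csv\r\n\r\n"; // assumed part type
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        // Closing delimiter terminates the multipart body.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds the import request with a fully buffered body of known length. */
    static HttpRequest importRequest(String baseUrl, int projectId, String token, byte[] csv) {
        String boundary = "sketch-boundary-" + System.nanoTime(); // arbitrary, assumed unique enough
        byte[] body = multipartBody(boundary, "file", "data.csv", csv);
        // Content-Length is not set by hand: the JDK client derives it from the byte array.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/projects/" + projectId + "/import"))
                .header("Authorization", "Token " + token)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

The request could then be sent with something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). I have not verified this against Label Studio 1.4.1post1, so take it as a starting point only.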

FancyBanana avatar Jun 13 '22 07:06 FancyBanana