label-studio
Error when importing CSV with Jetty
Describe the bug
When my Spring Boot application tries to POST a CSV file to /api/projects/{id}/import/, I get the following error:
{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}
To Reproduce
Steps to reproduce the behavior:
- Request:
COMM REQUEST LOG
POST http://localhost:3000/api/projects/3/import
Headers:
Accept-Encoding: gzip
User-Agent: Jetty/9.4.45.v20220203
Accept: application/json
Content-Type: multipart/form-data;boundary=J3uPyGWi69NLJPEXwEIKTBbLe6iu-tZyQe3rp
Authorization: Token 0a80fe412bca294bb0663105670283cee38f4961
Host: localhost:3000
********** content start **********
--J3uPyGWi69NLJPEXwEIKTBbLe6iu-tZyQe3rp
Content-Disposition: form-data; name="file"; filename="data.csv"
Content-Type: application/octet-stream
Content-Length: 1644
"Id métier","Id entrant","MIME Type","Path"
"1","0","text/plain","élément 1 : alten sud-ouest
élément 2 : alten sud ouest"
"2","1","text/plain","élément 1 : airbus operations s.a.s
élément 2 : airbus operations sas"
"3","2","text/plain","élément 1 : airbus eydt
élément 2 : airbus - eydt"
"4","3","text/plain","élément 1 : psa peugeot citroen
élément 2 : psa peugeot-citroen"
"5","4","text/plain","élément 1 : psa peugeot citroen
élément 2 : psa - peugeot citroen"
"6","5","text/plain","élément 1 : psa peugeot citroen
élément 2 : p.s.a. peugeot citroen"
"7","6","text/plain","élément 1 : dassault aviation
élément 2 : dassault-aviation"
"8","7","text/plain","élément 1 : cea leti
élément 2 : cea-leti"
*********** content end ***********
- Response:
COMM RESPONSE LOG
Status HTTP/1.1 400 Bad Request
Headers:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET,POST,PUT,PATCH,DELETE,HEAD,OPTIONS
Access-Control-Allow-Headers: Content-Type, Origin, Accept, Authorization, Content-Length, X-Requested-With
Date: Fri, 03 Jun 2022 09:33:33 GMT
Server: WSGIServer/0.2 CPython/3.8.10
Content-Type: application/json
Allow: POST, OPTIONS
Content-Length: 221
x-frame-options: DENY
Vary: Accept-Language, Cookie, Origin
Content-Language: en-us
x-content-type-options: nosniff
referrer-policy: same-origin
Set-Cookie: sessionid=eyJ1aWQiOiI1ZTg1ODE4MS05NWU0LTQzYTItOGRlOC1iNDU5Y2RmZjk1OTkiLCJvcmdhbml6YXRpb25fcGsiOjF9:1nx3gT:nZaWF1Ppv0a6bMqjkvUt_ipV-ZLSEs-n5vvS721R2BM; expires=Fri, 17 Jun 2022 09:33:33 GMT; HttpOnly; Max-Age=1209600; Path=/; SameSite=Lax
Connection: keep-alive
********** content start **********
{"id":"f004a96d-aaab-4313-85c8-0d8de6864dea","status_code":400,"version":"1.4.1post1","detail":"Validation error","exc_info":null,"validation_errors":{"non_field_errors":["load_tasks: No data found in DATA or in FILES"]}}
*********** content end ***********
Expected behavior
File gets imported as tasks.
"POST /api/projects/3/import HTTP/1.1" 201 229
Screenshots
Diff between the failed Jetty request body (left) and the working curl request body (right)
[Screenshots not preserved: curl request, Jetty request]
Environment (please complete the following information):
- OS: Windows Host, label-studio running in Docker container in WSL2
- Label Studio Version: 1.4.1post1
{
  "release": "1.4.1post1",
  "label-studio-os-package": {
    "version": "1.4.1post1",
    "short_version": "1.4",
    "latest_version_from_pypi": "1.4.1.post1",
    "latest_version_upload_time": "2022-02-12T00:44:06",
    "current_version_is_outdated": false
  },
  "label-studio-os-backend": {
    "message": "Merge Develop + LSE hotfix/2.2.7-hotfix.1: Return 404 for api/project/ ...",
    "commit": "3239a3d04e65c2cd0091568aa0439d103be2970c",
    "date": "2022-05-11 16:05:40 +0300",
    "branch": "master",
    "version": "3239a3d"
  },
  "label-studio-frontend": {
    "message": "fix: DEV-2100: Fix preselected choices (#584) - working with empty an ...",
    "commit": "ee38e771760e1ce57ac62dc1556ddd0718f62487",
    "branch": "master",
    "date": "2022-04-27T11:55:50Z"
  },
  "dm2": {
    "message": "Fix tasks selection (#44)",
    "commit": "97e33ac0a9b0ea09398b00d7916671ca76cf2a71",
    "branch": "master",
    "date": "2022-04-13T13:55:17Z"
  },
  "label-studio-converter": {
    "version": "0.0.40"
  }
}
Additional Context
Label Studio console output from the Docker container:
[2022-06-03 09:33:33,002] [core.utils.common::custom_exception_handler::82] [ERROR] f004a96d-aaab-4313-85c8-0d8de6864dea [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 176, in post
    return super(ImportAPI, self).post(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/rest_framework/generics.py", line 190, in post
    return self.create(request, *args, **kwargs)
  File "/label-studio/label_studio/data_import/api.py", line 208, in create
    parsed_data, file_upload_ids, could_be_tasks_lists, found_formats, data_columns = load_tasks(request, project)
  File "/label-studio/label_studio/data_import/uploader.py", line 148, in load_tasks
    raise ValidationError('load_tasks: No data found in DATA or in FILES')
rest_framework.exceptions.ValidationError: [ErrorDetail(string='load_tasks: No data found in DATA or in FILES', code='invalid')]
[2022-06-03 09:33:33,009] [django.request::log_response::224] [WARNING] Bad Request: /api/projects/3/import
[03/Jun/2022 09:33:33] "POST /api/projects/3/import HTTP/1.1" 400 221
After simulating the request through Insomnia, I narrowed the error down to the Transfer-Encoding: chunked header. Whenever that header is set, the server responds with the error above.
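For readers unfamiliar with the two framings, here is a minimal Python sketch of how the same body looks on the wire with a known length versus with chunked transfer encoding. The helper names frame_fixed and frame_chunked are made up for illustration and are not part of Label Studio, Jetty, or Spring. The sketch only shows the framing; why the server rejects the chunked form is not established in this thread.

```python
def frame_fixed(body: bytes) -> bytes:
    """Framing header plus body when the length is known up front."""
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def frame_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Framing header plus body split into HTTP/1.1 chunks.

    Each chunk is: <size in hex> CRLF <data> CRLF.
    A zero-size chunk marks the end of the body.
    """
    out = [b"Transfer-Encoding: chunked\r\n\r\n"]
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out.append(b"%X\r\n" % len(chunk) + chunk + b"\r\n")
    out.append(b"0\r\n\r\n")  # terminating zero-length chunk
    return b"".join(out)


if __name__ == "__main__":
    payload = b'"Id","Path"\n"1","a"\n'
    print(frame_fixed(payload))
    print(frame_chunked(payload))
```

A receiver that only reads Content-Length bytes has nothing to read in the second form, which would be consistent with the "No data found in DATA or in FILES" message, though that reading is an assumption.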
Thanks for the report!
After simulating the request through Insomnia, I narrowed the error down to the Transfer-encoding: chunked header. If the header is set server will always respond with an error
I'm not familiar with Jetty, can you control headers and just exclude this one from request as a workaround?
Transfer-Encoding is set automatically by the WebClient class from Spring Framework, and as far as I know there isn't any way to remove it other than setting the Content-Length header manually, but that would require manually serializing the multipart form before sending the request.
My workaround was to regenerate the OpenAPI client using RestTemplate instead of WebClient.
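The "serialize the multipart form first, then set Content-Length" idea is not specific to Spring. As a rough sketch in Python (standard library only, not the Spring code from this report), the form can be built into a byte string so its length is known before sending. The function names build_multipart and post_import are hypothetical; the endpoint path, the "file" field name, and the Token header are taken from the request log above.

```python
import http.client
import uuid


def build_multipart(filename: str, content: bytes, boundary: str) -> bytes:
    """Serialize a single-file multipart/form-data body (field name "file")."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail


def post_import(host: str, port: int, project_id: int, token: str,
                filename: str, content: bytes):
    """POST the file with an explicit Content-Length (no chunked encoding)."""
    boundary = uuid.uuid4().hex
    body = build_multipart(filename, content, boundary)
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # The body is fully serialized, so its length is known here.
        "Content-Length": str(len(body)),
    }
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("POST", f"/api/projects/{project_id}/import",
                     body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

Assuming the setup from this report, a call would look like post_import("localhost", 3000, 3, "<your token>", "data.csv", csv_bytes). The trade-off is that the whole file is held in memory, which is the cost of avoiding chunked streaming.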