upload_datafile: handling of the content (mime) type

Open landreev opened this issue 3 years ago • 4 comments

Any change needs to be discussed before proceeding. Failure to do so may result in the rejection of the pull request.

All Submissions

Describe your environment

  • [x] OS: MacOS X 11.4
  • [x] pyDataverse: 0.3.1
  • [x] Python: 3.8.8
  • [x] Dataverse: 5.9

Follow best practices

  • [x] Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • [x] Have you followed the guidelines in our Contribution Guide?
  • [x] Have you read the Code of Conduct?
  • [x] Make your changes in a separate branch. Branches MUST have descriptive names.
  • [x] Have you merged the latest changes from upstream to your branch?

Describe the PR

There is currently no way to pass the content (MIME) type to upload_datafile() (see #118). In addition, when the method builds the multipart POST form, it specifies NO content type for the upload. This apparently causes Dataverse to default to "text/plain" without attempting its normal type detection. As a result, every file uploaded via pyDataverse currently ends up with the content type "text/plain", even when it is of a type Dataverse normally recognizes (popular image types, etc.). This defaulting behavior can and should be addressed on the Dataverse side, but it would be a good idea to fix it on the pyDataverse side as well. So this PR does two things:

  1. Provides a way to supply the mime type explicitly; and
  2. Makes it default to the standard application/octet-stream, a polite way of saying "type unknown", when creating the multipart POST entry, as curl does; this prompts Dataverse to at least attempt to identify the file more accurately. This is achieved by switching to the long notation for passing the file to the requests.post method: from {"file": open(filename, "rb")} to {"file": (filename, open(filename, "rb"), content_type)}.
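
For illustration, the two points above can be sketched roughly as follows. This is not the code in the PR: build_files_entry and DEFAULT_CONTENT_TYPE are hypothetical names used only here, and an in-memory buffer stands in for open(filename, "rb") so the snippet runs without touching the disk or the network.

```python
import io

# Assumption: a module-level default, mirroring the "type unknown" fallback
# described above. The name is hypothetical and not part of pyDataverse.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def build_files_entry(filename, fileobj, content_type=None):
    """Build a multipart `files` mapping in the long, three-element form.

    Hypothetical helper for illustration only. The returned dict has the
    shape {"file": (filename, fileobj, content_type)}, which is the tuple
    form that the `files` argument of requests.post accepts.
    """
    if content_type is None:
        # No explicit type given: fall back to the generic binary type
        # instead of leaving the part without any content type.
        content_type = DEFAULT_CONTENT_TYPE
    return {"file": (filename, fileobj, content_type)}


# Explicit type supplied by the caller.
png_entry = build_files_entry("photo.png", io.BytesIO(b"\x89PNG"), "image/png")

# No type supplied: the default is used.
default_entry = build_files_entry("data.bin", io.BytesIO(b"\x00\x01"))

print(png_entry["file"][2])
print(default_entry["file"][2])
```

A mapping built this way could then be passed as files=... to requests.post. How the content type is actually exposed on upload_datafile() is defined by the PR itself and is not shown here.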

On the Dataverse side this is tracked in https://github.com/IQSS/dataverse/issues/8344

  • [x] What kind of change does this PR introduce?
    • bug fix/improvement
  • [x] Why is this change required? What problem does it solve?
    • see the description above and the discussion in the linked issues
  • [ ] Screenshots (if appropriate)
  • [x] Put Closes #ISSUE_NUMBER to the end of this pull request

Testing

  • [ ] Have you used tox and/or pytest for testing the changes?
  • [ ] Did the local testing run successfully?
  • [ ] Did the Continuous Integration testing (Travis-CI) run successfully?

Commits

  • [ ] Have descriptive commit messages with a short title (first line).
  • [ ] Use the commit message template
  • [ ] Put Closes #ISSUE_NUMBER in your commit messages to auto-close the issue that it fixes (if such).

Others

  • [ ] Is there anything you need from someone else?

Documentation contribution

  • [ ] Have you followed the NumPy docstring standard?

Code contribution

  • [ ] Have you used pre-commit?
  • [ ] Have you formatted your code with black prior to submission (e.g. via pre-commit)?
  • [ ] Have you written new tests for your changes?
  • [ ] Have you run mypy on your changes successfully?
  • [ ] Have you documented your update (Docstrings and/or Docs)?
  • [ ] Do your changes require additional changes to the documentation?
  • Closes #118

landreev, Jan 21 '22 19:01