GlobaLeaks icon indicating copy to clipboard operation
GlobaLeaks copied to clipboard

File upload description fields should be properly re-implemented

Open evilaliv3 opened this issue 9 years ago • 10 comments

Since the token addition implemented in relation to D8 a file uploads does not get any db-id until the submission is finalized.

this means that no db-id is usable as key reference between a submission field of kind file upload and the file itself so that the following action should be taken:

  • short term fix: current code related to file upload fields (e.g. the descriptions by the user) should be removed/disabled.
  • long term fix: a unique identifier should be associated to the file and exported by the glbackend API to allow file-description_fields association; basically this could be implemented with a random sequence generated by the backend for each uploaded file and to be stored in the file table when the file metadata is saved from the token memory data stracture to the the db.

\cc @vecna @fpietrosanti

evilaliv3 avatar Apr 21 '15 12:04 evilaliv3

@evilaliv3 What if we assign a db-id to each file that get uploaded, as long as it start touching the disk? That could be also a way to detect in a cleaner way file upload that are orphaned directly from DB

fpietrosanti avatar Apr 21 '15 12:04 fpietrosanti

@fpietrosanti this was, and this has been removed by the research of @vecna that was specificly oriented in reducing db accesses on this front.

evilaliv3 avatar Apr 21 '15 12:04 evilaliv3

@evilaliv3 Is this an issue, considering a flood protection threshold on the amount of new-file-upload that can happen over time?

fpietrosanti avatar Apr 21 '15 12:04 fpietrosanti

i agree on your point that the token alone would limit the dos attack surface but:

  1. since always a file is associated to a tip in order to have automatic cleanup in case a tip is deleted
  2. now the tip is created only at tip finalization so that files can be saved and their id generarated only at the end
  3. file are saved and the user would have to type file related infos (the description) before 2

@hellais what is your thinking?

evilaliv3 avatar Apr 21 '15 15:04 evilaliv3

What about having a new token that is a file specific token that should be spent only for file uploads.

When you end creating the submission you will pass the file upload token(s) previously generated. We can have a scheduler or other time based system that will remove stale uploads that have not been touched in the last X hours and don't have an associated submission ID.

hellais avatar Apr 21 '15 15:04 hellais

wait hellais you are missing the point. no issue on cleanup exists but simply now that the token works completely in memory, on client side we have no id to associate fileupload fields (e.g. the description of files of amnesty) to the file (cause when fileupload fields are filled by the user no file id has been still associated)

line 190 of https://github.com/globaleaks/GlobaLeaks/blob/devel/client/app/scripts/controllers/submission.js#L190 will clarify what is the operation that is no more possible due to the missing id on files before completion of the submission. so we should rely on a in memory "key" (a random) to be generated upon file upload and then stored inside the InternalFile so that on Tip Status page the ui can re-associate fileupload-fields with the related file.

evilaliv3 avatar Apr 21 '15 16:04 evilaliv3

Yes I understand this, that is why I was suggesting a token based scheme that works like so:

POST /token/fileupload { file_id: XXXX }

You use flow.js to upload to

/file/$FILE_ID

Then when you do the submission you send send:

POST /submission { file_ids: ["XXX"] }

Does this better explain what I had in mind?

hellais avatar Apr 21 '15 16:04 hellais

@hellais i perfectly understand your concept but it's not answering the issue exposed. i'm also adding somehow the same mechanism. you have to clarify if you think this token should be something implemented only in memory and so based on a random (not unique) or on a db id (and so unique but with a regression with the implementation by vecna that tires to avoid db writes for each file upload)

evilaliv3 avatar Apr 21 '15 16:04 evilaliv3

Oh I see what you mean. Does it really matter? Why not put in the DB? How many writes are we even talking about?

I don't expect there to be a large enough number of writes (that is creation of files) to justify not keeping it in the database. Perhaps I am mistaken.

(For me a large number of write is more than 1k per second)

hellais avatar Apr 21 '15 16:04 hellais

@hellais the whole point of the token refactor is avoid touching DB in write mode as much as possible. I guess the best way do to so, is sent the description along with the submission, and keep a trackers about "file-temporary-id-1" : "description 1", "file-temporary-id-2": "description 2", in submission backend, InternalFile get update accordingly.

vecna avatar Apr 22 '15 13:04 vecna