Prevent users from uploading password-locked pdf files when submitting their CITI training report

Open mscanlan-git opened this issue 1 year ago • 5 comments

This hasn't really been a problem in the past, but lately I've noticed an uptick in the number of training reports submitted with this problem. If a user uploads a password-locked training report, the page appears to throw a 500 error because it cannot load the PDF. At the moment these applications cannot be rejected, because the review page itself cannot be opened. I think either of these two solutions would work:

  1. Prevent users from uploading any pdf file that has a password requirement on it
  2. Allow admins to view the training application, but have the pdf renderer hidden (this way, an admin can reject & comment that the pdf file is password locked and should be re-uploaded without the password requirement).

Option 1 is likely the easiest option.
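
As a rough illustration of option 1, here is a minimal sketch of an upload-time check. It is an assumption about how such a check could look, not code from physionet-build: the function names are hypothetical, and it relies only on the standard library. It scans the raw bytes for the `/Encrypt` key that a protected PDF declares in its trailer or cross-reference stream dictionary. Because this is a byte-level heuristic, it will also flag files that only carry an owner password (the "don't edit me" kind of protection), and in rare cases it could misfire on a file that merely contains that text.

```python
import re

# Matches the /Encrypt key as a whole PDF name token,
# so that /EncryptMetadata on its own does not count.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z0-9])")


def looks_like_pdf(data: bytes) -> bool:
    """Return True if a PDF header appears within the first 1 KiB."""
    return b"%PDF-" in data[:1024]


def declares_encryption(data: bytes) -> bool:
    """Heuristic: return True if the raw bytes contain an /Encrypt key."""
    return _ENCRYPT_KEY.search(data) is not None


def validate_training_report(data: bytes) -> None:
    """Hypothetical upload validator: raise ValueError for unusable files."""
    if not looks_like_pdf(data):
        raise ValueError("The uploaded file is not a PDF.")
    if declares_encryption(data):
        raise ValueError(
            "The uploaded PDF is password protected. "
            "Please upload the report without a password."
        )
```

In a Django form this logic would presumably sit in a field validator and raise `ValidationError` instead of `ValueError`. A PDF library could replace the byte scan with a proper parse, at the cost of an extra dependency.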

mscanlan-git, Jan 17 '24 21:01

It seems very strange that somebody would do that intentionally. I assume that this was not a genuine PDF file from CITI, but maybe somebody downloaded the CITI PDF and edited it before uploading to PhysioNet?

In the past I had proposed that we require people to upload only the genuine PDF file from CITI (which contains a verification code that we can parse), but I think some folks thought this would make things too complicated for applicants. However, that was all prior to (a) the identity/training split, and (b) the CITI API support.

So one idea: have a single page that either lets people upload a PDF (and requires it to be a genuine CITI PDF), or lets people copy and paste the verification URL (so we can then get the completion report from CITI directly.)
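
The "either a PDF or a verification URL" rule for such a page could be sketched as below. This is only an illustration under stated assumptions: the function name and the returned labels are hypothetical, and the sketch deliberately does not check what a valid CITI verification URL looks like.

```python
from typing import Optional


def choose_verification_route(
    pdf_bytes: Optional[bytes], verification_url: Optional[str]
) -> str:
    """Return "pdf" or "url" depending on which single input was supplied.

    Hypothetical helper: exactly one of the two inputs must be non-empty.
    """
    has_pdf = bool(pdf_bytes)
    has_url = bool(verification_url and verification_url.strip())
    if has_pdf == has_url:
        # Either both inputs were given or neither was.
        raise ValueError(
            "Provide either the CITI PDF or the verification URL, but not both."
        )
    return "pdf" if has_pdf else "url"
```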

Second idea is to add a form saying "if you have taken the CITI course as an MIT affiliate, select your (verified) institutional email here" (and then we can get the completion report through the CITI API.)

bemoody, Jan 18 '24 17:01

Okay, this looks like a possibly serious problem.

The three reports that triggered errors yesterday (that I noticed) all look like they belong to real people (two of them have already passed identity check, the third is pending.) These three PDFs are indeed encrypted so I can't read them.

I just checked the 10 most recent submissions in /data/pn-media/training/. All of them are using PDF "password protection" (i.e., not really encrypted but have the please-don't-edit-me flag), which also prevents our old method of extracting the verification code. Some of these are completion reports while others are certificates, but all appear at first glance to be genuine.

bemoody, Jan 18 '24 21:01

Before I delve too deeply into this issue, how do people currently feel about the following options:

  1. Require every applicant to upload the original PDF file from CITI, i.e. if the file has been modified in any way their submission is rejected.

  2. Require every applicant to register as an "MIT Affiliate", i.e. if they have already taken a training course then they might be required to re-take it. (Unless things have changed and it is possible to transfer credit between institutions.)

  3. Require every applicant to do one or the other of the above.

@mscanlan-git @kepaik @tompollard

bemoody, Jan 19 '24 19:01

@bemoody thanks for looking at this!

> Require every applicant to upload the original PDF file from CITI, i.e. if the file has been modified in any way their submission is rejected.

Sounds reasonable to me (and perhaps even a feature). I can't think of many legitimate reasons for modifying a CITI PDF, other than perhaps reducing file size.

> Require every applicant to register as an "MIT Affiliate", i.e. if they have already taken a training course then they might be required to re-take it. (Unless things have changed and it is possible to transfer credit between institutions.)

Bit more burdensome for the community, but perhaps not a bad thing. The consistency might make the review process more straightforward for us.

> Require every applicant to do one or the other of the above.

Sounds fair to me.

tompollard, Jan 19 '24 19:01

> 1. Require every applicant to upload the original PDF file from CITI, i.e. if the file has been modified in any way their submission is rejected.

Agree with Tom: there shouldn't be any real reason a user needs to modify the file, so requiring the PDF exactly as downloaded from CITI makes sense.

> 2. Require every applicant to register as an "MIT Affiliate", i.e. if they have already taken a training course then they might be required to re-take it. (Unless things have changed and it is possible to transfer credit between institutions.)

I try to do this already (especially when it isn't clear from the course modules whether the user's institution is providing a relevant course), so I think it's reasonable. Ideally everyone would be expected to take the same course, which would standardize the requirements across the board.

> 3. Require every applicant to do one or the other of the above.

This sounds good to me.

mscanlan-git, Jan 19 '24 20:01