
unlink doesn't seem to delete files in S3 compatible storages

Open abrahammartin opened this issue 5 years ago • 3 comments

We are using GCP S3 compatible interface and files are not being deleted.

On https://github.com/catalyst/moodle-tool_objectfs/blob/master/classes/local/store/s3/client.php#L173 unlink is used to delete the file for S3 storage:

    /**
     * Deletes a file in S3 storage.
     *
     * @path   string full path to S3 file.
     */
    public function delete_file($fullpath) {
        unlink($fullpath);
    }

On https://github.com/catalyst/moodle-tool_objectfs/blob/master/classes/local/store/azure/client.php#L130 the azure client library call is used to delete the file for Azure storage:

    /**
     * Deletes file (blob) in azure blob storage.
     *
     * @param string $fullpath path to azure blob.
     */
    public function delete_file($fullpath) {
        $container = $this->container;
        $relativepath = $this->get_relative_path_from_fullpath($fullpath);

        $this->client->deleteBlob($container, $relativepath);
    }

Shouldn't the function delete_file for S3 storage use the S3 library call to delete the file? This is used when testing deletion permissions in https://github.com/catalyst/moodle-tool_objectfs/blob/master/classes/local/store/s3/client.php#L276

        if ($testdelete) {
            try {
                $result = $this->client->deleteObject(array('Bucket' => $this->bucket, 'Key' => 'permissions_check_file'));
                $permissions->messages[get_string('settings:deletesuccess', 'tool_objectfs')] = 'warning';
                $permissions->success = false;
            } catch (\Aws\S3\Exception\S3Exception $e) {
                $errorcode = $e->getAwsErrorCode();
                // Something else went wrong.
                if ($errorcode !== 'AccessDenied') {
                    $details = $this->get_exception_details($e);
                    $permissions->messages[get_string('settings:deleteerror', 'tool_objectfs') . $details] = 'notifyproblem';
                    $permissions->success = false;
                }
            }
        }

So I think delete_file for S3 storage should be:

    /**
     * Deletes a file in S3 storage.
     *
     * @param string $fullpath Full path to the S3 file.
     */
    public function delete_file($fullpath) {
        $this->client->deleteObject(array('Bucket' => $this->bucket, 'Key' => $fullpath));
    }
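One caveat with this proposal: the Azure client above converts the full path to a container-relative path before calling the API, and the S3 key would presumably need the same treatment, since $fullpath is the stream-wrapper-style path rather than a bucket-relative key. A sketch of what that might look like (assuming a get_relative_path_from_fullpath() helper analogous to the Azure client's; the S3 client may not currently have one):

    /**
     * Deletes a file in S3 storage via the AWS SDK rather than unlink().
     *
     * Sketch only: assumes get_relative_path_from_fullpath() exists on
     * this client, as it does for the Azure client.
     *
     * @param string $fullpath Full path to the S3 file.
     */
    public function delete_file($fullpath) {
        // Strip the bucket prefix so the SDK receives a bucket-relative key.
        $relativepath = $this->get_relative_path_from_fullpath($fullpath);

        $this->client->deleteObject(array(
            'Bucket' => $this->bucket,
            'Key'    => $relativepath,
        ));
    }
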

abrahammartin avatar Jul 27 '20 09:07 abrahammartin

We have intentionally not fully implemented deletes in S3 buckets. It was deemed dangerous because we share S3 buckets across environments, too often people want things back after deleting them, and, object storage being so cheap, we haven't historically saved much money by deleting from S3. Having said all that, there is provision in README.md for deleting S3 files, and your changes would need to respect this flag if you want to send through a merge request: $CFG->tool_objectfs_delete_externally = 1;
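A delete path that respects that flag could be guarded along these lines (a sketch; the flag name comes from README.md, the method body is illustrative):

    public function delete_file($fullpath) {
        global $CFG;

        // Only delete from external storage when explicitly enabled.
        if (empty($CFG->tool_objectfs_delete_externally)) {
            return;
        }

        // ... perform the actual external delete here ...
    }
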

syn avatar Jul 28 '20 04:07 syn

It's not about the money, it is about GDPR and not retaining unnecessary copies of files after the user has deleted them, as the law mandates.

The current code (https://github.com/catalyst/moodle-tool_objectfs/blob/master/classes/local/store/s3/client.php#L173) calls unlink (a local delete) without checking tool_objectfs_delete_externally. If I were to check tool_objectfs_delete_externally, I would be changing the behaviour of the code.

What I'm proposing is to use the S3 native API for deletion rather than the PHP unlink function, the same as you do with the Azure library (https://github.com/catalyst/moodle-tool_objectfs/blob/master/classes/local/store/azure/client.php#L126). The Azure library doesn't call unlink but uses the native Azure API.

abrahammartin avatar Jul 28 '20 07:07 abrahammartin

Ok, the delete_file function is a red herring: what should be happening is a copy to trash and then a delete of the original (using the StreamWrapper's rename).

We have:

    $CFG->alternative_file_system_class = '\tool_objectfs\s3_file_system';
    $CFG->forced_plugin_settings['tool_objectfs'] = [
        'filesystem' => '\tool_objectfs\s3_file_system',
        's3_key' => 'redacted',
        's3_secret' => 'redacted',
        's3_bucket' => 'bucketname',
        's3_region' => 'europe-west2',
        's3_base_url' => 'https://storage.googleapis.com',
    ];
    $CFG->tool_objectfs_delete_externally = 1;

What works

  • the uploaded file gets created as the local file /opt/moodle/data/filedir/XX/YY/XXYY...
  • when the push_objects_to_storage task runs, the file gets copied to the bucket as the /XX/YY/XXYY... object
  • when the delete_local_objects task runs, the file gets deleted from the local /opt/moodle/data/filedir/... folder (the folder is also cleaned up if that was the only file in it)
  • if signed URLs are enabled, the browser successfully gets a 303 redirect to the bucket

What should happen (I believe)

  1. use Moodle's file 'popup' to delete file
  2. browser POSTS to /repository/draftfiles_ajax.php?action=delete
  3. /repository/draftfiles_ajax.php uses:
    • $fs = get_file_storage()
    • $stored_file = $fs->get_file(...)
    • $stored_file->delete()
  4. /lib/filestorage/stored_file.php calls:
    • $this->filesystem->remove_file(...)
  5. objectfs /classes/local/store/object_file_system.php calls:
    • $this->delete_object_from_hash(...)
    • which looks at where the file is (should be OBJECT_LOCATION_EXTERNAL) and then calls $this->move_external_file_to_trashdir_from_hash(...)
    • which works out the trash path (presumably s3://bucket/trash/XX/YY/XXYY...) and calls $this->rename_external_file(...)
    • which calls $this->externalclient->rename_file(...)
  6. object fs /classes/local/store/s3/client.php uses:
    • the PHP built-in rename function, which should be handled by the StreamWrapper already registered by the moodle-local_aws plugin
  7. aws plugin StreamWrapper rename() function, SHOULD:
    • copy file to trash directory with $this->getClient()->copy(...)
    • then delete with $this->getClient()->deleteObject(...)
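If the AWS SDK's stream wrapper is registered, step 6's built-in rename() call is what triggers the copy-then-delete in step 7. Conceptually (paths illustrative, and assuming the SDK's Aws\S3\StreamWrapper is in use):

    // Register the wrapper once per request (normally done by
    // local_aws / objectfs, not by hand).
    \Aws\S3\StreamWrapper::register($s3client);

    // PHP's built-in rename() is dispatched to the wrapper, which copies
    // the object to the new key and then deletes the original.
    rename('s3://bucketname/XX/YY/XXYY...', 's3://bucketname/trash/XX/YY/XXYY...');

So if the object is never removed from its original key, the failure is presumably either that the wrapper is not registered at that point, or that the rename never reaches it.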

Working out exactly where in this chain the actual failure is happening is going to be tricky (especially as this is running in containers in GCP).

The browser's request to draftfiles_ajax.php does get a success response.

rjg21 avatar Nov 18 '20 13:11 rjg21