
Create new GCS -> Persistent Disk copy task

Open aarontp opened this issue 7 years ago • 6 comments

This will create a new Persistent Disk with a filesystem that is slightly larger than the image file in GCS, and will copy the raw image from GCS directly into the Persistent Disk filesystem as a file. This will require a new Evidence type called something like PersistentDiskLocalImage.
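A rough sketch of what such an Evidence type might look like, assuming Turbinia's Evidence base class; the class name comes from this issue, but the attributes and constructor below are placeholders rather than an agreed-upon API:

```python
# Hypothetical sketch only: attribute names and the constructor signature are
# placeholders, not a final API.
from turbinia.evidence import Evidence


class PersistentDiskLocalImage(Evidence):
  """GCE Persistent Disk whose filesystem contains a raw image copied from GCS.

  Attributes:
    project (str): GCP project the disk lives in.
    zone (str): GCE zone of the disk.
    disk_name (str): Name of the Persistent Disk.
    image_path (str): Path of the raw image file inside the disk's filesystem.
  """

  def __init__(
      self, project=None, zone=None, disk_name=None, image_path=None,
      *args, **kwargs):
    super(PersistentDiskLocalImage, self).__init__(*args, **kwargs)
    self.project = project
    self.zone = zone
    self.disk_name = disk_name
    self.image_path = image_path
```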

aarontp avatar Aug 23 '17 18:08 aarontp

Adding @sa3eed3ed. He's working on libcloudforensics, which can help us with this issue.

alimez avatar Jun 02 '20 15:06 alimez

Following up on our earlier discussion: https://github.com/google/cloud-forensics-utils/pull/169 is now merged. Raw disk images (plus other formats) can be imported from GCS to GCE. The created disk might be bigger due to size restrictions in GCE, but the hash of the original disk matches the hash of the created GCE disk from the first byte up to the byte count of the original disk. The byte count and the hash of the original disk are returned. Something like the following could be done to verify evidence integrity, possibly from the analysis VM: result['md5Hash'] = hash(created_gce_disk, start_byte=0, end_byte=result['bytes_count'])

More information: https://github.com/google/cloud-forensics-utils/blob/50396979a6e3e330fedb186a4b5942ce9dc0cff3/libcloudforensics/providers/gcp/forensics.py#L145

Example code:

```python
import libcloudforensics.providers.gcp.forensics as forensics

result = forensics.CreateDiskFromGCSImage(
    'my-test-project-id', 'gs://evidense_images/folder/raw.dd',
    'europe-west2-a', 'new-gce-disk')

# result:
# {'project_id': 'my-test-project-id', 'disk_name': 'new-gce-disk',
#  'zone': 'europe-west2-a', 'bytes_count': '4294967296',
#  'md5Hash': 'f14c653659dcc646c720072fe0b682a9'}
```
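The verification step described above could look roughly like this once the new disk is attached to an analysis VM. This is only a sketch: the device path is an assumption based on GCE's /dev/disk/by-id/google-<device-name> naming, assuming the disk was attached with device name new-gce-disk.

```python
# Sketch: hash the first bytes_count bytes of the attached GCE disk and
# compare against the md5Hash returned for the source image. The device path
# depends on the device name used when attaching the disk.
import hashlib


def verify_disk_hash(device_path, bytes_count, expected_md5,
                     chunk_size=64 * 1024 * 1024):
  """Returns True if the first bytes_count bytes of device_path match expected_md5."""
  md5 = hashlib.md5()
  remaining = bytes_count
  with open(device_path, 'rb') as device:
    while remaining > 0:
      chunk = device.read(min(chunk_size, remaining))
      if not chunk:
        break
      md5.update(chunk)
      remaining -= len(chunk)
  return md5.hexdigest() == expected_md5


# Continuing from the example above:
verify_disk_hash(
    '/dev/disk/by-id/google-new-gce-disk',
    int(result['bytes_count']), result['md5Hash'])
```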

sa3eed3ed avatar Jun 30 '20 16:06 sa3eed3ed

@sa3eed3ed That's awesome, thanks! I wonder if we should record the original size as metadata in the new disk somewhere. Should we create a method within libcloudforensics for verifying the hash given the size information? Mostly I want to make sure we have a documented way of verifying the hash somewhere. Thanks!

aarontp avatar Jun 30 '20 18:06 aarontp

@aarontp regarding hash verification, there are several ways to do it, but first a few points because I think we might need to do it in a different way:

  • Currently, importing disks is done via the Cloud Build API: a Daisy import workflow creates an image out of a source file, and then the compute.disks API creates a disk out of that GCE image.
  • I looked more into this: the workflow uses qemu-img to convert the source image to a raw image and then creates a GCE image from it. This means that for a forensically sound disk copy, the source image must be a raw disk image.
  • I think the most suitable solution here is to: create a GCE disk that is bigger in size -> dd the source image to a filesystem path inside the GCE disk -> verify the hash of the embedded image against the source image -> return the GCE disk with the embedded raw image, and label the disk with the path to where the image is located, the hash, and the hash verification result (sketched below). This means Turbinia can process the evidence as googleclouddiskembedded. Let me know if this makes sense to you and if I should start working on it.
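To make the last bullet concrete, here is a minimal sketch of the copy-and-hash part, assuming the new (larger) GCE disk has already been created, attached, formatted, and mounted; the mount point is invented, and the project, bucket, and object names reuse the example values from earlier in this thread.

```python
# Sketch of the "dd the source image into the GCE disk's filesystem and hash
# the embedded copy" step. Assumes the new disk is already mounted at
# MOUNT_POINT; the names below are example values only.
import hashlib

from google.cloud import storage

MOUNT_POINT = '/mnt/evidence_disk'
IMAGE_DEST = MOUNT_POINT + '/raw.dd'

client = storage.Client(project='my-test-project-id')
blob = client.bucket('evidense_images').get_blob('folder/raw.dd')

# Copy the raw image from GCS into the disk's filesystem.
blob.download_to_filename(IMAGE_DEST)

# Hash the embedded copy so it can be compared against the source image's
# hash and recorded on the disk (e.g. as a label).
md5 = hashlib.md5()
with open(IMAGE_DEST, 'rb') as image:
  for chunk in iter(lambda: image.read(64 * 1024 * 1024), b''):
    md5.update(chunk)
print('Embedded image MD5:', md5.hexdigest())
```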

sa3eed3ed avatar Jul 01 '20 15:07 sa3eed3ed

@sa3eed3ed Yeah, the process you outlined in the third bullet point is actually the way I was originally thinking we would need to do this, but if we have a forensically verifiable way to do it through another mechanism directly to a persistent disk, like you have, I'm fine with that too. I don't think we necessarily need to do the hash verification every time we process a disk; I think the most important thing is that we have a documented way to do so when needed. That being said, this is the first time we're directly changing the original evidence type before it starts to get processed, so it might be nice to have.

I'm imagining that the best way to do this on the Turbinia side will be to have a Task that takes something like a GCSRawImage evidence type and processes it via one of the two ways you mention above (either running the libcloudforensics code you link above, or doing the dd method), roughly as sketched below. I think in either case we could just get the original size and hash[1], then record that info into the second evidence object that gets created (GoogleCloudDisk or GoogleCloudDiskRawEmbedded).
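A very rough sketch of what such a Task could look like, assuming a hypothetical GCSRawImage evidence type with project/zone/gcs_path attributes; the class names are placeholders, and the TurbiniaTask and result handling below are approximated from existing Turbinia tasks, so they may not match the real plumbing exactly.

```python
# Hypothetical Task sketch: the GCSRawImage evidence type does not exist yet,
# and the result-handling calls are approximations of the usual Turbinia task
# pattern, not a tested implementation.
from turbinia.evidence import GoogleCloudDisk
from turbinia.workers import TurbiniaTask

from libcloudforensics.providers.gcp import forensics


class GCSImageToDiskTask(TurbiniaTask):
  """Imports a raw image from GCS into a new GCE Persistent Disk."""

  def run(self, evidence, result):
    import_result = forensics.CreateDiskFromGCSImage(
        evidence.project, evidence.gcs_path, evidence.zone,
        'turbinia-import-{0:s}'.format(self.id))

    disk = GoogleCloudDisk(
        project=import_result['project_id'],
        zone=import_result['zone'],
        disk_name=import_result['disk_name'])
    # Record the original size and hash so the copy can be re-verified later.
    disk.source_bytes_count = import_result['bytes_count']
    disk.source_md5 = import_result['md5Hash']

    result.add_evidence(disk, evidence.config)
    result.close(self, success=True, status='Imported raw image from GCS')
    return result
```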

If this is just a call into libcloudforensics that fully uses the API, another option could be to do this in turbiniactl before the request is even made (similar to how we copy disks from another project), but in general I'd like to have the actual processing code in the Tasks unless we need to act with the permissions of the end user (like we do for disk copies).

In summary, I think we could probably use either method here, but let me know if I'm missing some reason why you're suggesting the dd method instead.

[1] It looks like we can hash things on GCS easily enough with gsutil, though I'm not sure if this is done via an API or if it just pulls back the entire file and hashes it client side. https://cloud.google.com/storage/docs/gsutil/commands/hash
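For non-composite objects, Cloud Storage stores a base64-encoded MD5 with the object metadata, so it can be fetched without reading the object back. A small sketch with the google-cloud-storage client, reusing the example bucket and object names from above:

```python
# Sketch: read the MD5 that GCS already stores in the object metadata.
# The bucket and object names are the example values used earlier.
import base64

from google.cloud import storage

client = storage.Client(project='my-test-project-id')
blob = client.bucket('evidense_images').get_blob('folder/raw.dd')

# md5_hash is base64-encoded; decode to hex for comparison with local hashes.
print('GCS object MD5 (hex):', base64.b64decode(blob.md5_hash).hex())
```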

aarontp avatar Jul 01 '20 19:07 aarontp

@aarontp

  • For raw disk images the current method will work. I thought about other disk image formats (e.g. VMDK, VHD, etc.): the current workflow first converts these to a raw disk image, so the result will not be identical to the source image (e.g. the VMDK) in GCS and the hash will not match. If you think this is out of the scope of this issue and the current scope only focuses on raw disk images, that is completely fine. If you think it is a good idea, we can take the other formats to a new issue and do a workflow similar to: GCS -> dd -> embeddedEvidenceDisk.[VMDK|VHD|qcow2|..]; I can add this import workflow to libcloudforensics, and Turbinia will then have to process these formats.

  • The GCS object hash can be obtained via a single non-blocking API call; it is stored with the object metadata. Computing the hash of the imported GCE disk is what still needs to be done, and this requires attaching the disk to a VM first. I am thinking of having the disk attached to a Turbinia worker and running this as a Task that prints an attestation statement at the end, since it might take a long time (depending on the disk size) and could add a lot of delay if we do it before processing starts. Another possibility is to have a libcloudforensics function that starts a VM, attaches the disk in read-only mode, computes the hash, and stores it in the disk metadata (see the label sketch after this list); you could then call this from within the Task code -> create a VM -> attach disk -> compute the hash -> delete VM. I can work on this on the libcloudforensics side.

  • If the GCS bucket where the evidence is stored is in the same project as where Turbinia is running, then the import code can be part of a Task, since the code will trigger a Cloud Build job via the VM's service account (SA), and the Cloud Build SA in the project has access to the GCS buckets of the same project by default.
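A sketch of the "store the hash on the disk" idea using GCE disk labels via the Compute Engine API; the label keys are hypothetical, and the example values reuse the earlier result.

```python
# Sketch: record the hash and verification result as labels on the GCE disk.
# Label keys are hypothetical; setLabels needs the disk's current
# labelFingerprint, which is returned by disks().get().
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')
project, zone, disk_name = 'my-test-project-id', 'europe-west2-a', 'new-gce-disk'

disk = compute.disks().get(project=project, zone=zone, disk=disk_name).execute()
compute.disks().setLabels(
    project=project, zone=zone, resource=disk_name,
    body={
        'labels': {
            'embedded-image-md5': 'f14c653659dcc646c720072fe0b682a9',
            'hash-verified': 'true',
        },
        'labelFingerprint': disk['labelFingerprint'],
    }).execute()
```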

sa3eed3ed avatar Jul 02 '20 00:07 sa3eed3ed