[Deployment Revisited][Staging] Add import logic for Jobs in the job-exporter cronjob
Motivation
To kickstart fuzzing in a testing environment, we need to mirror the Job, Fuzzer and DataBundle entities. This PR adds the capability of importing Jobs from the exported data. The export itself was implemented in https://github.com/google/clusterfuzz/pull/4808.
Implementation
Assuming that all data was exported to $SOME_BUCKET, the folder structure for jobs looks like this:
$SOME_BUCKET/
  job/
    entities
    some-job/
      entity.proto
      custom_binary_key
    another-job/
      entity.proto
      custom_binary_key
The proto file is the serialized representation of a data_types.Job entity, and the other file is the associated custom binary, if present.
The entities file contains the newline-separated names of the jobs that were last exported.
To import said entities, the cronjob will:
- Parse entity names to be imported from the entities file
- Upload the custom binary to the target project's blobs bucket, and update the entity with a new custom_binary_key from the new blob id
- Deserialize the protobuf, apply the environment string substitutions, and persist the entity into datastore
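The steps above can be sketched roughly as follows. This is a minimal illustration, not the cronjob's actual code: `parse_entity_names` and `import_jobs` are hypothetical names, plain dicts stand in for the source bucket, the target blobs bucket and datastore, a dict stands in for the deserialized data_types.Job, and the `job/...` paths assume the layout shown earlier. The environment string substitution step is left out here.

```python
import uuid


def parse_entity_names(entities_blob: str) -> list[str]:
    """Return the non-empty, stripped lines of the entities file."""
    return [line.strip() for line in entities_blob.splitlines() if line.strip()]


def import_jobs(source_bucket: dict, target_blobs: dict, datastore: dict) -> None:
    """Import every job listed in job/entities.

    All three arguments are in-memory stand-ins (assumptions for this sketch),
    not real storage or datastore clients.
    """
    for name in parse_entity_names(source_bucket["job/entities"]):
        # Stand-in for deserializing job/<name>/entity.proto.
        entity = dict(source_bucket[f"job/{name}/entity.proto"])

        # Re-upload the custom binary, if one was exported, under a new blob id.
        binary = source_bucket.get(f"job/{name}/custom_binary_key")
        if binary is not None:
            new_key = str(uuid.uuid4())
            target_blobs[new_key] = binary
            entity["custom_binary_key"] = new_key

        # Stand-in for persisting the entity into datastore.
        datastore[name] = entity
```

Generating a fresh blob id on every import is only one possible choice; it keeps the sketch simple at the cost of re-uploading binaries that have not changed.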
As per b/422759773, some jobs override the following env vars:
- CORPUS_BUCKET
- QUARANTINE_BUCKET
- FUZZ_LOGS_BUCKET
- BACKUP_BUCKET
Left unchanged, an imported job in a testing environment would read from and write to production buckets, breaking prod isolation. To prevent this, environment string substitutions are applied on import.
The environment string substitutions are defined as a YAML map in project.yaml, in the clusterfuzz-config repository, in the following format:
job_exporter:
  env_string_substitutions:
    source_value_1: target_value_1
    source_value_2: target_value_2
    ...
This allows us to create the corresponding buckets and redirect reads and writes to within the test environment.
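As an illustration of how such a map might be applied, here is a minimal sketch. It assumes the substitution is a plain substring replacement over the Job's environment string. The function name and the bucket names are hypothetical, and the real implementation may match values differently.

```python
def apply_env_string_substitutions(environment_string: str, substitutions: dict) -> str:
    """Replace every occurrence of each source value with its target value."""
    for source_value, target_value in substitutions.items():
        environment_string = environment_string.replace(source_value, target_value)
    return environment_string


# Hypothetical bucket names, shaped like the env_string_substitutions map.
substitutions = {
    "prod-corpus": "staging-corpus",
    "prod-fuzz-logs": "staging-fuzz-logs",
}
environment_string = (
    "CORPUS_BUCKET = prod-corpus\n"
    "FUZZ_LOGS_BUCKET = prod-fuzz-logs\n"
)
result = apply_env_string_substitutions(environment_string, substitutions)
```

In this sketch the replacements run in map order, so a target value that contains a later source value would be rewritten a second time. Keeping source and target values disjoint avoids that.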
Unit tests for:
- Correctly creating a Job that does not yet exist in the target environment, with the desired env string substitution applied
- Correctly deleting a Job that is present in the target environment when the export list no longer contains its name
- Correctly updating a Job once a newer revision with different blobs or fields is exported from the source project, while also ensuring correct env string substitution
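The three scenarios above map onto a create/update/delete classification, sketched below under simplifying assumptions. `plan_sync` is a hypothetical helper, entities are represented as plain dicts, and "changed" is approximated by dict inequality. The actual cronjob may detect changes differently.

```python
def plan_sync(exported: dict, existing: dict) -> dict:
    """Classify job names into create, update and delete lists.

    Both arguments map job name -> entity fields (dict stand-ins for Jobs).
    """
    create = sorted(name for name in exported if name not in existing)
    delete = sorted(name for name in existing if name not in exported)
    update = sorted(
        name
        for name in exported
        if name in existing and exported[name] != existing[name]
    )
    return {"create": create, "update": update, "delete": delete}
```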