Make GCP/(AWS?) configuration optional for a successful test run
User story
As a first-time deployment user, I would like to run the test fixture populate script without GCP (i.e. in an AWS-only shop/setup).
I'm getting the following traceback after setting the variable in my environment:
$ tests/fixtures/populate.py --s3-bucket $DSS_S3_BUCKET_TEST_FIXTURES
Traceback (most recent call last):
File "tests/fixtures/populate.py", line 10, in <module>
from tests.fixtures.populate_lib import populate
File "/home/ubuntu/data-store/tests/__init__.py", line 41, in <module>
"gcp-credentials.json"
File "/home/ubuntu/hca-venv/lib/python3.7/site-packages/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/ubuntu/hca-venv/lib/python3.7/site-packages/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the GetSecretValue operation: Secrets Manager can't find the specified secret.
In other words, is the GCP section of the README absolutely mandatory? If not, which alternative steps would you advise as a workaround?
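One possible workaround, purely as a sketch: have the test bootstrap tolerate a missing GCP secret instead of failing hard. The `DSS_GCP_ENABLED` variable and the `load_gcp_credentials` helper below are illustrative assumptions, not part of the existing codebase; only the secret name comes from the traceback above.

```python
# Hypothetical sketch: make the GCP credential fetch in the test bootstrap optional.
import os
import boto3

# Assumed toggle; not an existing data-store setting.
GCP_ENABLED = os.environ.get("DSS_GCP_ENABLED", "true").lower() == "true"

def load_gcp_credentials(secret_name="gcp-credentials.json"):
    """Fetch the GCP service account key from AWS Secrets Manager,
    returning None instead of raising when GCP support is disabled
    or the secret does not exist."""
    if not GCP_ENABLED:
        return None
    client = boto3.client("secretsmanager")
    try:
        return client.get_secret_value(SecretId=secret_name)["SecretString"]
    except client.exceptions.ResourceNotFoundException:
        # No GCP secret configured; fall back to the AWS-only test path.
        return None
```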
Definition of done
The GCP configuration section is not required when deploying to AWS only. (Also, cross-cloud data transfer between commercial providers can increase costs unnecessarily.)
From Slack: "You'll need to comment out a few functions and classes and disable dss sync, DSS-event-relay, and a few other things."
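Rather than commenting functions and classes out by hand, one hedged alternative would be to gate GCP-dependent tests on a single switch. The `DSS_GCP_ENABLED` variable and the test class below are assumptions for illustration only, not existing data-store code.

```python
# Illustrative gate for GCP-dependent tests; skips them when GCP is not configured.
import os
import unittest

gcp_configured = os.environ.get("DSS_GCP_ENABLED", "true").lower() == "true"

@unittest.skipUnless(gcp_configured, "GCP is not configured; skipping GCP replica tests")
class TestGSReplicaFixtures(unittest.TestCase):
    def test_gs_bucket_populated(self):
        ...  # placeholder for a GCP-only fixture check
```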
I recently got some fair and reasonable questions about this issue.
Namely, from a design perspective I'm trying to understand the AWS+GCP hard requirement here... each of those clouds has pretty high nines nowadays. Is the rationale for requiring both simply the "multi-cloud is better" mantra?
Can somebody point me to the documented design justification for it? I feel like I'm missing the point here. Thanks in advance!
Related to issue https://github.com/HumanCellAtlas/data-store/issues/570
The design justification for having a replica of the data in both AWS and GCP is not added reliability but rather availability. We would like users to be able to access the data from their own cloud account, regardless of whether they have an AWS or a GCP account. We don't want to dictate which type of account they must have. Our long-term goal is to expand to cover other cloud vendors as well.
Then you seem to be optimizing for neither availability nor reliability.
I guess you meant (cloud/codebase) portability instead? Availability in the context of cloud computing is defined as:
Availability in this context is how much time the service provider guarantees that your data and services are available. This is typically documented as a percent of time per year, e.g. 99.999% (or "five nines") uptime means you will be unable to access resources for no more than about five minutes per year.
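For reference, a quick sanity check of the quoted "five nines" figure:

```python
# Downtime allowed per year at 99.999% availability.
minutes_per_year = 365.25 * 24 * 60            # ~525,960 minutes
allowed_downtime = (1 - 0.99999) * minutes_per_year
print(f"{allowed_downtime:.1f} minutes/year")  # ~5.3 minutes
```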
If that's the case and you are optimizing for multi-cloud, here's a good read and eye opener I went through a while ago:
https://bravenewgeek.com/multi-cloud-is-a-trap/
Not trying to be nitpicky about definitions and such, just sharing my experience of trying to go multi-cloud and spending too much time on the "wrong" problem space. Hope that helps.
Not pursuing this feature at this time
Fine @melainalegaspi, then at least specify in the README.md testing section that two commercial clouds (and which ones) are required to run the test suite.
@brainstorm good point, will do.