functions-framework-php

Feature Request: Automatically register the stream wrapper for GCS

Open bshaffer opened this issue 4 years ago • 3 comments

Feature request for registering the stream wrapper automatically in the functions framework:

// automatically register stream wrapper for easier access of GCS files

// read 
$myGcsFile = file_get_contents('gs://my-gcs-bucket/my-gcs-file.txt');

// write 
file_put_contents('gs://my-gcs-bucket/my-gcs-file-2.txt', 'New File Contents');

I see a few ways this could be implemented:

  1. If Google\Cloud\Storage\StorageClient exists, register the stream wrapper and add google/cloud-storage as a recommended package.
  2. Require google/cloud-storage as a dependency of this package and always register the stream wrapper.
  3. The runtime could conditionally require google/cloud-storage if it is not in the user's composer.json, and always register the stream wrapper.

I am personally in favor of 1. As a developer, I think it makes sense that my app has to require google/cloud-storage in order to use the Storage API. The advantages are that it's simpler and removes the potential for a dependency conflict. The disadvantages are the extra step of requiring the package via Composer, and the potential for that package to become out of date since it would not be managed by the runtime.
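
For illustration, option 1 could be sketched roughly as follows. This is a minimal sketch, not the framework's actual code: the helper name maybeRegisterGcsStreamWrapper is hypothetical, the class names are passed as strings so the snippet runs even when google/cloud-storage is not installed, and it assumes the default protocol of StreamWrapper::register is gs and that credentials come from Application Default Credentials.

```php
<?php
// Hypothetical helper sketching option 1; not the framework's real implementation.
function maybeRegisterGcsStreamWrapper(
    string $clientClass = 'Google\Cloud\Storage\StorageClient',
    string $wrapperClass = 'Google\Cloud\Storage\StreamWrapper'
): bool {
    // google/cloud-storage is optional: skip quietly when it is not installed.
    if (!class_exists($clientClass) || !class_exists($wrapperClass)) {
        return false;
    }
    // Leave an existing gs:// wrapper alone if something registered it already.
    if (in_array('gs', stream_get_wrappers(), true)) {
        return false;
    }
    // Assumption: a client built with no arguments picks up its project and
    // credentials from the environment.
    $wrapperClass::register(new $clientClass());
    return true;
}
```

Returning a boolean instead of throwing keeps the dependency optional: a function that never touches gs:// URLs is unaffected when the package is missing.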

Another potential issue is users who want to authenticate their stream wrapper with a different project or authenticated account. TODO: Verify there's a way for them to register with a different project or account if desired.

bshaffer avatar Feb 10 '21 20:02 bshaffer

Copying in the context I put in duplicate #68:

Background

The Cloud Storage library for PHP includes a "stream wrapper" implementation, which integrates the Cloud Storage client library with PHP's standard library streaming I/O interface.

This is a common interface developers use to deal with streaming data in PHP, similar to Golang's io.Writer interface.

Opportunity

Today, integrating GCS with a Cloud Function requires bespoke Cloud Storage client code to interact with objects.

  1. Existing code that developers want to run as Cloud Functions may not use that client code
  2. The stream wrapper interface provides a more portable experience
  3. The stream wrapper interface is commonly understood
  4. It makes the experience of getting started with PHP functions backed by state even easier by removing the need to learn how to use the Cloud Storage library.

Developer Experience

$obj = file_get_contents('gs://my-bucket/my-object');
$obj .= 'Add a new line';
file_put_contents('gs://my-bucket/my-object', $obj);

Suppose developers want a local testing experience that doesn't require Application Default Credentials or an internet connection. In that case, a simple change to the code makes this flexible:

Run the command with the file handler, FILESYSTEM_STREAM_HANDLER=file php ..., to use the local filesystem:

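$handler = getenv('FILESYSTEM_STREAM_HANDLER') ?: 'gs';
// file:// only accepts absolute paths, so the local "bucket" is the directory /my-bucket
$bucket = $handler === 'file' ? '/my-bucket' : 'my-bucket';
$obj = file_get_contents("$handler://$bucket/my-object");
$obj .= 'Add a new line';
file_put_contents("$handler://$bucket/my-object", $obj);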

This example shows how to read an object fully into memory. However, with the stream wrapper, developers could also process the data as a stream to reduce the memory requirement of their code.
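
A minimal sketch of what streaming processing could look like, using only PHP's standard stream functions. The helper name countLines is hypothetical, and the demonstration reads a local temporary file; a gs:// URL is assumed to behave the same way once the stream wrapper is registered.

```php
<?php
// Hypothetical helper: counts lines without loading the whole object into memory.
function countLines(string $url): int
{
    $stream = fopen($url, 'r');
    if ($stream === false) {
        throw new RuntimeException("Unable to open $url");
    }
    $lines = 0;
    try {
        // fgets() reads one line per call, so memory use is bounded by the
        // longest line rather than by the size of the object.
        while (fgets($stream) !== false) {
            $lines++;
        }
    } finally {
        fclose($stream);
    }
    return $lines;
}

// Local demonstration with a temporary file standing in for an object.
$path = tempnam(sys_get_temp_dir(), 'obj');
file_put_contents($path, "first\nsecond\nthird\n");
echo countLines($path), PHP_EOL; // prints 3
unlink($path);
```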

Action Item

If a developer adds the PHP Cloud Storage client library as a dependency, register the GCS stream wrapper automatically so developers can start using gs:// URLs without any further action. This keeps us from adding a dependency to the functions framework, while still reducing boilerplate code so developers can focus on productivity.

To ensure a good experience, we should verify that the error message shown when ADC is not configured is understandable to functions developers.

Possibly: Add a "suggest" entry in the composer.json for the storage library.

grayside avatar Feb 10 '21 21:02 grayside

Answering the question in the original post about implementation approach, I'm somewhat in favor of a "suggest" entry but not a hard dependency. (1) sounds good to me.

As far as registering with a different project/account, couldn't the stream wrapper registration rely on ADC priority rules for that? Or are you thinking we may want to support something dynamic during function runtime?

If the latter, I think it would make sense to fall back to using the client library directly until we have more data on adding override functionality.

grayside avatar Feb 10 '21 21:02 grayside

@grayside For the use-case where a different project is desired, the users can call stream_wrapper_unregister or call StreamWrapper::register with a different protocol:

use Google\Cloud\Storage\StreamWrapper;
use Google\Cloud\Storage\StorageClient;

$storage1 = new StorageClient(['projectId' => 'a-different-project-1']);
$storage2 = new StorageClient(['projectId' => 'a-different-project-2']);

// unregister the automatically registered one 
stream_wrapper_unregister('gs');
StreamWrapper::register($storage1);

// register a new protocol
StreamWrapper::register($storage2, 'gs2');

bshaffer avatar Mar 03 '21 19:03 bshaffer

This looks like it was done with #75, closing for now but feel free to re-open if something is left.

josephlewis42 avatar Apr 21 '23 20:04 josephlewis42