Advanced cache configuration

Open AVGP opened this issue 6 years ago • 3 comments

As our caching options have become more flexible, and caching is an integral part of Rendertron, we would like to introduce a different way of handling cache configuration.

Currently, the cache configuration is a single property named cache, which can be null, datastore, or memory.

The goals of this issue are:

  • Change the cache property of the Config to be a new type (CacheConfig for example) or null.
  • The new CacheConfig type should contain three properties:
    • A provider property that is either datastore or memory
    • A max_entries property that is an optional number
    • An expiry_in_minutes property that is an optional number
  • Adjust and expand the tests to use the new settings (see here)
  • For a first version:
    • The datastore cache should support the expiry_in_minutes setting instead of the hardcoded expiry value (here).
    • The memory cache should support the max_entries setting instead of the hardcoded value here.
  • Expand the docs/configure.md to have a section on the new cache settings that explains how the cache can be configured.
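
The goals above could be sketched roughly as follows. This is only an illustration and not Rendertron's actual code: the property names come from the list above, while SimpleMemoryCache, the two DEFAULT_ constants, and their values are assumptions made up for the example. The sketch applies both settings to one in-memory cache to keep it short, although the first version described above only asks for max_entries on the memory cache and expiry_in_minutes on the datastore cache.

```typescript
// Hypothetical shape of the proposed settings; property names follow the issue text.
type CacheProvider = 'datastore' | 'memory';

interface CacheConfig {
  provider: CacheProvider;
  max_entries?: number;        // optional cap on the number of stored entries
  expiry_in_minutes?: number;  // optional lifetime of an entry
}

interface Config {
  cache: CacheConfig | null;   // null disables caching
}

// Assumed fallback values for illustration only, not Rendertron's real defaults.
const DEFAULT_MAX_ENTRIES = 100;
const DEFAULT_EXPIRY_MINUTES = 60 * 24;

class SimpleMemoryCache {
  private store = new Map<string, { saved: number; payload: string }>();
  private maxEntries: number;
  private expiryMs: number;

  constructor(config: CacheConfig) {
    this.maxEntries = config.max_entries ?? DEFAULT_MAX_ENTRIES;
    this.expiryMs = (config.expiry_in_minutes ?? DEFAULT_EXPIRY_MINUTES) * 60 * 1000;
  }

  set(key: string, payload: string, now: number = Date.now()): void {
    if (this.maxEntries < 1) return; // a cap below 1 stores nothing
    // Delete first so that a re-inserted key moves to the end of the Map order.
    this.store.delete(key);
    // A Map iterates in insertion order, so the first key is the oldest entry.
    while (this.store.size >= this.maxEntries) {
      const oldest = this.store.keys().next().value;
      if (oldest === undefined) break;
      this.store.delete(oldest);
    }
    this.store.set(key, { saved: now, payload });
  }

  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now - entry.saved >= this.expiryMs) {
      this.store.delete(key); // expired: drop the entry and report a miss
      return undefined;
    }
    return entry.payload;
  }

  get size(): number {
    return this.store.size;
  }
}

// Example configuration using the new shape.
const exampleConfig: Config = {
  cache: { provider: 'memory', max_entries: 2, expiry_in_minutes: 1 },
};
const exampleCache = exampleConfig.cache ? new SimpleMemoryCache(exampleConfig.cache) : null;
```

Keeping both numeric properties optional means an existing setup only has to name a provider, and each cache can fall back to its current hardcoded value when a setting is absent.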

AVGP • Oct 13 '19 09:10

Please assign it to me.

smtaha512 • Oct 15 '19 08:10

Whoever tackles this: I think it would be extra awesome if, on top of being able to configure the default expiry time, it were also possible to override it via the API.

For example, on my site I have pages that will never change again and pages that change on a weekly basis. I would like to cache the ones that will never change for 1 year, while the others could use the default value of 1 week, or I could pass a value as a parameter.

Of course, when Googlebot visits a page, it will use the default. But as part of my flow I pre-cache pages by visiting them myself while spoofing a crawler bot, and that is where I would pass the parameter to specify the per-page expiry time.
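
A rough sketch of how such a per-request override might be resolved is below. Nothing like this exists in Rendertron as far as this thread shows: the function resolveExpiryMinutes, the query parameter name expiry_in_minutes, and the one-year upper bound are all assumptions chosen for illustration.

```typescript
// Hypothetical helper: choose the expiry for a single render request.
// Falls back to the configured default when the parameter is missing or invalid.
function resolveExpiryMinutes(
  query: Record<string, string | undefined>,
  defaultMinutes: number,
  maxMinutes: number = 60 * 24 * 365, // assumed upper bound of one year
): number {
  const raw = query['expiry_in_minutes']; // assumed parameter name
  if (raw === undefined) return defaultMinutes;

  const minutes = Math.floor(Number(raw));
  // Reject non-numeric input and anything shorter than one minute.
  if (!Number.isFinite(minutes) || minutes < 1) return defaultMinutes;

  // Clamp so that a caller cannot request an unbounded lifetime.
  return Math.min(minutes, maxMinutes);
}

const ONE_WEEK_IN_MINUTES = 60 * 24 * 7;

// A crawler request carries no parameter and gets the default.
const crawlerExpiry = resolveExpiryMinutes({}, ONE_WEEK_IN_MINUTES);

// A pre-caching request asks for one year.
const preCacheExpiry = resolveExpiryMinutes(
  { expiry_in_minutes: String(60 * 24 * 365) },
  ONE_WEEK_IN_MINUTES,
);
```

Falling back to the default on bad input keeps ordinary crawler traffic unaffected, and the clamp limits how long a single request can pin an entry.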

BTW, this may prove useful during testing and building. This is my current, simplistic implementation for invalidating and re-caching pages stored in Google's Datastore. It uses the PHP SDK:

    protected $datastore;

    public function handle()
    {
        // Load the service account credentials as an array (the path is a placeholder).
        $keyFile = json_decode(file_get_contents(base_path('/path/to/rendertron-service-account-credentials.json')), true);

        $this->datastore = new \Google\Cloud\Datastore\DatastoreClient([
            'keyFile' => $keyFile,
        ]);

        $url = 'https://www.example.com/some-link';
        // Key of the cached page entity: kind "Page", name "/render/<url>".
        $key = $this->datastore->key('Page', "/render/{$url}");

        if ($this->datastore->lookup($key)) {
            $this->output("{$url} - cache found");
            $this->output("{$url} - deleting");
            $this->datastore->delete($key);
        } else {
            $this->output("{$url} - cache not found");
            $this->output("{$url} - caching");

            $this->cache($url);

            if ($this->datastore->lookup($key)) {
                $this->output("{$url} - cache found!");
            } else {
                $this->output("{$url} - cache not found!");
            }
        }
    }

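    // Requests the page while posing as a crawler so that it gets rendered and cached.
    protected function cache($url)
    {
        $curl = curl_init($url);

        curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot');   // spoof a crawler bot
        curl_setopt($curl, CURLOPT_FAILONERROR, true);        // treat HTTP >= 400 as a failure
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);     // follow redirects
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);     // return the body instead of printing it

        // The response is discarded; only the caching side effect matters here.
        curl_exec($curl);
        curl_close($curl);
    }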

vesper8 • Oct 15 '19 08:10

Can I tackle this problem, or is someone already assigned to it?

kitrakrev • Aug 22 '21 12:08