DataConnectors

Mashup container state and performance on cloud platforms

monty241 opened this issue 7 years ago · 0 comments

The mashup container tear-down, which occurs quite often, effectively disables in-process caches. It would be great if part of the state could be maintained more easily, rather than each developer creating their own solution for this.

Background

We have been providing a data provider with advanced query support for some time, covering 50+ platforms, and it works fine. However, some platforms are slow at data transfer. Whereas platforms such as Oracle or SQL Server aggressively cache data in memory and provide high throughput once the data has been brought from disk into memory, other platforms (especially the cloud platforms we run against) often ship only 100 rows per second, even over gigabit connections.

This is a generic trait across cloud platforms such as Exact Online, Freshdesk, NMBRS, Loket, Visma Severa, etc. Only a few achieve a throughput of 1,000 rows or more per second, and those typically just map rows 1-to-1 to the output. So this does not qualify as a performance bug in any individual platform; it is simply the state of the technology.

Problem

This slow performance applies to the initial load and to all subsequent loads, because the mashup container tear-down effectively kills our own SQL engine's memory-backed caches.

Workaround

For high-volume users we have a database-backed cache, which stores the downloaded data for a number of days in a self-maintained database such as SQL Server or PostgreSQL. When a SQL statement is decorated with a hint to use that cache (like /*+ ods(true, interval '300 minutes') */), the cached data is used when it is sufficiently fresh; otherwise the data is fetched from the source and cached again. Using the database-backed cache improves throughput from 100 rows/sec by typically a factor of 50 to 2,500, i.e. to 5K-250K rows per second.

Issues

That workaround works fine, but there are two issues:

  1. Other parties developing drivers for Power BI against slow platforms each need to create a custom solution themselves. Reinventing the wheel every time is expensive and avoidable, and it is also opaque: the user cannot predict in advance what the performance of competing drivers will be.

  2. Large enterprises can typically provide a professional database with ease, but SME businesses would have to configure and run one themselves, on an on-premise server or on Azure or AWS, or fall back to a file-based store. This adds to the technical footprint of Power BI and makes deployment harder, effectively reducing the addressable market for Power BI.

Suggested Solution

Provide a hook into a shared memory structure from within the mashup container. The shared memory structure would be accessible only through APIs, for use as a cache, and that cache state would be maintained across sessions. Power BI decides whether to discard the shared memory structure (for instance when a provider misbehaves), but in general tries to keep it intact.
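To make the request concrete, such a hook might look roughly like the sketch below. The API is purely hypothetical (no such interface exists in the mashup container today); the point is that the host, not the connector, owns the cache's lifetime.

```python
# Hypothetical API sketch: a host-owned cache that survives
# mashup container tear-downs. All names are invented for illustration.
class SharedCache:
    """Cache whose lifetime the host (Power BI) controls, not the connector."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss; the connector then fetches from the source.
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

    def evict_all(self):
        # The host could call this when a provider misbehaves.
        self._store.clear()
```

A connector would call `get` before hitting the slow source and `put` after fetching, while the host retains the right to call `evict_all` at any time.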

monty241 · Jun 25 '17 09:06