
Implement new "datatype" for blob handling

livius2 opened this issue 2 years ago • 33 comments

Hi

Currently we have the BLOB field type, which requires an additional call to retrieve its content. This is very slow. Consider implementing a different datatype to handle "blob" data: it should be part of the record transmission like a normal field, especially when the column is read-only, like the result of the LIST function in the query below.

Below are comparison queries for a speed test. Run them from a remote connection (e.g. 20-30 ms ping). I have tested on FB3 and FB4. The first query uses the blob from the LIST function. The second CASTs the list to VARCHAR(32000), and I have added an extra ~1KB of data to every record to show that even though much more data is retrieved, it is still ~60x faster!

query 1: 15089 ms, 540 records
query 2: 253 ms, 540 records

query 1.

WITH RECURSIVE CTE AS (

SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, CAST(1 AS INTEGER) AS LEVEL FROM RDB$RELATION_FIELDS RF WHERE RF.RDB$SYSTEM_FLAG = 1
UNION ALL
SELECT C.RDB$RELATION_NAME, C.RDB$FIELD_NAME, C.LEVEL + 1 FROM CTE C WHERE C.LEVEL<10
)

SELECT 
X.RDB$RELATION_NAME, X.LEVEL, LIST(DISTINCT X.RDB$FIELD_NAME) AS SEP
FROM
CTE X
GROUP BY X.RDB$RELATION_NAME, X.LEVEL

query 2.

WITH RECURSIVE CTE AS (

SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, CAST(1 AS INTEGER) AS LEVEL FROM RDB$RELATION_FIELDS RF WHERE RF.RDB$SYSTEM_FLAG = 1
UNION ALL
SELECT C.RDB$RELATION_NAME, C.RDB$FIELD_NAME, C.LEVEL + 1 FROM CTE C WHERE C.LEVEL<10
)

SELECT 
X.RDB$RELATION_NAME, X.LEVEL, CAST(LIST(DISTINCT X.RDB$FIELD_NAME) AS VARCHAR(32000)) || REPLACE('aaaaaaaaaaaaaaaaaaaaaaaaaaa', 'a', uuid_to_char(gen_uuid())) AS SEP
FROM
CTE X
GROUP BY X.RDB$RELATION_NAME, X.LEVEL

If you would like more records to test, simply change `C.LEVEL<10` to something else like `C.LEVEL<50`.
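To see where the difference comes from, here is a back-of-envelope latency model. It is a sketch only: the 28 ms effective round-trip figure and the batch count are assumptions tuned to the measurements above, not protocol constants.

```python
# Back-of-envelope model of the timings above: with the current protocol each
# LIST() blob costs at least one extra client/server round trip (open +
# get_segment + close, partly pipelined), while inline VARCHAR data rides in
# the normal batched op_fetch packets.

RECORDS = 540
PING_MS = 28                 # assumed effective round-trip latency

# Query 1: at least one extra round trip per record for the blob.
blob_ms = RECORDS * PING_MS

# Query 2: the whole result set arrives in ~9 batched fetch round trips.
inline_ms = 9 * PING_MS

print(f"blob per record: ~{blob_ms} ms")    # ~15120 ms, measured: 15089 ms
print(f"inline varchar:  ~{inline_ms} ms")  # ~252 ms,   measured: 253 ms
```

With these assumed numbers the model reproduces the roughly 60x gap, which suggests the cost is dominated by per-record latency, not by the amount of data transferred.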

livius2 avatar Sep 06 '23 06:09 livius2

This could be solved by other means than a new data type, for example by streaming the blobs inline in a result set, or maybe some compound format that contains the blob ID and the first x bytes of the blob (e.g. +/- 32KB or the actual length if it is less).

For example, by introducing a blr_blob3 to replace blr_blob2, and having the content of the message data be <8-bytes blob id><2-bytes length><data> where given the existing limitations of the message, <data> would be max 32KiB - 10 bytes. Or alternatively, if those limits are dropped, we could use <8-bytes blob id><4-bytes length><data> and maybe make it configurable how much blob data is sent in a message.
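As a sketch only, the first variant of the proposed layout could be encoded like this. The `<8-bytes blob id><2-bytes length><data>` structure comes from the comment above; the little-endian packing and the function names are illustrative assumptions, not an agreed wire format.

```python
import struct

# Hypothetical encoding of the inline-blob message sketched above:
# <8-byte blob id><2-byte length><data>.

MAX_INLINE = 32 * 1024 - 10  # 32 KiB message limit minus the 10-byte header

def encode_inline_blob(blob_id: int, data: bytes) -> bytes:
    """Pack the blob id plus up to MAX_INLINE leading bytes of the blob."""
    chunk = data[:MAX_INLINE]
    return struct.pack("<QH", blob_id, len(chunk)) + chunk

def decode_inline_blob(msg: bytes):
    """Return (blob_id, inline_data); a later open of the blob would be
    positioned at the first byte after inline_data."""
    blob_id, length = struct.unpack_from("<QH", msg)
    return blob_id, msg[10:10 + length]

msg = encode_inline_blob(0xDEADBEEF, b"hello blob")
assert decode_inline_blob(msg) == (0xDEADBEEF, b"hello blob")
```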

If a blob id from a blr_blob3 message is opened, it will be positioned at the first byte after the returned data (or after the end of the blob).

mrotteveel avatar Sep 06 '23 07:09 mrotteveel

If you know for sure that your data fits in 32k, you don't need BLOBs. And if your data doesn't fit in 32k, you don't want to transfer it through a thin wire without an explicit request.

aafemt avatar Sep 06 '23 08:09 aafemt

If you know for sure that your data fits in 32k, you don't need BLOBs. And if your data doesn't fit in 32k, you don't want to transfer it through a thin wire without an explicit request.

In a lot of use cases for blobs, the majority of blobs may be relatively small, while there are a few outliers which are larger. So you use blobs to also be able to store the outliers, or because the fact blobs are stored out-of-band has a benefit when querying things that don't involve the blob data. Not to mention that you may need it to avoid the 64KiB row limit.

Returning the first ~32KiB inline will return all data for cases where the blob is actually smallish, and for cases where the blob is larger, you will already have the first 32KiB available for processing, which could very well improve performance even for larger blobs.

mrotteveel avatar Sep 06 '23 08:09 mrotteveel

This could be solved by other means than a new data type, for example by streaming the blobs inline in a result set

I also believe this is the way to go. To make it more flexible, I can think of a per-statement or per-attachment setting that controls whether blob data should be streamed inline or not. Also, it could be useful to set the number of bytes to stream inline.

And no, I don't like blr_blob3, I see no need in it.

hvlad avatar Sep 06 '23 08:09 hvlad

And no, I don't like blr_blob3, I see no need in it.

Well, I needed something to distinguish it from the current behaviour of blr_blob2, which only reports the blob id.

mrotteveel avatar Sep 06 '23 08:09 mrotteveel

Why a setting?

select blob_field from table is an explicit request that one needs the blob, and faster is preferred over slower.

asfernandes avatar Sep 06 '23 09:09 asfernandes

To let the user decide.

hvlad avatar Sep 06 '23 09:09 hvlad

On 9/6/23 10:29, Mark Rotteveel wrote:

This could be solved by other means than a new data type, for example by streaming the blobs inline in a result set, or maybe some compound format that contains the blob ID and the first x bytes of the blob (e.g. +/- 32KB or the actual length if it is less).

Using a new datatype, where the first N (not too big a value) bytes of the BLOB are stored inline, may have one more advantage: such a field may be indexed. Paradox used to have fields something like that.

AlexPeshkoff avatar Sep 06 '23 09:09 AlexPeshkoff

select blob_field from table is an explicit request that one needs the blob, and faster is preferred over slower.

Many apps are implemented in the following way: they have a grid with many fields and show blob content on demand, when the user presses a button. Almost always the result set behind the grid contains all fields, including the blob ones. There is no need, and no way, to show all blobs from all records at the same time. If we always include blob data in the result set, it will slow such apps significantly. Moreover, it will force users to exclude blob fields from the main result set and create an additional request just to fetch the desired blob, which is not faster than the current separate blob open/fetch/close.

hvlad avatar Sep 06 '23 10:09 hvlad

Using new datatype, where first N (not too big value) bytes of BLOB is stored inline

Are we speaking about storage or about network transfers here? Blob storage is another topic, I believe.

may have one more advantage - such field may be indexed. Something like Paradox used to have such fields.

Couldn't expression indices serve this goal ?

hvlad avatar Sep 06 '23 10:09 hvlad

Using new datatype, where first N (not too big value) bytes of BLOB is stored inline, may have one more advantage - such field may be indexed. Something like Paradox used to have such fields.

How would applications degrade to having the data not indexed (or not fully indexed) when the field length increases?

asfernandes avatar Sep 06 '23 10:09 asfernandes

On 9/6/23 13:10, Vlad Khorsun wrote:

Using new datatype, where first N (not too big value) bytes of
BLOB is stored inline

Are we speaking about storage or about network transfers here? Blob storage is another topic, I believe.

Yes, certainly. But for sure related.

may have one more advantage - such field may be indexed. Something
like Paradox used to have such fields.

Couldn't expression indices serve this goal ?

Am I wrong that, to make use of an expression index in a plan, that expression should be used in the WHERE clause?

AlexPeshkoff avatar Sep 06 '23 11:09 AlexPeshkoff

Blob storage is another theme, I believe.

Yes, certainly. But for sure related.

I don't think so, could you explain ?

Couldn't expression indices serve this goal ?

Am I wrong that, to make use of an expression index in a plan, that expression should be used in the WHERE clause?

You are correct, but... I still see no relation between the index question and the storage details, nor with the subject.

hvlad avatar Sep 06 '23 13:09 hvlad

On 9/6/23 16:18, Vlad Khorsun wrote:

    Blob storage is another theme, I believe.

Yes, certainly. But for sure related.

I don't think so, could you explain ?

If we introduce a new datatype it's quite logical to transfer it over the wire in the same way as it's stored on disk. Certainly that's not an absolute requirement, it does not make things identical, but there is surely a relationship.

    Couldn't expression indices serve this goal ?

Am I wrong that, to make use of an expression index in a plan, that expression should be used in the WHERE clause?

You are correct, but... I still see no relation between the index question and the storage details, nor with the subject.

I wanted to say that in order to use an expression index one has to use non-obvious tricks in SQL statements. With the discussed datatype it can simply be WHERE fieldOfNewBlobType STARTING WITH 'ABC'

AlexPeshkoff avatar Sep 06 '23 13:09 AlexPeshkoff

Why invent a new datatype when IStatement::openCursor() has enough room in "flags" parameter for CURSOR_PREFETCH_BLOBS flag?

aafemt avatar Sep 06 '23 14:09 aafemt

Why invent a new datatype when IStatement::openCursor() has enough room in "flags" parameter for CURSOR_PREFETCH_BLOBS flag?

A client-side record is a message whose format is described using BLR. If we extend the message with a blob data chunk, it should be described somehow. AFAIU, this is what Mark suggests. If you're going to prefetch blobs into some internal buffers of the statement object and feed getSegment() from these buffers, then of course a new data type is not required.

dyemanov avatar Sep 06 '23 14:09 dyemanov

Yes, that's what I suggested to do: transfer BLOBs as usual by id, but if the flag is set, send the content of the blob immediately (maybe requesting it from the client automatically on fetch()) and store it in a temporary file. Then the client application requests the BLOB content as usual, but it turns out to be a client-only call, which is greatly faster.

aafemt avatar Sep 06 '23 14:09 aafemt

Blob data can be sent by the server automatically, with the usual sequence of packets, before the op_fetch response, and the client can handle it while waiting for the response. This would make blob delivery zero round-trip.

aafemt avatar Sep 06 '23 14:09 aafemt

Hello,

Let me describe how this has already been done in another "deprecated" API - OLEDB :)

  1. A result value has a status. The status can be: S_OK, S_ISNULL, S_TRUNCATED and so on. See "enum DBSTATUS" in the OLEDB documentation.

  2. You can bind (include) a column in a fetched row twice.

For example, you can bind BLOB column

  1. As direct data (bytes or text)
  2. As storage object (stream)

In your case, you can define two binds for BLOB

  1. (bind1) As direct data with a limit on size. For example 4KB.
  2. (bind2) As BLOB ID.

If the data of bind1 has S_OK status - you will use this data.

If the data of bind1 has S_ISNULL, then the data of bind2 will have S_ISNULL too.

If the data of bind1 has S_TRUNCATED status, you can use the data (BLOB ID) from bind2 and read the BLOB through separate calls.


OLEDB also allows reading the row data twice, because the fetch operation does not return the row data but a row handle.

You can try to read the BLOB column directly into a user memory buffer of fixed length. If you get a value with status S_TRUNCATED, you can read this column again as storage (BLOB ID).


Of course, you can always read all the BLOB data directly. Just use the DBTYPE_BYREF modifier for the bind datatype. The OLEDB provider will allocate memory for the BLOB data and return a pointer to this buffer. The user must free this memory.


This was invented 25 years ago.
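The double-binding scheme above can be summarized in a toy model. The status names mirror DBSTATUS; everything else (`bind_row`, `read_value`, the 4 KB limit) is illustrative, not a real OLEDB interface.

```python
# Toy model of the OLEDB-style double binding: bind 1 returns up to LIMIT
# bytes of the blob plus a status, bind 2 returns the blob id.

from enum import Enum, auto

class Status(Enum):
    S_OK = auto()
    S_ISNULL = auto()
    S_TRUNCATED = auto()

LIMIT = 4096  # bind 1 size limit (4 KB, as in the example above)

def bind_row(blob, blob_id):
    """Produce (status, inline_data, blob_id) for one fetched row."""
    if blob is None:
        return Status.S_ISNULL, None, None
    if len(blob) <= LIMIT:
        return Status.S_OK, blob, blob_id
    return Status.S_TRUNCATED, blob[:LIMIT], blob_id

def read_value(blob, blob_id, read_blob_by_id):
    """Use the inline data when complete; fall back to separate blob calls."""
    status, inline, bid = bind_row(blob, blob_id)
    if status is Status.S_OK:
        return inline              # small blob: no extra round trips
    if status is Status.S_ISNULL:
        return None
    return read_blob_by_id(bid)    # truncated: re-read via the BLOB ID

assert read_value(b"x" * 10, 1, lambda bid: b"") == b"x" * 10
```

The key property is that the common case (a small blob) never pays for a second request, while large blobs degrade gracefully to the id-based path.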

dmitry-lipetsk avatar Sep 10 '23 07:09 dmitry-lipetsk

I have thought about introducing a new blob data type eg. GBLOB - Good BLOB ;-). It would be beneficial if it did not have a handle at all, contrary to the current BLOB, which does. Additionally, if stored somewhere, it could be simply forgotten after a SELECT. Currently, a BLOB is not freed after operations such as SELECT, since we can ask for the BLOB content by ID. A new data type would be straightforward to use, and even BLOB_APPEND would not be needed, as there is no delayed fetch at all.

livius2 avatar Feb 13 '24 17:02 livius2

No new type is needed. It is necessary to optimize the protocol for the current blob implementation, for example by being able to prefetch a small part of a blob. If the entire blob fits into this prefetch, then open, get_segment and close will not create any additional network packets at all.
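A sketch of that idea from the client's perspective (the class and method names here are hypothetical; the point is that a fully prefetched blob never touches the network again):

```python
# If the server ships the first N bytes of a blob together with the row, the
# client can serve get_segment() calls entirely from its local cache whenever
# the blob fits, so open/get_segment/close cost zero extra round trips.

class CachedBlob:
    def __init__(self, total_length: int, prefetched: bytes):
        self.total_length = total_length   # full blob size, known from the row
        self.buf = prefetched              # first bytes shipped with the fetch
        self.pos = 0                       # current read position

    def fully_cached(self) -> bool:
        return len(self.buf) >= self.total_length

    def get_segment(self, size: int, fetch_remote=None) -> bytes:
        end = self.pos + size
        if end <= len(self.buf) or self.fully_cached():
            # Served locally: no network packets at all.
            seg = self.buf[self.pos:self.pos + size]
        else:
            # Large blob: read the tail from the server as before.
            seg = self.buf[self.pos:] + fetch_remote(len(self.buf), end)
        self.pos += len(seg)
        return seg

# A small blob is answered without touching the network.
blob = CachedBlob(total_length=11, prefetched=b"hello world")
assert blob.fully_cached()
assert blob.get_segment(5) == b"hello"
```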

sim1984 avatar Feb 13 '24 18:02 sim1984

@sim1984 In addressing the current issue with blobs: yes, it can conceal some problems, but not the overarching issue of a blob's lifespan. A GBLOB should be forgotten after it is retrieved. The current BLOB persists 'forever', until you commit or roll back, because the engine never knows when someone might request the blob by its BLOB ID.

livius2 avatar Feb 13 '24 20:02 livius2

How would GBLOB be different from e.g. an unlimited-length VARCHAR? Well, nothing is really unlimited in real life, so let's say 4GB-limited. It's also a different data type (the API is going to be affected) and it also requires some work to implement, but the user-visible behaviour should be exactly as you describe.

dyemanov avatar Feb 14 '24 07:02 dyemanov

AFAIU, GBLOB is suggested as PSQL-only and would never appear on the client side.

aafemt avatar Feb 14 '24 10:02 aafemt

@aafemt AFAIU, GBLOB is suggested as PSQL-only and would never appear on the client side.

It appears in SELECTs too, so also on the client side.

@dyemanov unlimited-length VARCHAR

An interesting idea, but I think we would then have to cast everything to VARCHAR, e.g. the result of LIST. I am thinking about making GBLOB the default for everything; anyone who needs a blob ID would use the old BLOB explicitly in declarations.

livius2 avatar Apr 22 '24 06:04 livius2

It is not necessary to invent new types; it is necessary to optimize work with the existing ones. New data types will see practically no use, because they require rewriting access components or drivers (ODBC, etc.). For example, look at DECFLOAT: do many components support it?

sim1984 avatar Apr 22 '24 19:04 sim1984

@sim1984 Adoption of new data types in drivers is slow, but it does increase. But the current design of blobs rather prohibits speed in this matter; providing a handle to an object which cannot be forgotten and can be used/fetched at any time has big consequences, as you can see from the existence of BLOB_APPEND...

livius2 avatar Apr 23 '24 05:04 livius2

There is no need to mix up the method of transferring BLOBs (network protocol) and retaining record versions (database bloat).

  1. Problems with transmission speed occur only for short BLOBs that are not transmitted along with the fetch of each record. In any case, long BLOBs will have to be transmitted in parts (no one will cram 100 megabytes into one network packet).

  2. Version retention does not depend in any way on whether the BLOB was transferred entirely or by identifier. This only makes sense with legacy transaction isolation levels (read_committed [no] record version read). In READ COMMITTED READ CONSISTENCY, versions (including temporary BLOBs) will be retained until the cursor is closed (or the last record fetch), in SNAPSHOT, as expected, until the transaction is completed.

  3. Problems with LIST can be solved by adding an additional parameter to it indicating where the BLOBs are created: in temporary space or in the main database. In addition, you could add a parameter indicating whether the BLOB will be streamed or segmented.

  4. Inside PSQL, for local variables and output parameters, it would be possible to somehow indicate that this BLOB is needed only in temporary space, so operations with it would not lead to bloat of the main database.

sim1984 avatar Apr 23 '24 06:04 sim1984

Any improvement in this matter will be a huge benefit. We will use whatever is provided. But look at point 2: the blob must be remembered, and cannot be forgotten until transaction end; isn't that very bad? You have no such problem with any other data type, where the server must maintain its lifespan.

livius2 avatar Apr 23 '24 07:04 livius2

But look at point 2: the blob must be remembered, and cannot be forgotten until transaction end; isn't that very bad?

First, in this case they used to occupy temporary space, not database space. Second, they can be forgotten as soon as they are transferred to the client and cached there. In that case they will occupy client temporary space.

aafemt avatar Apr 23 '24 09:04 aafemt