Result columns limit offset support

Open young118 opened this issue 5 years ago • 8 comments

Description

Some query results may contain a very large number of columns, but I only want a slice of them. Of course I can slice the result by hand, but that is not memory efficient, because the full result takes a lot of memory. Can Pilosa support limit/offset just like SQL and return exactly the slice that I want? In other words, what I want is paging.

Success criteria (What criteria will consider this ticket closeable?)

Limit/offset support, just like SQL.

young118 avatar Dec 19 '19 08:12 young118

the query endpoint takes a shards parameter with which you can emulate some kind of paging.

e.g. .../query?shards=0,1,2,3

jaffee avatar Dec 19 '19 13:12 jaffee

@jaffee Sorry for the late reply. I know about shards, but they are not what I want. I want a slice of the result columns queried from the whole index. I know this is not a typical use case for Pilosa; after all, Pilosa is not a database. Still, thanks.

young118 avatar Jan 16 '20 03:01 young118

if you only pass one shard at a time, you'll get at most 1M column results back at a time, so you can "page" through all the columns one shard at a time

jaffee avatar Jan 16 '20 04:01 jaffee

@jaffee Is there an API that tells me how many shards I have and what their numbers are? Thanks.

young118 avatar Jan 17 '20 02:01 young118

shards are based off of column IDs, so take any column ID and divide by the shard width (2^20) and you get what shard it is in. There is an undocumented /internal/shards/max endpoint which returns the max shard for each index.
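
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the shard width of 2^20 mentioned in this comment; the function names `shard_for_column` and `column_range_for_shard` are hypothetical and not part of Pilosa.

```python
# Shard width taken from the comment above (2^20); a deployment built with a
# different width would need a different value here.
SHARD_WIDTH = 2 ** 20


def shard_for_column(column_id: int, shard_width: int = SHARD_WIDTH) -> int:
    """Return the shard that holds the given column ID (hypothetical helper)."""
    if column_id < 0:
        raise ValueError("column IDs are non-negative")
    # Integer division: columns 0..2^20-1 fall in shard 0, the next 2^20 in shard 1, ...
    return column_id // shard_width


def column_range_for_shard(shard: int, shard_width: int = SHARD_WIDTH) -> range:
    """Return the range of column IDs that can live in the given shard."""
    return range(shard * shard_width, (shard + 1) * shard_width)


if __name__ == "__main__":
    print(shard_for_column(3_000_000))
    print(column_range_for_shard(1))
```

Going the other way, `column_range_for_shard` shows why a single-shard query returns at most 2^20 (about 1M) columns.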

jaffee avatar Jan 17 '20 03:01 jaffee

@young118 At the API level, there's a method called AvailableShardsByIndex() which returns a map of index to available shards (as a roaring bitmap).

It doesn't appear that this API is surfaced in an HTTP endpoint; the only thing available there is GET /internal/shards/max, which returns a map of index to max shard. So using that, the assumption is that you have data in shards 0 through the max shard, and you can then page over that range. Obviously that's not always ideal; we need to provide an HTTP endpoint which supports AvailableShardsByIndex.
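
As a rough sketch of paging over shards 0 through the max shard, the Python below emulates limit/offset on the client by fetching one shard at a time and stopping as soon as the requested slice is filled. Everything here is an assumption for illustration: `query_url` guesses at the path in front of `/query?shards=...` (the thread elides it), the host and index name are placeholders, and `fetch_shard` is a hypothetical callback standing in for whatever client code actually runs the query and returns that shard's column IDs.

```python
from itertools import chain, islice
from typing import Callable, Iterable, List


def query_url(base_url: str, index: str, shard: int) -> str:
    """Build a single-shard query URL. The path layout is an assumption."""
    return f"{base_url.rstrip('/')}/index/{index}/query?shards={shard}"


def columns_slice(
    fetch_shard: Callable[[int], Iterable[int]],
    max_shard: int,
    offset: int,
    limit: int,
) -> List[int]:
    """Return up to `limit` columns starting at `offset`, one shard at a time.

    `fetch_shard` is a hypothetical callback that returns the column IDs of
    one shard. Shards are fetched lazily, so shards past the end of the
    requested slice are never queried.
    """
    per_shard = (fetch_shard(shard) for shard in range(max_shard + 1))
    return list(islice(chain.from_iterable(per_shard), offset, offset + limit))


if __name__ == "__main__":
    # Stand-in data instead of a real server: shard number -> column IDs.
    fake = {0: [1, 5], 1: [2 ** 20 + 3], 2: [2 ** 21, 2 ** 21 + 1]}
    print(query_url("http://localhost:10101", "myindex", 0))
    print(columns_slice(lambda s: fake.get(s, []), max_shard=2, offset=1, limit=2))
```

Only one shard's result is held in memory at a time, but shards that lie entirely before `offset` are still fetched and discarded, so deep offsets stay expensive. Empty shards inside the 0 to max range simply contribute no columns.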

travisturner avatar Jan 17 '20 03:01 travisturner

@jaffee thanks, that's a good idea

young118 avatar Jan 20 '20 02:01 young118

@travisturner thanks, helpful

young118 avatar Jan 20 '20 02:01 young118