tiedot Range query support

Range query support

Open HouzuoGuo opened this issue 11 years ago • 29 comments

Currently tiedot only supports hash table indexes; it would be very helpful to add another index type to support range queries.

Jun 28 '13 10:06 HouzuoGuo

+1 I was going to ask this question. I'm a key-value db newbie; I thought there was a way to do range queries, but it turns out they are not supported yet :)

Aug 08 '13 09:08 ifq

I have been practicing Scala recently; I will start working on more tiedot stuff when I have a bit more free time C:

Aug 08 '13 09:08 HouzuoGuo

btw tiedot isn't quite a key-value db, it is more like a conventional nosql db.

your hovering ability is cool.

Aug 08 '13 09:08 HouzuoGuo

well, you need enough lights and faster shutter:)

I'm not good at db stuff, and I'm looking for an embedded db for my small project. So what's your recommendation, tiedot or leveldb? (You're probably not the right man to ask :)

Aug 08 '13 09:08 ifq

It probably depends on what your use case is and how big a fan of Go you are. LevelDB has proven performance and reliability, while tiedot is a spare-time pet project (although it was made with utmost seriousness).

Aug 08 '13 10:08 HouzuoGuo

I need to store info for about 50000 image files, such as filename, size, location, tag, time, etc., and query by tag, time, etc. It's a practice project for me.

Aug 08 '13 10:08 ifq

which language is it in?

Aug 08 '13 10:08 HouzuoGuo

In Go. I chose Go because it can set up an HTTP server inside the program, has no dependency issues, and seems fast. I want everything compact in my app, so it is easy to deploy. I just realized LevelDB is not a Go program; what I meant to ask about was the choice between leveldb-go and tiedot.

Aug 08 '13 10:08 ifq

I was aware of leveldb's implementation in Go. It depends on your preference. leveldb is a key-value store, so your data may be stored in these maps:

(filename => image), (image => size), (tag => image), (image => time).

If you chose to use tiedot, you may store entire image metadata in one document, similar to:

{"image": "~/png", "size": 1024, "tags": ["friend", "family"], "location": {"country": "CN"}}

And then put indexes on image and tags.

Two different paradigms, I think both of them should work for you.

Aug 08 '13 10:08 HouzuoGuo

tiedot's way looks good. If all the images are spread across different folders, and I want to list every folder along with the number of images inside it, how should I implement that? Should I create another collection, or just insert another document into the same collection that contains the image info, a document like:

 {"folder":"path/to/dir", "amount":99}

Aug 08 '13 11:08 ifq

The easiest way is...

find /path/to/dir -name '*.jpg' | wc -l

But if you prefer to think in NoSQL, see if this works: each document in collection library represents an image; the document itself has file path information (let's make it absolute).

Now we want to count the number of images (documents) in a path. The problem is that a path is hierarchical, so we have to figure out a way to index all the information in an absolute path. Therefore let us index all paths which lead to the image, and put them into a vector. For example, given image /home/howard/pix/1.jpg, the document will look like:

{"dirs": ["/", "/home", "/home/howard", "/home/howard/pix"], "abspath": "/home/howard/pix/1.jpg"}

Put an index on dirs, and the image will appear in the search results of dirs eq /home, dirs eq /home/howard, etc.

Aug 08 '13 11:08 HouzuoGuo

What I need is a list of all the folders that contain images, with the image count for each, not the image count of one given folder.

My app is about an HTTP server and image display, so I want to show all the folders and subfolders on one page. I can walk through the folders in the program, but I want to save the result for later use.

Aug 08 '13 11:08 ifq

How many concurrent users do you want to support?

If not many and your metadata collection isn't too big, then collection scan (the method above) may not be a bad idea.

But if you have hundreds of concurrent users and metadata collection is not sharded, then latency could lead to bad UX.

Aug 08 '13 23:08 HouzuoGuo

less than 10 users. I'll try it. thank you very much~

Aug 09 '13 02:08 ifq

Hi, is there any chance you will add the range query feature soon?

It seems I need this feature very much :) Otherwise, I don't know how to select data by time range. Iterate over all documents and check the time field manually?

Aug 13 '13 11:08 ifq

Hello!

Recently I shifted my attention to Scala, check out my project "Schale". It has made its way to first release and now I can do some more Golang...

Range query support will definitely be the next major feature, together with new query syntax (the current query syntax is very ugly).

How granular are your time range queries (by month/day/hour)?

Aug 13 '13 23:08 HouzuoGuo

that's good news.

I need to query by day, or maybe by integer range.

Aug 14 '13 03:08 ifq

It may take a little while to add range index support, but talking about range "query", given that your queries work with discrete integer values over a small range, how about I make a feature to do hash lookup over a range of values?

For example... to find photos taken between February and May, it is merely a hash table lookup of month = 2, 3, 4 and 5.

Aug 14 '13 03:08 HouzuoGuo

That might help too. In that case, my document format would be:

 {"year": "2013", "month": 3, "day":"2", "tags": ["friend", "family"],...}

right?

this could be a temporary solution.

Aug 14 '13 03:08 ifq

Sounds good. Let's go ahead and support this simple range query first.

Aug 14 '13 05:08 HouzuoGuo

Hey buddy.

The new query processor adopts the new range lookup feature together with a totally redesigned syntax.

Please check out the latest master branch and give API v2 a try by running tiedot with -mode=v2.

I have not yet completed the new API document, but here's a glimpse:

Lookup: {"eq": "the_value", "in": ["path_segment1", "segment2"]}
Value exists: {"has": ["path_segment1", "segment2"]}
Get all docs: "all"
Union: [query1, query2, etc]
Intersect: {"n": [query1, query2, etc]}
Complement: {"c": [query1, query2, etc]}
Range lookup: {"int-from": 1, "int-to": 12, "in": ["path_segment1", "path_segment2"]}

The new syntax should be a lot cleaner, and benchmarks show that the new query processor is consistently 5% faster compared to the old one.

How does this look?

Aug 16 '13 11:08 HouzuoGuo

yes, It's better than v1 syntax. I'll try it out.

Is this range lookup implemented the way you described before, or is it a real range query already?

Aug 18 '13 12:08 ifq

Yes, it is implemented as described before. "Range lookup" uses the hash table and only supports integers.

Aug 18 '13 22:08 HouzuoGuo

Remember to add query result ordering options as well.

Jan 30 '14 19:01 HouzuoGuo

@HouzuoGuo How would I go about implementing reverse result ordering? I.e. get me the last 30 items inserted.

Also, I have a better ID generation method for you :) I'll submit a PR in the near future.

Feb 01 '14 16:02 kenkeiter

@kenkeiter Thank you very much, I look forward to it.

Result ordering has very limited support at the moment, and getting the latest 30 docs cannot be done easily. We will introduce a proper range index in the future, stay tuned.

Feb 01 '14 18:02 HouzuoGuo

Any news on new range query types? I think ordering ASC/DESC by time.Time or timestamp will be a useful feature. int range queries seem a little inefficient?

Dec 04 '14 10:12 gibsonsyd

tiedot uses a hash function to partition data, which makes range queries fairly difficult to implement. Integer-range lookup should be quite sufficient for some common usage scenarios. So far I do not have a good idea for implementing range queries, sorry. It sure would be a nice thing to have.

Dec 04 '14 12:12 HouzuoGuo

I think the most important thing is the id: incremental ids, automatic ordering by id, and id range queries like {id: {gte: '1234567', lte: '2345678'}}

Dec 19 '15 17:12 guileen
