Range query support

Open HouzuoGuo opened this issue 11 years ago • 29 comments

Currently tiedot only supports hash table indexes; it would be very helpful to add another index type to support range queries.

HouzuoGuo, Jun 28 '13 10:06

+1, I was going to ask this question. I'm a key-value DB newbie; I thought there was a way to do range queries, but it turns out it is not supported yet :)

ifq, Aug 08 '13 09:08

Recently I have been practicing Scala; I will start working on more tiedot stuff when I have a bit more free time C:

HouzuoGuo, Aug 08 '13 09:08

BTW, tiedot isn't quite a key-value DB; it is more like a conventional NoSQL DB.

Your hovering ability is cool.

HouzuoGuo, Aug 08 '13 09:08

Well, you need enough light and a fast shutter :)

I'm not good at DB stuff, and I'm looking for an embedded DB for my small project. So what's your recommendation, tiedot or LevelDB? (Probably not the right man to ask :)

ifq, Aug 08 '13 09:08

It probably depends on what your use case is and how big a fan of Go you are. LevelDB has proven performance and reliability, while tiedot is a spare-time pet project (although it was made with the utmost seriousness).

HouzuoGuo, Aug 08 '13 10:08

I need to store info about 50,000 image files, like filename, size, location, tag, time, etc., and query by tag, time, etc. It's a practice project for me.

ifq, Aug 08 '13 10:08

Which language is it in?

HouzuoGuo, Aug 08 '13 10:08

In Go. I chose Go because it can set up an HTTP server inside the program, has no dependency issues, and seems fast. I want everything compact in my app so it is easy to deploy. I just realized LevelDB is not a Go program; what I meant was a choice between leveldb-go and tiedot.

ifq, Aug 08 '13 10:08

I was aware of LevelDB's implementation in Go. It depends on your preference: LevelDB is a key-value store, so your data may be stored in these maps:

(filename => image), (image => size), (tag => image), (image => time).

If you chose to use tiedot, you may store entire image metadata in one document, similar to:

{"image": "~/png", "size": 1024, "tags": ["friend", "family"], "location": {"country": "CN"}}

And then put indexes on image and tags.

Two different paradigms, I think both of them should work for you.

HouzuoGuo, Aug 08 '13 10:08

tiedot's way looks good. If the images are spread across different folders, and I want to list all the folders and the number of images inside each folder, how should I implement it? Should I create another collection, or just insert another document into the same collection that holds the image info, a document like:

 {"folder":"path/to/dir", "amount":99}

ifq, Aug 08 '13 11:08

The easiest way is...

find /path/to/dir -name '*.jpg' | wc -l

But if you prefer to think in NoSQL, see if this works: each document in a collection called library represents an image, and the document itself holds the file path information (let's make it absolute).

Now we want to count the number of images (documents) in a path. The problem is that paths are hierarchical, so we have to find a way to index all the information in an absolute path. Therefore, let us index every directory that leads to the image and put them into a vector. For example, given the image /home/howard/pix/1.jpg, the document will look like:

{"dirs": ["/", "/home", "/home/howard", "/home/howard/pix"], "abspath": "/home/howard/pix/1.jpg"}

Put an index on dirs, and the image will appear in the search results of dirs eq /home, dirs eq /home/howard, etc.

HouzuoGuo, Aug 08 '13 11:08

What I need is to list all the folders that contain images, with the number of images in each, not the image count of one given folder.

My app is an HTTP server that displays images, so I want to show all the folders and subfolders on one page. I can walk through the folders in the program, but I want to save the result for later use.

ifq, Aug 08 '13 11:08

How many concurrent users do you want to support?

If there are not many, and your metadata collection isn't too big, then a collection scan (the method above) may not be a bad idea.

But if you have hundreds of concurrent users and the metadata collection is not sharded, then latency could lead to bad UX.
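For ifq's case, the collection scan can be as simple as one pass over all image paths that counts images per folder. A sketch, assuming each document's absolute path is available as a string; `folderCounts` is a hypothetical helper, and the result could be saved (for example, one document per folder) and refreshed when images change.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// folderCounts scans every image path once and counts how many images sit
// directly inside each folder.
func folderCounts(abspaths []string) map[string]int {
	counts := map[string]int{}
	for _, p := range abspaths {
		counts[path.Dir(p)]++
	}
	return counts
}

func main() {
	paths := []string{"/pix/a.jpg", "/pix/b.jpg", "/pix/2013/c.jpg"}
	counts := folderCounts(paths)
	folders := make([]string, 0, len(counts))
	for f := range counts {
		folders = append(folders, f)
	}
	sort.Strings(folders) // map iteration order is random; sort for stable output
	for _, f := range folders {
		fmt.Println(f, counts[f])
	}
	// /pix 2
	// /pix/2013 1
}
```

The cost grows linearly with the collection, which is fine for a few users and tens of thousands of images but is what makes latency a concern at higher load.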

HouzuoGuo, Aug 08 '13 23:08

Fewer than 10 users. I'll try it. Thank you very much~

ifq, Aug 09 '13 02:08

Hi, is there any chance you will add the range query feature soon?

It seems I need this feature very much :) Otherwise, I don't know how to select data by time range. Iterate over all of them and check the time field manually?

ifq, Aug 13 '13 11:08

Hello!

Recently I shifted my attention to Scala; check out my project "Schale". It has made its way to its first release, and now I can do some more Golang...

Range query support will definitely be the next major feature, together with a new query syntax (the current query syntax is very ugly).

How granular are your time range queries (by month/day/hour)?

HouzuoGuo, Aug 13 '13 23:08

That's good news.

I need to query by day, or maybe by integer range.

ifq, Aug 14 '13 03:08

It may take a little while to add range index support. But as for range "queries": given that your queries work with discrete integer values over a small range, how about I make a feature that does a hash lookup over a range of values?

For example, finding photos taken between February and May is merely a hash table lookup of month = 2, 3, 4 and 5.

HouzuoGuo, Aug 14 '13 03:08

That might help too. In that case, my document format would be:

 {"year": 2013, "month": 3, "day": 2, "tags": ["friend", "family"],...}

right?

This could be a temporary solution.

ifq, Aug 14 '13 03:08

Sounds good. Let's go ahead and support this simple range query first.

HouzuoGuo, Aug 14 '13 05:08

Hey buddy.

The new query processor adopts the new range lookup feature together with a totally redesigned syntax.

Please check out the latest master branch and give API v2 a try by running tiedot with -mode=v2.

I have not yet completed new API document, but here's a glimpse:

  • Lookup {"eq": "the_value", "in": ["path_segment1", "segment2"]}
  • Value exists {"has": ["path_segment1", "segment2"]}
  • Get all docs "all"
  • Union [query1, query2, etc]
  • Intersect {"n": [query1, query2, etc]}
  • Complement {"c": [query1, query2, etc]}
  • Range lookup {"int-from": 1, "int-to": 12, "in": ["path_segment1", "path_segment2"]}

The new syntax should be a lot cleaner, and benchmarks show that the new query processor is consistently 5% faster than the old one.

How does this look?

HouzuoGuo, Aug 16 '13 11:08

Yes, it's better than the v1 syntax. I'll try it out.

Is this range lookup implemented as you described before, or is it already a real range query?

ifq, Aug 18 '13 12:08

It is implemented as I described before: "range lookup" uses the hash table and only supports integers.

HouzuoGuo, Aug 18 '13 22:08

Remember to add query result ordering options as well.

HouzuoGuo, Jan 30 '14 19:01

@HouzuoGuo How would I go about implementing reverse result ordering? I.e. get me the last 30 items inserted.

Also, I have a better ID generation method for you :) I'll submit a PR in the near future.

kenkeiter, Feb 01 '14 16:02

@kenkeiter Thank you very much, I look forward to it.

Result ordering has very limited support at the moment, and getting the latest 30 docs cannot be done easily. We will introduce a proper range index in the future; stay tuned.
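Until an ordered index exists, a client-side workaround is to scan all documents, sort them newest first, and keep the first N. This sketch assumes each document stores a hypothetical insertion timestamp field; the `doc` type and `latestN` helper are illustrative, not part of tiedot.

```go
package main

import (
	"fmt"
	"sort"
)

// doc carries a hypothetical insertion timestamp written by the application.
type doc struct {
	ID       int
	Inserted int64 // e.g. Unix nanoseconds at insert time (assumed field)
}

// latestN returns at most n docs, newest first. It sorts a copy, so the
// caller's slice keeps its original order. Cost is O(m log m) per call.
func latestN(docs []doc, n int) []doc {
	sorted := append([]doc(nil), docs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Inserted > sorted[j].Inserted })
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

func main() {
	docs := []doc{{1, 100}, {2, 300}, {3, 200}}
	for _, d := range latestN(docs, 2) {
		fmt.Println(d.ID) // 2, then 3
	}
}
```

For large collections a bounded min-heap (`container/heap`) of size N would avoid sorting everything, but either way every document is visited, which is the cost a range index would remove.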

HouzuoGuo, Feb 01 '14 18:02

Any news on new range query types? I think ordering ASC/DESC by time.Time or a timestamp would be a useful feature. Int range queries seem a little inefficient?

gibsonsyd, Dec 04 '14 10:12

tiedot uses a hash function to partition data, which makes range queries fairly difficult to implement. Integer-range lookup should be quite sufficient for some common usage scenarios. So far I do not have a good idea for implementing range queries, sorry. It sure would be a nice thing to have.

HouzuoGuo, Dec 04 '14 12:12

I think the most important thing is the ID: incremental IDs, automatic ordering by ID, and ID range queries like {id: {gte: '1234567', lte: '2345678'}}

guileen, Dec 19 '15 17:12