tiedot
Range query support
Currently tiedot only supports hash table indexes. It would be very helpful to add another index type to support range queries.
+1, I was going to ask this question. I'm a key-value db newbie; I thought there was a way to do range queries, but it turns out they are not supported yet :)
Recently I have been practicing Scala. I will start working on more tiedot stuff when I have a bit more free time C:
btw tiedot isn't quite a key-value db, it is more like a conventional nosql db.
your hovering ability is cool.
well, you need enough light and a faster shutter :)
I'm not good at db stuff, and I'm looking for an embedded db for my small project. So what's your recommendation, tiedot or leveldb? (You're probably not the right man to ask :)
It probably depends on what your use case is and how big a fan of Go you are. LevelDB has proven performance and reliability, while tiedot is a spare-time pet project (although it was made with utmost seriousness).
I need to store info for about 50,000 image files, like filename, size, location, tag, time, etc., and query by tag, time, etc. It's a practice project for me.
which language is it in?
In Go. I chose Go because it can set up an http server inside the program, has no dependency issues, and seems fast. I want everything compact in my app, so it is easy to deploy. I just realized LevelDB is not a Go program; what I wanted to say was a choice between leveldb-go and tiedot.
I'm aware of leveldb's implementation in Go. It depends on your preference. leveldb is a key-value store, so your data may be stored in maps like these:
(filename => image), (image => size), (tag => image), (image => time).
If you choose to use tiedot, you may store the entire image metadata in one document, similar to:
{"image": "~/png", "size": 1024, "tags": ["friend", "family"], "location": {"country": "CN"}}
And then put indexes on image and tags.
Those are two different paradigms; I think both of them should work for you.
tiedot's way looks good.
If all the images are spread across different folders, and I want to list every folder with the number of images inside it, how should I implement it?
Should I create another collection, or just insert another document in the same collection that contains the image info, with a document like:
{"folder":"path/to/dir", "amount":99}
The easiest way is...
find /path/to/dir -name '*.jpg' | wc -l
But if you prefer to think in NoSQL, see if this works: each document in collection library represents an image; the document itself has file path information (let's make it absolute).
Now we want to count the number of images (documents) in a path. The problem is that a path is hierarchical, so we have to figure out a way to index all the information in an absolute path. Therefore let us index all the paths which lead to the image and put them into a vector. For example, given image /home/howard/pix/1.jpg, the document will look like:
{"dirs": ["/", "/home", "/home/howard", "/home/howard/pix"], "abspath": "/home/howard/pix/1.jpg"}
Put an index on dirs, and the image will appear in the search results of dirs eq /home, dirs eq /home/howard, etc.
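To make the "dirs" idea concrete, here is a minimal Go sketch that derives the vector of leading directories from an absolute path. The helper name ancestorDirs and the map used for the document are only illustrative assumptions; nothing here is tiedot API.

```go
package main

import (
	"fmt"
	"path"
)

// ancestorDirs returns every directory that leads to absPath, from the
// root down to the file's immediate parent. Hypothetical helper, not
// part of tiedot.
func ancestorDirs(absPath string) []string {
	var dirs []string
	// Walk upwards from the parent directory until the root is reached.
	for dir := path.Dir(path.Clean(absPath)); ; dir = path.Dir(dir) {
		dirs = append(dirs, dir)
		if dir == "/" || dir == "." {
			break
		}
	}
	// Reverse so the slice reads root-first, as in the example document.
	for i, j := 0, len(dirs)-1; i < j; i, j = i+1, j-1 {
		dirs[i], dirs[j] = dirs[j], dirs[i]
	}
	return dirs
}

func main() {
	abs := "/home/howard/pix/1.jpg"
	// The document that would be inserted into the collection.
	doc := map[string]interface{}{
		"dirs":    ancestorDirs(abs),
		"abspath": abs,
	}
	fmt.Println(doc["dirs"])
}
```

The document built in main has the same shape as the example above, so an index on dirs would match the image for any of its parent folders.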
What I need is to list all the folders that contain images, with the image count for each, not the image count of one given folder.
My app is about an http server and image display, so I want to show all the folders and subfolders on one page. I can walk through the folders with a program, but I want to save the result for later use.
How many concurrent users do you want to support?
If not many, and your metadata collection isn't too big, then a collection scan (the method above) may not be a bad idea.
But if you have hundreds of concurrent users and the metadata collection is not sharded, then latency could lead to bad UX.
Fewer than 10 users. I'll try it. Thank you very much~
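For a small collection, the scan could be as simple as the Go sketch below, which walks every image document once and counts images per folder. The imageDoc type and countPerFolder function are hypothetical; how the documents are read out of tiedot is left out on purpose.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// imageDoc is an assumed in-memory shape for one image document.
type imageDoc struct {
	AbsPath string
}

// countPerFolder scans all documents once and counts how many images
// sit directly inside each folder.
func countPerFolder(docs []imageDoc) map[string]int {
	counts := make(map[string]int)
	for _, d := range docs {
		counts[path.Dir(d.AbsPath)]++
	}
	return counts
}

func main() {
	docs := []imageDoc{
		{AbsPath: "/home/howard/pix/1.jpg"},
		{AbsPath: "/home/howard/pix/2.jpg"},
		{AbsPath: "/home/howard/pix/trip/3.jpg"},
	}
	counts := countPerFolder(docs)

	// Sort folder names so the listing is stable between runs.
	folders := make([]string, 0, len(counts))
	for f := range counts {
		folders = append(folders, f)
	}
	sort.Strings(folders)
	for _, f := range folders {
		fmt.Printf("%s: %d\n", f, counts[f])
	}
}
```

The resulting map can be cached in the application and rebuilt when images are added, which avoids rescanning on every page view.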
Hi, is there any chance you will add the range query feature soon?
It seems I need this feature very much :) Otherwise, I don't know how to select data by time range. Iterate over them all and check the time field manually?
Hello!
Recently I shifted my attention to Scala, check out my project "Schale". It has made its way to first release and now I can do some more Golang...
Range query support will definitely be the next major feature, together with new query syntax (the current query syntax is very ugly).
How granular are your time range queries (by month/day/hour)?
that's good news.
I need to query by day, or maybe by integer range.
It may take a little while to add range index support, but talking about range "query", given that your queries work with discrete integer values over a small range, how about I make a feature to do hash lookup over a range of values?
For example, to find photos taken between February and May, it is merely a hash table lookup of month = 2, 3, 4 and 5.
That might help too. In that case, my document format would be:
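In Go terms, the idea is roughly the sketch below: one hash lookup per integer in the range, with the results concatenated. The monthIndex type is a plain map standing in for a hash index, and the names are assumptions for illustration only.

```go
package main

import "fmt"

// monthIndex stands in for a hash index: it maps an integer month to
// the IDs of the documents that have that value.
type monthIndex map[int][]int

// lookupRange performs one hash lookup for every integer in
// [from, to] and concatenates the matching document IDs.
func (idx monthIndex) lookupRange(from, to int) []int {
	var ids []int
	for v := from; v <= to; v++ {
		ids = append(ids, idx[v]...)
	}
	return ids
}

func main() {
	idx := monthIndex{
		1: {10},
		2: {11, 12},
		4: {13},
		6: {14},
	}
	// Photos taken between February and May: month = 2, 3, 4 and 5.
	fmt.Println(idx.lookupRange(2, 5))
}
```

The cost grows with the width of the range rather than with the number of matches, which is why this only suits discrete integer values over a small range.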
{"year": "2013", "month": 3, "day":"2", "tags": ["friend", "family"],...}
right?
this could be a temporary solution.
Sounds good. Let's go ahead and support this simple range query first.
Hey buddy.
The new query processor adopts the new range lookup feature together with a totally redesigned syntax.
Please check out the latest master branch and give API v2 a try by running tiedot with -mode=v2.
I have not yet completed the new API document, but here's a glimpse:
- Lookup
{"eq": "the_value", "in": ["path_segment1", "segment2"]}
- Value exists
{"has": ["path_segment1", "segment2"]}
- Get all docs
"all"
- Union
[query1, query2, etc]
- Intersect
{"n": [query1, query2, etc]}
- Complement
{"c": [query1, query2, etc]}
- Range lookup
{"int-from": 1, "int-to": 12, "in": ["path_segment1", "path_segment2"]}
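As a rough illustration of composing these queries from a Go client, the sketch below builds the JSON with encoding/json. The helper names rangeLookup and intersect, and the "month" and "tags" paths, are assumptions for the example; only the JSON keys come from the glimpse above, and how the query is sent to tiedot is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rangeLookup builds a range lookup query in the v2 syntax shown above.
// Hypothetical helper, not part of tiedot.
func rangeLookup(from, to int, path ...string) map[string]interface{} {
	return map[string]interface{}{
		"int-from": from,
		"int-to":   to,
		"in":       path,
	}
}

// intersect wraps sub-queries in the "n" (intersect) operator.
func intersect(queries ...interface{}) map[string]interface{} {
	return map[string]interface{}{"n": queries}
}

func main() {
	// Photos tagged "family" taken between February and May.
	q := intersect(
		rangeLookup(2, 5, "month"),
		map[string]interface{}{"eq": "family", "in": []string{"tags"}},
	)
	out, err := json.Marshal(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```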
The new syntax should be a lot cleaner, and benchmarks show that the new query processor is consistently 5% faster compared to the old one.
How does this look?
Yes, it's better than the v1 syntax. I'll try it out.
Is this range lookup implemented as you said before, or is it a real range query already?
Yes. "range lookup" uses hash table and only supports integers.
Remember to add query result ordering options as well.
@HouzuoGuo How would I go about implementing reverse result ordering? I.e. get me the last 30 items inserted.
Also, I have a better ID generation method for you :) I'll submit a PR in the near future.
@kenkeiter Thank you very much, I look forward to it.
Result ordering has very limited support at the moment, and getting the latest 30 docs cannot be easily done. We will introduce a proper range index in the future, stay tuned.
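Until then, one possible workaround is to order results in the application after fetching them. The Go sketch below assumes each document carries a creation timestamp written by the application itself; the doc type and the latest function are hypothetical and not something tiedot provides.

```go
package main

import (
	"fmt"
	"sort"
)

// doc is an assumed shape for a fetched document. Created is a Unix
// timestamp that the application stored at insert time.
type doc struct {
	ID      int
	Created int64
}

// latest returns up to n documents, newest first, without modifying
// the input slice.
func latest(docs []doc, n int) []doc {
	sorted := make([]doc, len(docs))
	copy(sorted, docs)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created > sorted[j].Created
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	docs := []doc{
		{ID: 1, Created: 1370000000},
		{ID: 2, Created: 1370000300},
		{ID: 3, Created: 1370000100},
	}
	for _, d := range latest(docs, 2) {
		fmt.Println(d.ID, d.Created)
	}
}
```

This sorts the whole result set in memory, so it is only reasonable for small collections.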
Any news on new range query types? I think ordering ASC/DESC by time.Time or timestamp will be a useful feature. int range queries seem a little inefficient?
tiedot uses a hash function to partition data, which makes range queries fairly difficult to implement. Integer-range lookup should be quite sufficient for some common usage scenarios. So far I do not have a good idea for implementing range queries, sorry. It sure would be a nice thing to have.
I think the most important thing is the id: incremental ids, automatic ordering by id, and id range queries like {id: {gte: '1234567', lte: '2345678'}}