ws4sqlite icon indicating copy to clipboard operation
ws4sqlite copied to clipboard

ws4duck?

Open mskyttner opened this issue 2 years ago • 10 comments

There is a 🦆 figuring here so I wonder if this nice web service would need considerable refactoring to work with the sqlite-like database duckdb?

There is a go binding and also a C++ REST server component... and if I understand correctly, SQLite databases can also be attached.

Would be interesting to hear your thoughts on this.

mskyttner avatar Apr 29 '22 08:04 mskyttner

Haha 😅 the duck is there because my first name in my language also means a mallard, so it's sort of a trademark. I am not familiar with duckdb, admittedly; it could be interesting. I'll take a look. Thanks!

proofrock avatar Apr 29 '22 09:04 proofrock

First thing, the go interface needs CGO; it's very doable of course, but it limits the cross-compilation possibilities. For example, I would need a macos/arm64 and a macos/intel to be able to provide binaries for both... and I don't have a macos/arm64... yet. I just got rid of CGO last week for ws4sqlite, to run in the same thorn bush is something I wouldn't want to do. But in general it seems feasible.

A word: how do in your opinion SQLite and DuckDB compare? Different use cases, or "just" two solutions for the same problem? Sorry for the stupid question, but if DuckDB is tailor-made for a specific use case, then it could be possible that a REST interface isn't useful for that.

proofrock avatar Apr 29 '22 09:04 proofrock

For reading and analytical queries, duckdb is fast. And often there is more reading than writing at least in the use cases I mostly work with, such as data analytics. But also I think in general.

There are examples of duckdb being able to be quite resource effective, such as requiring just one single machine to do the equivalence of what needs 32 spark worker nodes using Apache Spark (source: this talk here).

And I like how it can work smoothly with new and old data formats, CSV-files as well as large parquet files on S3 or Arrow data. Also the R and Python integrations are great.

So, I prefer using duckdb for most of my use cases, but for use cases with a lot write-heavy applications (OLTP) with many concurrent writes all the time, I think sqlite (for in-process) or postgres is probably more suitable. Even so, when looking at that data though, I'd personally prefer to use duckdb for fast queries against it. The write-intensive transactional use case might even be the more specialized use case for most applications, perhaps?

I experimented with putting some HTTP/JSON web services on top of duckdb databases here and even cheated with a http bash server directly calling the duckdb CLI to return JSON (so beware if you look at that!). When I found ws4sqlite, I thought it looked very nice and lightweight for using when "dockerizing" a database file to expose it as a msa web service.

Just quacking my few cents :duck:! :)

mskyttner avatar Apr 29 '22 10:04 mskyttner

I am quite satisfied of the on the air protocol (so to speak) and the security provisions aren't too dependent on the database after all. So it shouldn't be too difficult to switch engine... and clients would be already there. When I'll have some time I'll try. Thanks again!

proofrock avatar Apr 29 '22 10:04 proofrock

I will not do this, at least for now. The problem is, the build of marcboeker/go-duckdb uses CGO, and ws4sqlite migrated away from that (to improve cross-compilation). I confirm the possibility of forking ws4sqlite and integrating duckdb without many problems; I will leave this issue open if someone is interested. Thanks!

proofrock avatar Nov 28 '22 07:11 proofrock

@mskyttner After a loooong time (but it never left my mind for some reason) here it is:

https://github.com/proofrock/duckrg

It took a rewrite to rust (sqliterg) and then a port to duckdb, much easier because the drivers for it are modeled on the ones for sqlite. Lucky.

Let me know if you like it ;-)

proofrock avatar Oct 09 '23 17:10 proofrock

@proofrock Wow, so very nice!

I got excited about this and tried to fiddle with getting a container build to work in a fork, in order to test it, and ran into some quirks with the alpine / cross-build / static binary builds ... but I was able to get a (quite fat) container image in the end that seemed to be able to use duckdb extensions too (nice!); details are here... and I tried using this GitHub Action to push the container to GHCR:

Testing it a little bit I was thinking of the storedStatement which is nice, and of something similar, like a storedCommand which could take parameters something along the lines of this (broken quotes, think "^C1")...

{
    "transaction": [
        {
            "statement": "install 'httpfs';"
	},
        {
            "statement": "load 'httpfs';"
	},
	{
            "query": "from read_json_auto('https://api.github.com/users/proofrock/events?per_page=100') select type, count(*) as event_count group by 1 order by 2 desc limit 10"
        },
	{
	    "command": "curl -sL 'https://api.github.com/users/##_username_##/events?per_page=100' \
    | duckdb -s 'COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT 'csv', HEADER)' \
    | uplot bar -d, -H -t 'GitHub Events for @##_username_##'",
	    "values": { "username": "proofrock" }
	}
    ]
}

if one knows that the server side have these commands available that can be chained together, it could be nice to be able to make use of them... perhaps a crazy idea :) ... for getting "text/plain" result like the one below from the API...

docker run --rm -v $(pwd)/gh_youplot.sh:/tm
p/gh_youplot.sh ghcr.io/mskyttner/duckrg /tmp/gh_youplot.sh
                           GitHub Events for @dproofrock
                     ┌                                        ┐ 
           PushEvent ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 51.0   
         CreateEvent ┤■■■■■■■■■■■■■■■ 22.0                      
         DeleteEvent ┤■■■■■■■■■■■ 17.0                          
        ReleaseEvent ┤■■■ 4.0                                   
         IssuesEvent ┤■ 2.0                                     
          WatchEvent ┤■ 2.0                                     
    PullRequestEvent ┤■ 1.0                                     
   IssueCommentEvent ┤■ 1.0                                     
                     └                                        ┘ 

mskyttner avatar Oct 10 '23 20:10 mskyttner

@mskyttner thanks for all this. Do you have time/interest to move your last message to duckrg's discussion section? Also, if you want and can, would you mind to contribute a PR for a "sane" dockerfile (ubuntu or debian-based is ok, maybe without the extra stuff)? If not, I can adapt yours and give proper attribution, of course, but a PR would give a better visibility. Thanks for taking the time to tinker with build scripts, I just ported them from sqliterg but didn't have time to properly check them.

For the rest, cool ideas. I'll give you answers over at the discussion, or if you can't open there I'll follow up here.

Thanks again!

proofrock avatar Oct 12 '23 10:10 proofrock

Sure, a pleasure, I moved that idea to https://github.com/proofrock/duckrg/discussions/1 and will attempt a PR for a Dockerfile (starting with a x64-amd variant).

mskyttner avatar Oct 12 '23 14:10 mskyttner

Hi Markus! Please take a look at https://github.com/proofrock/ws4sqlite/discussions/44 and contribute if you want.

proofrock avatar Feb 15 '24 07:02 proofrock