adsharma
https://github.com/ParAlg/ParClusterers and https://arxiv.org/pdf/2411.10290 include a survey of graph databases, their implementations of these clustering algorithms, and how they compare to parallel algorithms implemented in C++ libraries with Cilk/OpenMP.
The ParClusterers repo linked above has a bunch of good parallel algorithms. They compare against implementations in Neo4j, Memgraph and a few other graph databases. However, it had build problems...
random:
```
kuzu> match (a:Node)-[b:Relates]-(c:Node) where a.id=100 return b;
┌────────────────────────────────────────────────────┐
│ b                                                  │
│ REL                                                │
├────────────────────────────────────────────────────┤
│ (0:100)-{_LABEL: Relates, _ID: 1:12838}->(0:467)   │
│ (0:100)-{_LABEL: Relates, _ID: 1:22748}->(0:7029)  │
...
```
@ray6080 no difference in kuzu, but parquet export shows the delta:
```
du -sh *export
364K locality_export
500K random_export
```
```
du -sh *compressed
4.1M locality_distrib_compressed
4.1M locality_distrib_uncompressed
4.1M random_distrib_compressed
...
```
I repeated the experiment with a larger dataset. TL;DR: when imported into duckdb, the locality-ordered data is smaller than the random-ordered data, as expected. But with kuzu, it's the other way around.

random_parq.py: https://gist.github.com/adsharma/0f47c379db3ba1dae2078d24ec41f4e5...
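As a side note on why ordering should matter at all: a sorted (locality-preserving) edge list has long runs of repeated source IDs, which general-purpose compressors exploit, while a random permutation of the same edges does not. The sketch below is illustrative only (it is not the gist's random_parq.py and uses zlib on packed integers rather than parquet); the edge counts and node counts are made up.

```python
# Illustrative: locality-ordered edge lists compress better than
# randomly-ordered ones, because sorted source IDs produce long
# repeated byte runs that zlib can match.
import random
import struct
import zlib

random.seed(0)

# Hypothetical edge list: 50k edges over 10k nodes.
edges = [(random.randrange(10_000), random.randrange(10_000))
         for _ in range(50_000)]

def packed_size(pairs):
    """Serialize (src, dst) pairs as fixed-width ints, then compress."""
    raw = b"".join(struct.pack("<II", s, d) for s, d in pairs)
    return len(zlib.compress(raw))

local_size = packed_size(sorted(edges))                     # locality order
rand_size = packed_size(random.sample(edges, len(edges)))   # random order

print(f"locality: {local_size} bytes, random: {rand_size} bytes")
```

The duckdb numbers above match this intuition; the kuzu result is what's surprising.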
Thank you for looking into this! After applying the two changes above:

* Drop the `Relates` string property
* Use THREADS=1

I do see an improvement:
```
du -sh local_kuzu*
...
```
All these numbers are with SF=0.1 (data size = 30MB). With data size >> DRAM, perhaps a different set of db-size-related optimization strategies, such as #5050, start kicking...
pgvector v0.8.0 wheels are here: https://github.com/adsharma/pgserver/actions/runs/17194383449/job/48774327676
https://github.com/adsharma/pgserver/actions/runs/17193571691 has Python 3.13 wheels for x64. No luck with aarch64 yet.
Implementing this would also help a python -> mojo transpiler, which could assist in implementing the Python stdlib in Mojo via automated code translation.
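To make the transpiler idea concrete, here is a minimal happy-path sketch of the python -> mojo direction, lowering one typed Python function to Mojo-style source via the `ast` module. The emitted `fn` syntax and the `Int`/`Float64`/`String` type names are assumptions based on public Mojo documentation; a real transpiler would have to cover the full AST node set, not just a single `return` statement.

```python
# Sketch: lower a typed Python function to Mojo-style source text.
# The Mojo `fn` syntax and type names here are assumptions, not a spec.
import ast

TYPE_MAP = {"int": "Int", "float": "Float64", "str": "String"}

def transpile_fn(src: str) -> str:
    fn = ast.parse(src).body[0]
    assert isinstance(fn, ast.FunctionDef)
    # Map annotated parameters onto assumed Mojo type names.
    params = ", ".join(
        f"{a.arg}: {TYPE_MAP.get(ast.unparse(a.annotation), 'object')}"
        for a in fn.args.args
    )
    ret = TYPE_MAP.get(ast.unparse(fn.returns), "object") if fn.returns else "None"
    # This sketch only handles a body of a single `return <expr>`.
    body = fn.body[0]
    assert isinstance(body, ast.Return)
    return f"fn {fn.name}({params}) -> {ret}:\n    return {ast.unparse(body.value)}"

print(transpile_fn("def add(a: int, b: int) -> int:\n    return a + b"))
# -> fn add(a: Int, b: Int) -> Int:
#        return a + b
```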