pygraphistry
[FEA] GFQL benchmark notebook and GPU examples
Is your feature request related to a problem? Please describe.
I added discussion of GFQL GPU support to the main README and docstrings, but we should also have some accessible ipynb notebooks.
Describe the solution you'd like
- [x] simple ipynb showing a before/after speedup of GFQL (e.g., 4 very short & simple cells)
  - mention using hop() for more speedups on simpler tasks
  - clear 10X+ win on the benchmark
  - show it works just by passing in a cudf.DataFrame and, optionally, setting engine='cudf'
- [x] also a cleaned-up version of our bigger benchmark notebook: https://colab.research.google.com/drive/1iuH9YWd3VLSALR-3z1Jt35MXELI3Sjk_#scrollTo=bK4C9Ly0hso-
- [x] linked in the gfql section(s) in the readme.md as appropriate
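A rough sketch of what the short before/after notebook could measure is below. It is not the notebook's actual code: the column names `s` and `d`, the helper names `best_time`, `speedup`, and `run_benchmark`, and the 2-hop query are all assumptions for illustration. `run_benchmark` needs a RAPIDS install with an NVIDIA GPU, so the imports are kept inside the function.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time (seconds) over `repeats` calls of fn().

    Taking the minimum reduces the effect of one-off warmup cost on the first call.
    """
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


def run_benchmark(edges_pdf, hops=2):
    """Time the same GFQL chain on pandas (CPU) and cuDF (GPU).

    edges_pdf: pandas DataFrame with source column 's' and destination
    column 'd' (hypothetical column names chosen for this sketch).
    """
    # Imported here so the timing helpers above work without a GPU.
    import cudf
    import graphistry
    from graphistry import n, e_forward

    # Same query for both engines: any node, `hops` forward edges, any node.
    query = [n(), e_forward(hops=hops), n()]

    g_cpu = graphistry.edges(edges_pdf, 's', 'd')
    # Passing a cudf.DataFrame is the only data-side change for the GPU run.
    g_gpu = graphistry.edges(cudf.from_pandas(edges_pdf), 's', 'd')

    cpu_s = best_time(lambda: g_cpu.chain(query))
    # engine='cudf' is optional here; it makes the GPU choice explicit.
    gpu_s = best_time(lambda: g_gpu.chain(query, engine='cudf'))
    return cpu_s, gpu_s, speedup(cpu_s, gpu_s)
```

On a large enough edge list, the third value returned by `run_benchmark` is the number the notebook would report as the speedup. Actual ratios depend on graph size and hardware.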
Additional context
Let's land cucat first.
- started a simple notebook but need to flesh it out w/ more explanation: https://github.com/dcolinmorgan/grph/blob/main/simple_GFQL.ipynb
- made loops out of the copy-pasted 1, 2, 3, 4 hop/chain cells; added simple annotations to guide the reader, could add more: https://github.com/dcolinmorgan/grph/blob/main/clean_gfql_cpu_gpu_benchmark.ipynb
need scikit-learn<=1.3.2 for dirty-cat<=0.4.1 and cucat<=0.9.11
re: #534 can test/benchmark w/ larger GPU in colab
- added simple demo to gfql demo folder
- updated the full benchmark with a version that uses loops instead of so much copy-and-pasting, so it's hopefully easier to read
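The loop-based structure described above might look roughly like the sketch below. It is an illustration, not the benchmark notebook's code: `time_per_hop` and its parameters are hypothetical names, and `run_query` stands in for whatever GFQL call is being measured.

```python
import time


def time_per_hop(run_query, hop_counts=(1, 2, 3, 4), repeats=3):
    """Time run_query(hops) for each hop count, replacing copy-pasted cells.

    run_query: callable taking the number of hops (hypothetical hook; the
    caller decides whether it runs a chain() or a hop()).
    Returns a dict mapping hops -> best wall-clock time in seconds.
    """
    results = {}
    for hops in hop_counts:
        best = float('inf')
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(hops)
            best = min(best, time.perf_counter() - start)
        results[hops] = best
    return results
```

Assuming `g` is a graphistry graph and `n`, `e_forward` are imported from `graphistry`, a chain run could be passed as `time_per_hop(lambda h: g.chain([n(), e_forward(hops=h), n()]))`, and a hop run as `time_per_hop(lambda h: g.hop(nodes=seeds, hops=h, direction='forward'))`, where `seeds` is a hypothetical DataFrame of starting nodes.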