Dask Demo Day - Sign up issue

Open ncclementi opened this issue 2 years ago • 74 comments

Next Demo Day: December 5th

See what the Dask community has been up to, or share some Dask work of your own. Demos are short and informal (~5-10 minutes). Have something you'd like to share? Leave a comment and let us know.

Meetings are held on the first Thursday of each month at 11am US Eastern on Zoom: https://dask.org/meeting-room

Subscribe to the Dask event calendar to be notified of changes:

Past Demo Days

Meetings are recorded and shared on the Dask YouTube channel.

  • Oct 3rd, 2024

    • @rjzamora "pin and persist"

  • Sept 5th, 2024 youtube recording

  • August 1st, 2024

  • June 6th, 2024

  • March 21st, 2024 youtube recording

    • @fjetter Dask DataFrame improvements
    • @mrocklin Large scale population of vector databases for RAG
    • @jrbourbeau Easy GPU access with Coiled
  • February 15th, 2024 youtube recording

    • @mrocklin One trillion row challenge
    • @jacobtomlinson deploying Dask on Databricks with dask-databricks
    • @jrbourbeau deploying Prefect workflows on the cloud with Coiled
    • @quasiben scaling embedding pipelines (LlamaIndex + Dask)
    • @ntabris using AWS Cost Explorer to see the cost of public IPv4 addresses
  • January 18th, 2024 youtube recording

    • @cisaacstern (and maybe @jacobtomlinson) Apache Beam DaskRunner
    • @mrocklin Array expressions
    • @scharlottej13 one billion row challenge
  • October 19th, 2023 youtube recording

    • @mrocklin TPC-H benchmarks for Spark, Dask, Polars, DuckDB
    • @mrchtr Fondant
    • @jhamman Dask <> Arraylake integration
    • @jacobtomlinson "Who uses RAPIDS?"
  • September 21st, 2023 youtube recording

    • @fjetter Performance with P2P array rechunking
    • @phofl Dask expressions
    • @scharlottej13 + @dcherian Processing a quarter petabyte geospatial dataset in the cloud
  • August 17th, 2023 youtube recording

    • @fjetter -- distributed + memray integration for memory profiling
    • @mrocklin -- coiled setup CLI
    • @jrbourbeau -- Make Dask and Earthaccess work well together
  • July 20th, 2023 youtube recording

    • @hendrikmakait Shuffle resilience
    • @Matt711 : Dask-Kubernetes update
    • @GueroudjiAmal external tasks in Dask distributed (https://github.com/GueroudjiAmal/distributed)
    • @skrawcz Dask <> Hamilton integration
  • June 15th, 2023 youtube recording

    • dask-geopandas demo by @martinfleis
    • Fine performance Dask metrics @crusaderky
    • GIL monitoring on Dask @milesgranger
  • May 18th, 2023 youtube recording

    • Dask Expressions - @rjzamora
    • deltadask - @MrPowers
    • ydata-profiling - @miriamspsantos / @fabclmnt
    • Dask-bigquery - @ncclementi /@j-bennet
  • April 20th, 2023 - youtube recording

    • dask-awkward and dask-histogram for high energy physics analysis @lgray (10 min)
    • daskqueue : a dask-based distributed task queue. @AmineDiro (5-7min)
    • Pyarrow strings in Dask DataFrames - @jrbourbeau (5-7 min)
    • Launching a Jupyter/Dask cluster on NVIDIA Base Command Platform - @jacobtomlinson (5-7min)
  • March 16th, 2023 - youtube recording

    • Analysing Terabytes of Ocean Simulation model output with Xarray, xgcm and xhistogram @TomNicholas
    • P2P shuffling @hendrikmakait
    • Scaling weather radar data analysis with Dask @mgrover1
    • Automatic package synchronization in Coiled Dask Clusters @dchudz
    • Graph Neural Networks training with Dask @VibhuJawa

ncclementi avatar Feb 16 '23 15:02 ncclementi

  • @brian-methodical Can we get some folks from your team to sign up for one of the upcoming dates?

  • @mgrover1 Do you think the 03/16 date would work for you? Do you want to drop a comment with a proposed date and title?

  • @TomNicholas Whenever you can drop a comment on what date would be more convenient for you, that would be great.

  • @raybellwaves @jorisvandenbossche @tastatham @martinfleis Can we get a demo (or two) that involves dask-geopandas? Or can you volunteer someone :) ?

ncclementi avatar Feb 21 '23 16:02 ncclementi

@ncclementi I could do a short demo of dask-geopandas in May but happy to leave the place to others if they're interested.

martinfleis avatar Feb 21 '23 16:02 martinfleis

@martinfleis I'll take you up on that for May. If you know of folks who would like to show their applications on any of the dates, please send them this issue. It'll be great to hear more about how dask-geopandas is being used.

ncclementi avatar Feb 21 '23 16:02 ncclementi

I think 16th March is good for me!

TomNicholas avatar Feb 21 '23 17:02 TomNicholas

@TomNicholas Thanks Tom 🙌 , do you have a title that would summarize your demo? If not yet, we can set it later

ncclementi avatar Feb 21 '23 18:02 ncclementi

Something like "Analysing Terabytes of Ocean Simulation model output with Xarray, xgcm and xhistogram"

TomNicholas avatar Feb 22 '23 17:02 TomNicholas

@ncclementi I'd like to demo Coiled's "package sync" feature (should be interesting even for folks not planning to use Coiled).

dchudz avatar Feb 28 '23 21:02 dchudz

Yes to 3/16! Still trying to come up with a title, but it will be related to scaling weather radar data analysis with dask!

mgrover1 avatar Mar 01 '23 20:03 mgrover1

Thanks @mgrover1 and @dchudz We'll have you on the list! 🎉

ncclementi avatar Mar 01 '23 20:03 ncclementi

@martindurant and @douglasdavis would you be interested in presenting at the upcoming April session?

ncclementi avatar Mar 21 '23 21:03 ncclementi

Hi @ncclementi - I'm using dask-awkward and dask-histogram for high energy physics analysis and can give some real-world examples of how we'd use @martindurant and @douglasdavis's work. I think I can fit that all into 10 minutes. I'm available for April 20th.

lgray avatar Mar 31 '23 14:03 lgray

@lgray That would be awesome! We usually do 5-7 min but since we do not have folks for April yet, I think 10 min will work, plus you will be showcasing two dask projects! I'll write you down for April 20th then!!

ncclementi avatar Mar 31 '23 16:03 ncclementi

Hello @ncclementi, I built daskqueue, a lightweight distributed task queue library built on top of Dask. Daskqueue also implements persistent queues that hold tasks on disk and survive Dask cluster restarts. I would love to present this library in the format of your choosing, and April 20th sounds excellent for me.

The title of the talk would be daskqueue : a dask-based distributed task queue.

AmineDiro avatar Apr 04 '23 16:04 AmineDiro

We can take the May demo. I'll bring Dan from my MLOps team to this week's meeting to intro and get involved.

brian-methodical avatar Apr 04 '23 16:04 brian-methodical

@AmineDiro Thank you, that sounds awesome. I'll put you on the list for April 20th. Regarding the format:

These are 5-7 minute demos that show off ongoing or lesser-known work. We hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Hopefully, this helps to educate folks on some of the great work people are up to.

I just updated this issue with a link to the meeting for clarity. I will eventually create an issue specific to April.

ncclementi avatar Apr 04 '23 17:04 ncclementi

> We can take the May demo. I'll bring Dan from my MLOps team to this week's meeting to intro and get involved.

RE https://github.com/dandawg

brian-methodical avatar Apr 06 '23 15:04 brian-methodical

@danwang if you can drop a title for your May demo, that would be great.

ncclementi avatar Apr 06 '23 15:04 ncclementi

> @danwang if you can drop a title for your May demo, that would be great.

Thanks @ncclementi . I don't have a title yet--I need to narrow my topic area a little more. I'll commit to having one though by next week.

dandawg avatar Apr 06 '23 15:04 dandawg

@ncclementi if you're still looking for another demo for the April Demo Day I'd love to do a quick demo of launching a Jupyter/Dask cluster on NVIDIA Base Command Platform which comes with all the GPU stuff configured.

jacobtomlinson avatar Apr 11 '23 16:04 jacobtomlinson

@jacobtomlinson That would be great! I'll add you to the list.

ncclementi avatar Apr 11 '23 16:04 ncclementi

I'm talking to @fab_clemente here from https://github.com/ydataai/ydata-profiling , a dataframe profiling library (gives summary statistics and more like https://ydata-profiling.ydata.ai/examples/master/census/census_report.html) that works well with Pandas and Spark. It might be an interesting demo to try to see if there is interest in a Dask integration.

@ncclementi what do you think?

mrocklin avatar Apr 28 '23 18:04 mrocklin

@fabclmnt would you be up for demoing according to the format described above on May 18th 11am EDT?

ncclementi avatar May 01 '23 22:05 ncclementi

@rjzamora want to give a demo on dask-expr?

mrocklin avatar May 03 '23 15:05 mrocklin

I owe an update here. My topic is still in dev, and needs some more exploring. Client work has gotten in the way a bit. I'll attend the community meeting tomorrow, and I can give an update on what I'm thinking about, and maybe get some feedback.

dandawg avatar May 03 '23 22:05 dandawg

@martinfleis Are you still on for the May Dask Demo Day? Do you have a title/topic?

ncclementi avatar May 04 '23 15:05 ncclementi

@ncclementi I apparently messed up time zones (again!) and am supposed to speak at another event at the same time. @jorisvandenbossche @jsignell any chance you'd be willing to take over? If not, we may need to reschedule for another month. Sorry for the complications!

martinfleis avatar May 04 '23 18:05 martinfleis

I can't make that Demo Day unfortunately.

jsignell avatar May 05 '23 13:05 jsignell

> @rjzamora want to give a demo on dask-expr?

Yes - I can walk through a high-level dask-expr demo :)

rjzamora avatar May 05 '23 15:05 rjzamora

@martinfleis Can I move you to the June Day?

ncclementi avatar May 05 '23 15:05 ncclementi

@ncclementi yes. And sorry!

martinfleis avatar May 05 '23 15:05 martinfleis