listenbrainz-server Speedup stats processing in Spark cluster

Open amCap1712 opened this issue 9 months ago • 0 comments

1. Write a copy of the listens to HDFS when a full dump is imported. This speeds up filtering of listens and, in many cases, the processing that follows.
2. Remove Pydantic validation in places where it was redundant or of little use.
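
To illustrate the idea behind removing redundant validation: when rows were already checked at import time, checking each one again during stats processing is repeated work. The sketch below is a plain-Python stand-in, not Pydantic and not the code from this PR; `validate_listen`, `count_by_artist`, and the field names are hypothetical.

```python
from collections import Counter


def validate_listen(row: dict) -> dict:
    """Hypothetical per-row check, standing in for a validation model."""
    if not isinstance(row.get("user_id"), int):
        raise ValueError("user_id must be an int")
    if not isinstance(row.get("artist_name"), str) or not row["artist_name"]:
        raise ValueError("artist_name must be a non-empty string")
    return row


def count_by_artist(rows, validate=True):
    """Count listens per artist, optionally skipping the per-row check."""
    counts = Counter()
    for row in rows:
        if validate:
            row = validate_listen(row)
        counts[row["artist_name"]] += 1
    return dict(counts)


# Rows assumed to have been validated already when they were imported.
rows = [
    {"user_id": 1, "artist_name": "Portishead"},
    {"user_id": 2, "artist_name": "Portishead"},
    {"user_id": 1, "artist_name": "Massive Attack"},
]

# Both calls return the same counts; the second does less work per row.
assert count_by_artist(rows, validate=True) == count_by_artist(rows, validate=False)
```

Skipping the check is only safe where the data is known to be well-formed; on untrusted input the validating path should stay.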

Before this PR, an entire stats run took about 9 hours. With step 2 alone, it went down to 6.25 hours; with step 1 on top of that, it went down to 5.75 hours.
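
For a sense of scale, the timings above can be turned into relative numbers. This is only arithmetic on the three figures quoted in the description; the variable and function names are illustrative.

```python
# Run times in hours, as quoted in the description.
baseline_hours = 9.0
after_step_2 = 6.25   # Pydantic validation removed
after_both = 5.75     # plus the HDFS copy of listens


def reduction_pct(before: float, after: float) -> float:
    """Percentage of run time saved going from `before` to `after`."""
    return (before - after) / before * 100


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


print(f"step 2 only: {reduction_pct(baseline_hours, after_step_2):.1f}% less time, "
      f"{speedup(baseline_hours, after_step_2):.2f}x")   # 30.6% less time, 1.44x
print(f"both steps:  {reduction_pct(baseline_hours, after_both):.1f}% less time, "
      f"{speedup(baseline_hours, after_both):.2f}x")     # 36.1% less time, 1.57x
print(f"step 1 adds: {(after_step_2 - after_both) * 60:.0f} minutes saved")  # 30 minutes
```

Most of the gain therefore comes from step 2; step 1 saves a further half hour on top of it.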

May 10 '24 11:05 amCap1712
