PowerSystems.jl

Support large sets of MarketBidCost

amirmm11 opened this issue 1 year ago • 7 comments

Adding many MarketBidCost objects (or TimeSeries) takes a long time because the data is stored to an HDF5 file. It would be great to improve the performance, e.g., by using multi-threading. Thanks
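
For context, the slow pattern is a bulk-add loop like the one sketched below. This is simplified to plain SingleTimeSeries rather than the full MarketBidCost workflow, and the input file and names are made up:

using PowerSystems
using TimeSeries
using Dates

# sys is assumed to already contain generators, e.g. built from a data set
# ("my_case.json" is a hypothetical input file).
sys = System("my_case.json")

timestamps = collect(range(DateTime("2024-01-01"); step = Minute(5), length = 288))
for gen in get_components(ThermalStandard, sys)
    ta = TimeArray(timestamps, rand(length(timestamps)))
    # Each call stores the array in the system's backing HDF5 file.
    add_time_series!(sys, gen, SingleTimeSeries("max_active_power", ta))
end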

amirmm11 avatar Mar 13 '24 19:03 amirmm11

@amirmm11 Can you share some details about your environment and observed performance?

  • What is the backing storage for your filesystem? SSD, spinning disk, network filesystem, etc.
  • Do you have an estimate of the throughput you are seeing in MB/s?
  • Are you enabling compression when you create the system? If so, are you customizing any compression attributes? (See the sketch at the end of this comment.)

It is possible to parallelize the writing, but it would only be beneficial if you aren’t already saturating the storage.

Also, this wouldn’t be as simple as using multi-threading. It would require MPI as discussed here. It might be easier to support multiple files and write those files with different threads. We could consider that.
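
For reference, compression is controlled when the System is constructed. A minimal sketch, assuming the compression keyword and the CompressionSettings type from InfrastructureSystems (exact field names and enum values may differ by version):

using PowerSystems

# Simple on/off switch (disabled by default).
sys = System(100.0; enable_compression = true)

# Customizing the attributes; field names follow InfrastructureSystems.CompressionSettings
# and may differ between versions.
sys = System(
    100.0;
    compression = CompressionSettings(
        enabled = true,
        type = CompressionTypes.DEFLATE,  # or CompressionTypes.BLOSC
        level = 3,
        shuffle = true,
    ),
)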

daniel-thom avatar Mar 13 '24 20:03 daniel-thom

@daniel-thom I am using a Google Cloud c2d-highcpu-32 VM. Here is some info:

  • Backing Storage Type
$ lsblk -o NAME,TYPE,SIZE,ROTA
NAME    TYPE    SIZE ROTA
loop0   loop   39.1M    1
loop1   loop  105.8M    1
loop2   loop   55.7M    1
loop3   loop   55.7M    1
loop4   loop     87M    1
loop5   loop   63.9M    1
loop6   loop     37M    1
loop7   loop    352M    1
loop8   loop   63.9M    1
loop9   loop  105.4M    1
loop10  loop     87M    1
loop11  loop  353.6M    1
loop12  loop   40.4M    1
loop13  loop    4.2M    1
sda     disk      1T    1
├─sda1  part 1023.9G    1
├─sda14 part      4M    1
└─sda15 part    106M    1
  • Throughput Estimate
    • Write speed test
dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.10065 s, 211 MB/s
  • Read speed test
dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.114648 s, 9.4 GB/s
  • I am using the default value of enable_compression

amirmm11 avatar Mar 13 '24 22:03 amirmm11

BTW what about something like this?

using HDF5
using Base.Threads

# Function to write some data to an HDF5 file
function write_to_hdf5(file_name, data)
    h5open(file_name, "w") do file
        write(file, "data", data)
    end
end

# Sample data to write
data_samples = [rand(10, 10) for _ in 1:nthreads()] # Generate random matrices

# Main multithreading execution
@threads for i in 1:nthreads()
    # Each thread writes to a different file
    file_name = "thread_$(i)_data.h5"
    write_to_hdf5(file_name, data_samples[i])
    println("Written by thread $i to $file_name")
end

amirmm11 avatar Mar 13 '24 22:03 amirmm11

Can you tell if you are getting ~200 MB/s when you write the time series data? That isn’t super fast, but before proceeding, I want to make sure we are debugging the correct problem. If PowerSystems is writing at a significantly slower speed than the single-threaded system max, we need to look at that.

Can you run the dd test in parallel to see how much you would benefit from parallel writes?

Regarding your multi-threaded example, yes, that is what I was referring to above. The only problem is that it would cause a non-trivial change to our management of these files. It’s obviously not super-complicated, but it would take some work. If we were to go down that path, I would consider more radical changes, such as always storing each time array in a single Arrow file (or some other binary format that is not HDF5).
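
One rough way to estimate the effective write rate on the PowerSystems side is sketched below. It counts only the Float64 payload and ignores HDF5 metadata and chunking overhead; sys and its components are assumed to exist already:

using PowerSystems
using TimeSeries
using Dates

timestamps = collect(range(DateTime("2024-01-01"); step = Minute(5), length = 288))
gens = collect(get_components(ThermalStandard, sys))

elapsed = @elapsed for gen in gens
    ta = TimeArray(timestamps, rand(288))
    add_time_series!(sys, gen, SingleTimeSeries("max_active_power", ta))
end

payload_bytes = length(gens) * 288 * 8  # 288 Float64 values per component
println("effective write rate ≈ ", round(payload_bytes / elapsed / 1e6; digits = 1), " MB/s")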

daniel-thom avatar Mar 13 '24 22:03 daniel-thom

I couldn't get much out of the parallel dd test.

#!/bin/bash

num_operations=16
tempfile_prefix="tempfile"
total_speed=0

for i in $(seq 1 $num_operations); do
  dd if=/dev/zero of=${tempfile_prefix}${i} bs=1M count=1024 conv=fdatasync,notrunc status=progress 2>speed_${i}.txt &
  pid[$i]=$!
done

wait

for i in $(seq 1 $num_operations); do
  wait ${pid[$i]}
  operation_speed=$(grep -o '[0-9.]\+ MB/s' speed_${i}.txt | tail -1 | awk '{print $1}')
  echo "Operation $i speed: ${operation_speed} MB/s"
  total_speed=$(echo "$total_speed + $operation_speed" | bc)
  rm -f ${tempfile_prefix}${i} speed_${i}.txt
done

echo "Total throughput: $total_speed MB/s"
echo "All dd write operations have completed."
Total throughput: 266.0 MB/s

amirmm11 avatar Mar 13 '24 23:03 amirmm11

I ran my own experiment with 1 million additions of small time arrays (288 floats per array). It is very slow. For comparison, with System(; time_series_in_memory = true) there are no writes to HDF5 and the additions are orders of magnitude faster, but nothing is saved to a file for later use.

I have some ideas to fix this, but have other priorities at the moment (no multi-threading needed). I’ll get back to you in about a week. If you don’t need to persist the data to files, you can use the in-memory option for now.
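
For completeness, the in-memory option mentioned above is just a constructor keyword; a minimal sketch:

using PowerSystems

# HDF5-backed (default): every add_time_series! call writes to the backing .h5 file.
sys_hdf5 = System(100.0)

# In-memory: no HDF5 writes, much faster for bulk additions, but the time
# series are not persisted for later use.
sys_mem = System(100.0; time_series_in_memory = true)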

daniel-thom avatar Mar 14 '24 00:03 daniel-thom

Yes, System(; time_series_in_memory = true) is a good option for me, since most cloud VMs have abundant CPU and memory.

amirmm11 avatar Mar 14 '24 00:03 amirmm11

I will close this issue; the fix will be part of the 4.0 release.

jd-lara avatar Mar 19 '24 18:03 jd-lara