PowerSystems.jl
Support large sets of MarketBidCost
Adding many MarketBidCost objects (or TimeSeries) takes a long time because storing them in the HDF5 file is slow. It would be great to improve the performance, e.g., with multi-threading. Thanks
@amirmm11 Can you share some details about your environment and observed performance?
- What is the backing storage for your filesystem? SSD, spinning disk, network filesystem, etc.
- Do you have an estimate of the throughput you are seeing in MB/s?
- Are you enabling compression when you create the system? If so, are you customizing any compression attributes?
It is possible to parallelize the writing, but it would only be beneficial if you aren’t already saturating the storage.
Also, this wouldn’t be as simple as using multi-threading: parallel writes to a single HDF5 file would require MPI, as discussed here. It might be easier to support multiple files and write them from different threads. We could consider that.
@daniel-thom I am using a Google Cloud c2d-highcpu-32 VM. Here is some info:
- Backing Storage Type
```
$ lsblk -o NAME,TYPE,SIZE,ROTA
NAME      TYPE    SIZE ROTA
loop0     loop   39.1M    1
loop1     loop  105.8M    1
loop2     loop   55.7M    1
loop3     loop   55.7M    1
loop4     loop     87M    1
loop5     loop   63.9M    1
loop6     loop     37M    1
loop7     loop    352M    1
loop8     loop   63.9M    1
loop9     loop  105.4M    1
loop10    loop     87M    1
loop11    loop  353.6M    1
loop12    loop   40.4M    1
loop13    loop    4.2M    1
sda       disk      1T    1
├─sda1    part 1023.9G    1
├─sda14   part      4M    1
└─sda15   part    106M    1
```
- Throughput Estimate
- Write speed test
```
dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.10065 s, 211 MB/s
```
- Read speed test
```
dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.114648 s, 9.4 GB/s
```
- I am using the default value of enable_compression (see the sketch below).
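In System-creation terms, roughly this (a sketch; the 100.0 base power is just a placeholder, and I believe the default leaves compression disabled):

```julia
using PowerSystems

# What I am doing now: enable_compression left at its default
sys = System(100.0)

# The alternative would be turning compression on at creation time
sys_compressed = System(100.0; enable_compression = true)
```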
BTW what about something like this?
```julia
using HDF5
using Base.Threads

# Function to write some data to an HDF5 file
function write_to_hdf5(file_name, data)
    h5open(file_name, "w") do file
        write(file, "data", data)
    end
end

# Sample data to write: one random matrix per thread
data_samples = [rand(10, 10) for _ in 1:nthreads()]

# Main multithreaded execution: each thread writes to a different file
@threads for i in 1:nthreads()
    file_name = "thread_$(i)_data.h5"
    write_to_hdf5(file_name, data_samples[i])
    println("Written by thread $i to $file_name")
end
```
Can you tell if you are getting ~200 MB/s when you write the time series data? That isn’t super fast, but before proceeding, I want to make sure we are debugging the correct problem. If PowerSystems is writing at a significantly slower speed than the single-threaded system max, we need to look at that.
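For example, a rough sketch like this (the array count, array length, and file name are arbitrary) would estimate the effective HDF5 throughput for many small datasets, which you can compare against your dd numbers:

```julia
using HDF5

# Write many small arrays as separate datasets in one HDF5 file and
# report the effective throughput.
n_arrays = 100_000
data = rand(288)  # one small time array
total_mb = n_arrays * sizeof(data) / 1e6

elapsed = @elapsed h5open("bench.h5", "w") do file
    for i in 1:n_arrays
        write(file, "ts_$i", data)
    end
end

println("Wrote $(round(total_mb; digits = 1)) MB in $(round(elapsed; digits = 2)) s: ",
        round(total_mb / elapsed; digits = 1), " MB/s")
```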
Can you run the dd test in parallel to see how much you would benefit from parallel writes?
Regarding your multi-threaded example, yes, that is what I was referring to above. The only problem is that it would cause a non-trivial change to our management of these files. It’s obviously not super-complicated, but it would take some work. If we were to go down that path, I would consider more radical changes, such as always storing each time array in a single Arrow file (or some other binary format that is not HDF5).
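To illustrate the Arrow idea, a minimal sketch (the file naming scheme and table layout here are hypothetical, not the current API):

```julia
using Arrow
using Base.Threads

# Hypothetical layout: one Arrow file per time array, so threads can
# write different arrays concurrently without sharing a file handle.
time_arrays = [rand(288) for _ in 1:8]

@threads for i in eachindex(time_arrays)
    # A NamedTuple of vectors is a valid Tables.jl table
    Arrow.write("time_array_$(i).arrow", (values = time_arrays[i],))
end

# Reading one back
tbl = Arrow.Table("time_array_1.arrow")
```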
I couldn't get much out of the parallel dd test.
```bash
#!/bin/bash
# Run several dd writes in parallel and sum the reported throughputs.
num_operations=16
tempfile_prefix="tempfile"
total_speed=0

for i in $(seq 1 $num_operations); do
    dd if=/dev/zero of=${tempfile_prefix}${i} bs=1M count=1024 conv=fdatasync,notrunc status=progress 2>speed_${i}.txt &
done
wait

for i in $(seq 1 $num_operations); do
    operation_speed=$(grep -o '[0-9.]\+ MB/s' speed_${i}.txt | tail -1 | awk '{print $1}')
    echo "Operation $i speed: ${operation_speed} MB/s"
    total_speed=$(echo "$total_speed + $operation_speed" | bc)
    rm -f ${tempfile_prefix}${i} speed_${i}.txt
done

echo "Total throughput: $total_speed MB/s"
echo "All dd write operations have completed."
```
Total throughput: 266.0 MB/s
I ran my own experiment with 1 million additions of small time arrays (288 floats per array). It is very slow. For comparison, with System(; time_series_in_memory = true) there are no writes to HDF5, and it is orders of magnitude faster; however, nothing is saved to a file for later use.
I have some ideas to fix this (no multi-threading needed), but I have other priorities at the moment. I’ll get back to you in about a week. If you don’t need to persist the data to files, you can use the in-memory option for now.
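For reference, a minimal sketch of the in-memory option (the 100.0 base power is just a placeholder):

```julia
using PowerSystems

# Keep all time series data in memory instead of a backing HDF5 file.
# Much faster for many small arrays, but nothing is persisted to disk.
sys = System(100.0; time_series_in_memory = true)
```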
Yes, System(; time_series_in_memory = true) is a good option for me, since most cloud VMs have abundant CPU and memory.
I will close this issue; the fix will be part of the 4.0 release.