pod5-file-format
Efficient Python writing
Hey,
I'm trying to figure out the intended best way to write data to a pod5 file.
The API has the vbz_compress_signal_chunked() step that does the compression; you then plug the resulting signal_chunks and signal_chunk_lengths into p5.CompressedRead() and hand that to writer.add_read().
So something like this:
```python
import uuid

import pod5 as p5
# assuming these are importable from pod5.signal_tools
from pod5.signal_tools import DEFAULT_SIGNAL_CHUNK_SIZE, vbz_compress_signal_chunked

# `read`, `d` and `acq_id` come from my own data structures
signal_chunks, signal_chunk_lengths = vbz_compress_signal_chunked(
    read["signal"], DEFAULT_SIGNAL_CHUNK_SIZE
)

writer.add_read(p5.CompressedRead(
    read_id=uuid.UUID(read["read_id"]),
    end_reason=d["end_reason"],
    calibration=d["calibration"],
    pore=d["pore"],
    run_info=d["run_info_cache"][acq_id],
    median_before=d["median_before"],
    read_number=d["read_number"],
    start_sample=d["start_sample"],
    signal_chunks=signal_chunks,
    signal_chunk_lengths=signal_chunk_lengths,
    tracked_scaling=p5.pod5_types.ShiftScalePair(
        read.get("tracked_scaling_shift", float("nan")),
        read.get("tracked_scaling_scale", float("nan")),
    ),
    predicted_scaling=p5.pod5_types.ShiftScalePair(
        read.get("predicted_scaling_shift", float("nan")),
        read.get("predicted_scaling_scale", float("nan")),
    ),
    num_reads_since_mux_change=read.get("num_reads_since_mux_change", 0),
    time_since_mux_change=read.get("time_since_mux_change", 0.0),
    num_minknow_events=read.get("num_minknow_events", 0),
))
```
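For completeness, writer above is just a pod5 writer opened as a context manager; the function name, output path and the reads iterable below are placeholders of my own:

```python
import pod5 as p5


def write_all(reads, output_path="output.pod5"):
    # `reads` is my own iterable of per-read dicts (hypothetical name)
    with p5.Writer(output_path) as writer:
        for read in reads:
            ...  # compress the signal and call writer.add_read() as above
```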
Is the user expected to do multiprocessing on top of this to get optimal write performance?
If so, is there an example or reference usage I can look at to see how this is intended to be done?
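For context, the kind of thing I have in mind is below. It's only a sketch under my own assumptions (compression farmed out to a multiprocessing pool, a single writer kept in the parent process), not something I've confirmed is the intended pattern; the helper names and the reads/d/acq_id structures are mine.

```python
import uuid
from multiprocessing import Pool

import pod5 as p5
from pod5.signal_tools import DEFAULT_SIGNAL_CHUNK_SIZE, vbz_compress_signal_chunked


def compress_signal(job):
    # CPU-heavy part only: runs in a worker process.
    read_id, signal = job
    chunks, chunk_lengths = vbz_compress_signal_chunked(signal, DEFAULT_SIGNAL_CHUNK_SIZE)
    return read_id, chunks, chunk_lengths


def write_reads(reads, d, acq_id, output_path="output.pod5"):
    # `reads`, `d` and `acq_id` are my own data structures, as in the snippet above.
    by_id = {r["read_id"]: r for r in reads}
    jobs = ((r["read_id"], r["signal"]) for r in reads)
    with Pool() as pool, p5.Writer(output_path) as writer:
        # Workers only do the vbz compression; all add_read() calls stay in the parent.
        for read_id, chunks, chunk_lengths in pool.imap_unordered(compress_signal, jobs):
            read = by_id[read_id]
            writer.add_read(p5.CompressedRead(
                read_id=uuid.UUID(read_id),
                end_reason=d["end_reason"],
                calibration=d["calibration"],
                pore=d["pore"],
                run_info=d["run_info_cache"][acq_id],
                median_before=d["median_before"],
                read_number=d["read_number"],
                start_sample=d["start_sample"],
                signal_chunks=chunks,
                signal_chunk_lengths=chunk_lengths,
                # ...plus the remaining per-read fields from the snippet above
            ))
```

Even if that works, I'm not sure it's a win: the raw signal gets pickled out to the workers and the compressed chunks pickled back, so the IPC cost might eat the benefit, which is part of why I'm asking.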
Cheers, James