Efficient Python writing

Open Psy-Fer opened this issue 5 months ago • 0 comments

Hey,

I'm trying to figure out what the intended best method of writing data to a pod5 file is.

The API has the vbz_compress_signal_chunked() step that does the compression. You then pass the resulting signal_chunks and signal_chunk_lengths into p5.CompressedRead() and give that to writer.add_read().

So something like this:

signal_chunks, signal_chunk_lengths = vbz_compress_signal_chunked(read["signal"], DEFAULT_SIGNAL_CHUNK_SIZE)

writer.add_read(p5.CompressedRead(
    read_id=uuid.UUID(read["read_id"]),
    end_reason=d["end_reason"],
    calibration=d["calibration"],
    pore=d["pore"],
    run_info=d["run_info_cache"][acq_id],
    median_before=d["median_before"],
    read_number=d["read_number"],
    start_sample=d["start_sample"],
    signal_chunks=signal_chunks,
    signal_chunk_lengths=signal_chunk_lengths,
    tracked_scaling=p5.pod5_types.ShiftScalePair(
        read.get("tracked_scaling_shift", float("nan")),
        read.get("tracked_scaling_scale", float("nan")),
    ),
    predicted_scaling=p5.pod5_types.ShiftScalePair(
        read.get("predicted_scaling_shift", float("nan")),
        read.get("predicted_scaling_scale", float("nan")),
    ),
    num_reads_since_mux_change=read.get("num_reads_since_mux_change", 0),
    time_since_mux_change=read.get("time_since_mux_change", 0.0),
    num_minknow_events=read.get("num_minknow_events", 0),
))

Is the user expected to do multiprocessing on this to get optimal write performance?

If so, is there some example or usage I can look at to see how this is intended to be done?
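
To make the question concrete, this is roughly the pattern I'm picturing: compress the signals in a pool of workers and keep a single writer in the main thread. It's only a sketch under assumptions, not something taken from the pod5 docs. zlib stands in for vbz_compress_signal_chunked(), and compress_chunked, compress_and_write, write_read and CHUNK_SIZE are names I made up for illustration.

```python
import zlib
from array import array
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical: samples per chunk, tiny so the demo is readable


def compress_chunked(signal, chunk_size=CHUNK_SIZE):
    """Stand-in for the vbz step: returns (chunks, chunk_lengths)."""
    chunks, lengths = [], []
    for start in range(0, len(signal), chunk_size):
        piece = signal[start:start + chunk_size]
        chunks.append(zlib.compress(piece.tobytes()))  # zlib, NOT vbz
        lengths.append(len(piece))
    return chunks, lengths


def compress_and_write(reads, write_read, executor):
    """Compress in workers, then write from the calling thread in input order."""
    signals = [read["signal"] for read in reads]
    # executor.map yields results in the same order as its inputs
    for read, (chunks, lengths) in zip(reads, executor.map(compress_chunked, signals)):
        write_read(read["read_id"], chunks, lengths)


# Fake reads and a fake writer that just records what it was given
reads = [{"read_id": f"read-{i}", "signal": array("h", range(i, i + 10))} for i in range(3)]
written = []

with ThreadPoolExecutor(max_workers=2) as pool:
    compress_and_write(reads, lambda rid, c, n: written.append((rid, c, n)), pool)
```

In the real case write_read would build the p5.CompressedRead and call writer.add_read(). A thread pool only helps if the compression call releases the GIL, which I haven't checked for vbz; if it doesn't, the same function should work with a ProcessPoolExecutor since compress_chunked is defined at module level.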

Cheers, James

Jul 10 '25 09:07 Psy-Fer