pod5-file-format

Metadata editing in pod5 files

Open: DCossey opened this issue · 10 comments

Hi, we ran into an error during dorado basecalling (of files recovered after a failed run) caused by an incorrect sequencing kit:

[2023-12-15 14:35:04.358] [error] Unknown sequencing_kit: FLO-PRO114M

So then we checked our pod5 files and saw the following:

flow_cell_product_code: FLO-PRO114M
sequencing_kit: FLO-PRO114M

Is it possible to edit the incorrect sequencing kit somehow?

DCossey, Dec 15 '23 20:12
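To repeat the check described above across every read in a file rather than eyeballing one record, the run-info values can be tallied. This is a minimal sketch, assuming each record exposes `run_info.flow_cell_product_code` and `run_info.sequencing_kit` (as the ReadRecords yielded by `pod5.Reader` do); `kit_summary` is a hypothetical helper name, not part of pod5.

```python
from collections import Counter
from typing import Any, Iterable


def kit_summary(records: Iterable[Any]) -> Counter:
    """Count (flow_cell_product_code, sequencing_kit) pairs across reads.

    Assumes each record exposes run_info.flow_cell_product_code and
    run_info.sequencing_kit, as the ReadRecords yielded by pod5.Reader do.
    """
    return Counter(
        (rec.run_info.flow_cell_product_code, rec.run_info.sequencing_kit)
        for rec in records
    )


# Hypothetical usage against a real file (requires the pod5 package):
# import pod5
# with pod5.Reader("input.pod5") as reader:
#     for (flow_cell, kit), n in kit_summary(reader).items():
#         print(f"flow_cell_product_code={flow_cell} sequencing_kit={kit} reads={n}")
```

A file affected by this issue would show a single pair in which both values are the flow cell code.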

Hi @DCossey, yes, it is possible to fix your metadata, although it's not particularly clean because pod5 files are immutable.

There is a short section of the documentation on this, but here's a snippet more tailored to your issue: you need to edit the RunInfo of each read.

import pod5

# New output file for the edited data
with pod5.Writer("output.pod5") as writer:
    # Open the original file for reading
    with pod5.Reader("input.pod5") as reader:
        # Iterate over the immutable ReadRecords
        for record in reader:
            # Convert to a mutable Read
            read = record.to_read()
            # Set the correct kit (replace the placeholder with your actual kit name)
            read.run_info.sequencing_kit = "sequencing_kit_here"
            # Write the edited read
            writer.add_read(read)

Kind regards, Rich

HalfPhoton, Dec 18 '23 10:12
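After rewriting, it can be worth confirming that no reads were dropped and that every read carries the new kit. This is a minimal sketch, assuming records expose `read_id` and `run_info.sequencing_kit` as pod5 ReadRecords do; `check_rewrite` is a hypothetical helper, not a pod5 function.

```python
from typing import Any, Iterable, List, Tuple


def check_rewrite(
    input_records: Iterable[Any],
    output_records: Iterable[Any],
    expected_kit: str,
) -> Tuple[List[str], List[str]]:
    """Compare an original file's reads with the rewritten file's reads.

    Returns (missing, wrong_kit): read IDs present in the input but absent
    from the output, and output read IDs whose sequencing_kit is not
    expected_kit. Assumes records expose read_id and run_info.sequencing_kit.
    """
    input_ids = {str(rec.read_id) for rec in input_records}
    output_ids = set()
    wrong_kit = []
    for rec in output_records:
        read_id = str(rec.read_id)
        output_ids.add(read_id)
        if rec.run_info.sequencing_kit != expected_kit:
            wrong_kit.append(read_id)
    missing = sorted(input_ids - output_ids)
    return missing, wrong_kit


# Hypothetical usage (requires the pod5 package; file names match the snippet above):
# import pod5
# with pod5.Reader("input.pod5") as src, pod5.Reader("output.pod5") as dst:
#     missing, wrong_kit = check_rewrite(src, dst, "sequencing_kit_here")
#     print(f"{len(missing)} missing reads, {len(wrong_kit)} reads with the wrong kit")
```

Both lists should be empty for a successful rewrite.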

I followed the example code and revised a pod5 file successfully. However, when I try to check the content using pod5 view, I get the following error (the same error occurs if I view an untouched pod5 file):

POD5 has encountered an error: 'Error while processing "output2.pod5''

For detailed information set POD5_DEBUG=1'

jennieli421, Dec 18 '23 15:12

What command are you running?

HalfPhoton, Dec 18 '23 15:12

pod5 view "output2.pod5"

jennieli421, Dec 18 '23 15:12

Can you try without the quotes please?

HalfPhoton, Dec 18 '23 15:12

I tried that, and I still get the same error.

$ pod5 view original.pod5
read_id	filename	read_number	channel	mux	end_reason	start_time	start_sample	duration	num_samples	minknow_events	sample_rate	median_before	predicted_scaling_scale	predicted_scaling_shift	tracked_scaling_scale	tracked_scaling_shift	num_reads_since_mux_change	time_since_mux_change	run_id	sample_id	experiment_id	flow_cell_id	pore_type

POD5 has encountered an error: 'Error while processing 'original.pod5''

For detailed information set POD5_DEBUG=1'

jennieli421, Dec 18 '23 15:12

Can you run the following then:

pod5 --version
POD5_DEBUG=1 pod5 view output2.pod5

And then share the contents of the pod5 .log files that are generated?

HalfPhoton, Dec 18 '23 15:12

Ah, this could be a new issue caused by polars==0.20.

Can you please make sure you're using polars==0.19?

If not, please reinstall polars with pip install -U polars~=0.19

HalfPhoton, Dec 18 '23 15:12

Yes, I had polars==0.20. Note that I had to run pip install -U polars==0.19; otherwise pip reported "requirement already satisfied". That fixed the error. Thanks!

jennieli421, Dec 18 '23 16:12
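The "requirement already satisfied" message follows from how `~=` works. It is PEP 440's compatible-release operator: `~=0.19` means `>=0.19, ==0.*`, so an installed polars 0.20 already satisfies it and pip does not downgrade, whereas `~=0.19.0` (meaning `>=0.19.0, ==0.19.*`) or `==0.19` pins the 0.19 series. Below is a minimal sketch of that rule, restricted to plain numeric versions (no pre-releases, epochs, or local tags); the function names are hypothetical, and real tooling should rely on pip or the `packaging` library instead.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '0.19' or '0.20.2'."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    # PEP 440 treats missing trailing components as zero (0.19 == 0.19.0)
    return parts + (0,) * (width - len(parts))


def is_compatible_release(candidate: str, spec: str) -> bool:
    """Simplified PEP 440 '~=' check: ~=V.N means >=V.N and ==V.*."""
    spec_parts = parse_release(spec)
    if len(spec_parts) < 2:
        raise ValueError("'~=' needs at least two release components")
    cand_parts = parse_release(candidate)
    width = max(len(spec_parts), len(cand_parts))
    meets_minimum = _pad(cand_parts, width) >= _pad(spec_parts, width)
    prefix = spec_parts[:-1]
    same_series = _pad(cand_parts, len(prefix))[: len(prefix)] == prefix
    return meets_minimum and same_series


print(is_compatible_release("0.20.0", "0.19"))    # True: pip sees nothing to change
print(is_compatible_release("0.20.0", "0.19.0"))  # False: ~=0.19.0 pins the 0.19 series
```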

Fantastic! Sorry about that last issue; we're patching it as we speak.

HalfPhoton, Dec 18 '23 16:12