
Pyarrow error

Open Psy-Fer opened this issue 1 year ago • 10 comments

Hey George,

I have a user getting a strange error. I've attached the issue below, where you can also see some more context.

Any ideas what the issue here might be?

Cheers James


Dear James,

Thank you for the update. I just tried the newer version and I am getting an error related to the pyarrow package. Trace:

   len(batch.signal[batch_row_index].as_buffer()),
AttributeError: 'pyarrow.lib.LargeListScalar' object has no attribute 'as_buffer'

Originally posted by @lborcard in https://github.com/Psy-Fer/blue-crab/issues/12#issuecomment-2208307232

Psy-Fer avatar Jul 04 '24 10:07 Psy-Fer

Based on the error, I suspect the file is uncompressed and is hitting a code path that doesn't account for that. I'm not sure how it's possible to end up with an uncompressed file. How were the files created?

I'll keep digging on my side.

0x55555555 avatar Jul 04 '24 11:07 0x55555555

If I may intervene, I am the user with the error. The pod5 files were generated using Icarust https://github.com/LooseLab/Icarust . They are compatible with dorado (I used it to basecall them).

lborcard avatar Jul 04 '24 12:07 lborcard

OK, I'm not familiar with how Icarust writes pod5 files, but I've finished investigating the pod5 source and found that the error is caused by a bug in how the Python pod5 bindings handle uncompressed pod5 files.

I have a fix internally that will resolve the issue, and I'll get it out asap.

- George

0x55555555 avatar Jul 04 '24 12:07 0x55555555
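
For readers following along, the failure mode described above can be sketched without pod5 or pyarrow installed. This is a minimal illustration, not the actual pod5 fix: the two stand-in classes and the helper `signal_byte_length` are hypothetical names, and the assumption that uncompressed samples occupy 2 bytes each (16-bit integers) is made for the example only. The idea is that a binary-like scalar exposes `as_buffer()`, while a list-like scalar does not, so code that calls `as_buffer()` unconditionally raises the `AttributeError` shown in the trace.

```python
class FakeBinaryScalar:
    """Hypothetical stand-in for a binary-like scalar (compressed signal)."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def as_buffer(self) -> bytes:
        return self._payload


class FakeListScalar:
    """Hypothetical stand-in for a list-like scalar (uncompressed samples).

    Deliberately has no as_buffer() method, mirroring the reported error.
    """

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self) -> int:
        return len(self._samples)


BYTES_PER_SAMPLE = 2  # assumption for this sketch: 16-bit samples


def signal_byte_length(scalar) -> int:
    """Return the signal size in bytes for either scalar flavour."""
    as_buffer = getattr(scalar, "as_buffer", None)
    if callable(as_buffer):
        # Binary-like scalar: the buffer length is the byte count.
        return len(as_buffer())
    # List-like scalar: fall back to counting samples.
    return len(scalar) * BYTES_PER_SAMPLE


print(signal_byte_length(FakeBinaryScalar(b"abcd")))
print(signal_byte_length(FakeListScalar([10, 20, 30])))
```

The guard uses duck typing rather than checking concrete types, so it does not depend on which scalar class a given pyarrow version returns.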

This makes me ask the obvious question as well: is pore_type still not used by nanopore software?

I was under the impression that MinKNOW had started using it. Is this something Icarust has decided to set, but which is not actually a field in use yet?

Psy-Fer avatar Jul 04 '24 12:07 Psy-Fer

No, sequencing runs on the current MinKNOW software do not set the pore type.

0x55555555 avatar Jul 04 '24 12:07 0x55555555

Hmmm okay. Thanks.

Psy-Fer avatar Jul 04 '24 13:07 Psy-Fer

Ahh okay - @Psy-Fer I'm happy to change the Icarust code to set the Pore Type to "not-set" if that would be useful.

Adoni5 avatar Jul 04 '24 14:07 Adoni5

Please make it exactly not_set, with an underscore, to match what the current pod5 output uses.

Feel free to use the test scripts in blue-crab as boilerplate to test whether your files are correct.

I'll leave in the R10.4.1 exception for pore_type so users of older versions of Icarust can still convert their files if they like.

James

Psy-Fer avatar Jul 04 '24 14:07 Psy-Fer
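
The spelling point above (not_set with an underscore, not not-set with a hyphen) is easy to get wrong when writing files, so a small normalisation step can help. The sketch below is illustrative only: `normalise_pore_type` is a hypothetical helper, not part of blue-crab, Icarust, or pod5, and treating an empty or missing value as not_set is an assumption made for the example.

```python
NOT_SET = "not_set"


def normalise_pore_type(value):
    """Map common spellings of an unset pore type to 'not_set'.

    Any other value (for example 'R10.4.1') is returned unchanged apart
    from surrounding whitespace.
    """
    if value is None:
        # Assumption: a missing value is treated as not set.
        return NOT_SET
    cleaned = value.strip()
    if cleaned == "":
        return NOT_SET
    # Accept the hyphenated and mixed-case spellings of the sentinel.
    if cleaned.lower().replace("-", "_") == NOT_SET:
        return NOT_SET
    return cleaned


for raw in ("not-set", "Not_Set", "", None, "R10.4.1"):
    print(repr(raw), "->", repr(normalise_pore_type(raw)))
```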

I'm in the process of deploying 0.3.12, which contains a fix for the issue of opening raw data from uncompressed pod5 files.

Thanks,

- George

0x55555555 avatar Jul 08 '24 09:07 0x55555555
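
To check whether an environment already has the release mentioned above, the installed version can be compared against 0.3.12. This is a rough sketch using only the standard library; the helper names are hypothetical, and the parser handles only simple dotted versions. Passing this check does not guarantee the error is gone, since a later comment in this thread reports a similar error on 0.3.23.

```python
from importlib import metadata

FIXED_IN = (0, 3, 12)  # release named in the comment above


def parse_release(version: str) -> tuple:
    """Turn a string like '0.3.12' into (0, 3, 12).

    Trailing non-digit suffixes on a component (e.g. '12rc1') are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_uncompressed_fix(version: str) -> bool:
    """True if the version is at least the release that shipped the fix."""
    return parse_release(version) >= FIXED_IN


def installed_pod5_has_fix():
    """Check the installed pod5 package; None if pod5 is not installed."""
    try:
        return has_uncompressed_fix(metadata.version("pod5"))
    except metadata.PackageNotFoundError:
        return None


print(has_uncompressed_fix("0.3.11"))
print(has_uncompressed_fix("0.3.12"))
print(installed_pod5_has_fix())
```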

Thanks George.

Adoni5 avatar Jul 08 '24 10:07 Adoni5

Hello,

I am getting a similar error to the original poster with an uncompressed pod5 file using pod5 version 0.3.23:

   POD5 has encountered an error: ''pyarrow.lib.ExtensionScalar' object has no attribute 'as_buffer'

Best,

Richard

richardheery avatar Nov 12 '25 15:11 richardheery