
ENH: consider adding support for GUANO audio metadata parsing/writing

Open sammlapp opened this issue 1 year ago • 5 comments

Describe the solution you'd like

GUANO is a metadata convention used by the bat acoustics community.

see spec https://github.com/riggsd/guano-spec/blob/master/guano_specification.md

and python package https://pypi.org/project/guano/

It has a nice set of values in the defaults, and allows "namespaces" with custom sets of fields as well.

Implementation-wise, it writes the metadata as a separate section (chunk) of the WAV file (only WAV is supported), similar to the default header, although for some reason that section can sit at the end of the file. However, it seems the Python package also supports I/O of text files and other formats.
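For anyone who wants to see what "a separate section of the WAV file" means in practice, here is a minimal sketch using only the standard library. It assumes the metadata lives in a RIFF chunk with the ID `guan` and is stored as UTF-8 `key: value` lines, with namespaced keys written as `Namespace|Field`; please check those details against the spec before relying on them. The function names (`iter_riff_chunks`, `parse_guano_text`, `read_guano`) are hypothetical and are not part of the `guano` package or Crowsetta.

```python
import struct


def iter_riff_chunks(data: bytes):
    """Yield (chunk_id, payload) pairs from the bytes of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # skip 'RIFF', the overall size, and 'WAVE'
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        # RIFF chunks are padded to an even number of bytes
        pos += 8 + size + (size & 1)


def parse_guano_text(text: str) -> dict:
    """Parse 'key: value' lines; keys such as 'NS|Field' keep their namespace."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def read_guano(data: bytes):
    """Return the metadata dict, or None if no 'guan' chunk is found.

    Assumption: the chunk ID is b'guan' and the payload is UTF-8 text.
    """
    for chunk_id, payload in iter_riff_chunks(data):
        if chunk_id == b"guan":
            return parse_guano_text(payload.decode("utf-8"))
    return None
```

Because the loop walks every chunk, it finds the metadata whether the chunk comes before or after the `data` chunk, which is why its position at the end of the file does not matter to a reader. Usage would be something like `read_guano(open(path, "rb").read())`, where `path` is a WAV file you supply.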

sammlapp avatar Oct 10 '24 19:10 sammlapp

Thank you @sammlapp for suggesting this.

I have looked at GUANO before but hadn't added it.
Are you seeing a lot of usage of this format in the wild?

We are biased right now towards formats for annotating speech-like sequences of sounds, like birdsong syllables; it seems like GUANO is at the other extreme, where the goal is to provide as much relevant metadata as possible about a detection, using the term as it is used in bioacoustics. (I know you know this, just adding context for anyone else who stumbles on the issue.)

It's not clear to me from the spec: is there a way to represent multiple detections within a single file?

Don't mean to grill you, I really appreciate your suggesting this -- I'm just hoping that, since you're an actual bioacoustician, you might have more insight into how this format is being used in the wild.

Some related discussion here: https://github.com/tdwg/ac/issues/264 and https://github.com/tdwg/ac/issues/247 (related in the sense that it provides context about how standards groups are thinking about GUANO)

NickleDave avatar Oct 10 '24 20:10 NickleDave

Also ... are you aware of any publicly available datasets that use this format?

I think I looked before and couldn't find any, which is another reason I didn't raise an issue about it.
Just asking since it would help to test that we can actually parse / write the format.

NickleDave avatar Oct 10 '24 23:10 NickleDave

🤔 this at least says it was collected with Anabat (and Audiomoth): https://databank.illinois.edu/datasets/IDB-4200947

NickleDave avatar Oct 10 '24 23:10 NickleDave

To be honest, I wouldn't prioritize this, as I don't see it being used a lot and haven't had a reason to need it. I opened the feature request because there was an open feature request about Guano on OpenSoundscape, but it seems like if integration is implemented anywhere, it should be in Crowsetta rather than OpenSoundscape.

sammlapp avatar Oct 11 '24 02:10 sammlapp

Got it, thank you @sammlapp. Happy to add it if you start seeing more of a need for it; it looks fairly painless, judging from the Python implementation.

NickleDave avatar Oct 11 '24 12:10 NickleDave