Rewrite SALAMI parser to use raw data

Open bmcfee opened this issue 9 years ago • 4 comments

Forking from https://github.com/craffel/mir_eval/issues/162: re-parsing the already-parsed SALAMI annotations could lead to errors. We should instead work from the raw version of the annotations.

I at one point had done this, but for the life of me can't find my implementation. As I recall, it was pretty nasty and should be rewritten anyway.

Basically, what one has to do is the following:

  1. Separate instrument labels (which have parentheses) from segment labels
  2. Induce segment intervals from the event boundary markers
  3. Partition segments by vocabulary for conversion
  4. If we're daring, also transfer the instrument annotations by matching parentheses.
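A rough sketch of steps 1 and 2, under some loud assumptions: each raw line is taken to be a timestamp, whitespace, then a comma-separated list of labels, and any token containing a parenthesis is treated as an instrument marker. The function names `parse_raw_lines` and `induce_intervals` are made up for illustration and are not part of JAMS or the eventual parser.

```python
def parse_raw_lines(lines):
    """Split each raw line into (time, segment_labels, instrument_tokens).

    Assumed line layout (not verified against every SALAMI file):
        <time><whitespace><label>, <label>, ...
    """
    events = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        time = float(parts[0])
        label_str = parts[1] if len(parts) > 1 else ""
        segments, instruments = [], []
        for token in (t.strip() for t in label_str.split(",")):
            if not token:
                continue
            # Step 1: anything carrying a parenthesis is an instrument marker.
            if "(" in token or ")" in token:
                instruments.append(token)
            else:
                segments.append(token)
        events.append((time, segments, instruments))
    return events


def induce_intervals(events):
    """Step 2: pair each boundary with the next one, giving (start, end, labels)."""
    events = sorted(events, key=lambda e: e[0])
    return [
        (start, end, labels)
        for (start, labels, _), (end, _, _) in zip(events, events[1:])
    ]
```

The last event only closes the final interval, so its own labels are dropped here. Labels that persist across several boundaries are not propagated yet; that belongs to the per-vocabulary step.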

Steps 1 and 2 should be easy. I think step 3 can be achieved fairly easily by a clever use of the JAMS namespace structure for each annotation, and a cunning use of pandas.
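One way the vocabulary partition could look, sketched with plain dictionaries instead of pandas to keep it dependency-free. The classification rule (all-uppercase letters are large-scale, all-lowercase letters are small-scale, everything else is a function label, with trailing primes ignored) and the special handling of a final `End` marker are assumptions, and `classify` and `partition_by_level` are hypothetical names. The three keys are meant to line up with the JAMS namespaces `segment_salami_upper`, `segment_salami_lower` and `segment_salami_function`.

```python
from collections import defaultdict


def classify(label):
    """Guess the vocabulary of a segment label (assumed rule, see lead-in)."""
    core = label.rstrip("'")  # treat A' and A'' like A for classification
    if core.isalpha() and core.isupper():
        return "upper"
    if core.isalpha() and core.islower():
        return "lower"
    return "function"


def partition_by_level(events):
    """events: list of (time, segment_labels); returns {level: [(start, end, label)]}.

    Within a level, a label is assumed to last until the next label of the
    same level, or until the final event if there is none.
    """
    events = sorted(events, key=lambda e: e[0])
    end_time = events[-1][0]
    onsets = defaultdict(list)
    for time, labels in events:
        for label in labels:
            if label == "End":  # assumed terminal marker, not a segment
                continue
            onsets[classify(label)].append((time, label))
    levels = {}
    for level, marks in onsets.items():
        # Each segment stops where the next one of the same level starts.
        stops = [t for t, _ in marks[1:]] + [end_time]
        levels[level] = [
            (start, stop, label) for (start, label), stop in zip(marks, stops)
        ]
    return levels
```

Under this rule a label such as `Silence` lands in the function level only, so the other levels would have a gap wherever it occurs; a real parser would probably need to copy it to every level.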

Step 4 is tricky, since you sometimes see open and close parens on the same event, and we'll need a namespace for the instruments.
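For the parenthesis matching, a minimal sketch might look like the following. It assumes that an opening paren marks the boundary where an instrument enters, and that a closing paren marks the segment in which it stops, so the instrument ends at the next boundary; this reading makes an open and close on the same event a one-segment span. That interpretation, and the name `match_instruments`, are assumptions rather than anything confirmed here.

```python
def match_instruments(events):
    """events: time-sorted list of (time, instrument_tokens).

    Returns (spans, unmatched) where spans is a list of (start, end, name)
    and unmatched maps still-open instrument names to their start times.
    """
    times = [t for t, _ in events]
    open_since = {}
    spans = []
    for i, (time, tokens) in enumerate(events):
        # A close paren is assumed to end at the following boundary.
        next_time = times[i + 1] if i + 1 < len(times) else time
        for token in tokens:
            name = token.strip("()").strip()
            if token.startswith("("):
                open_since.setdefault(name, time)
            if token.endswith(")"):
                start = open_since.pop(name, None)
                if start is not None:  # ignore a close with no matching open
                    spans.append((start, next_time, name))
    return spans, open_since
```

Returning the unmatched opens makes it easy to flag annotations whose parentheses never close, rather than silently guessing an end time.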

bmcfee avatar Feb 03 '16 13:02 bmcfee

Will work on this over the weekend.

urinieto avatar Feb 03 '16 16:02 urinieto

I finished points 1, 2 and 3. I don't have the cycles to tackle 4 right now (maybe we can do some hacking in Dagstuhl, @bmcfee?).

Let me know whether you want me to open a PR with these changes or would prefer to wait for 4.

urinieto avatar Feb 09 '16 06:02 urinieto

Sure, start the PR as a WIP and we can finish up point 4 later.

bmcfee avatar Feb 09 '16 13:02 bmcfee

Alright!

urinieto avatar Feb 09 '16 16:02 urinieto