Automatize the metadata parsing

Open urinieto opened this issue 9 years ago • 10 comments

We should have a global file containing all the metadata in order to avoid having to manually add it after parsing the svl files. Thus, the parser could add it automatically, instead of destroying it (like in PR #2).
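
A minimal sketch of the idea, assuming a tab-separated file with a header row. The column names (`track_id`, `title`, `artist`) and the helpers `load_metadata` and `metadata_for` are hypothetical and not taken from the repository:

```python
import csv
import io


def load_metadata(tsv_file, key="track_id"):
    """Read a TSV with a header row into {track_id: {column: value}}."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row[key]: row for row in reader}


def metadata_for(track_id, table):
    """Return a copy of the metadata row for a track, or {} if unknown."""
    return dict(table.get(track_id, {}))


# Toy contents standing in for the real metadata file (assumed layout).
example = io.StringIO(
    "track_id\ttitle\tartist\n"
    "SALAMI_114\tExample Title\tExample Artist\n"
)
table = load_metadata(example)
print(metadata_for("SALAMI_114", table)["title"])
```

With a lookup like this, the parser could attach the row to each output file as it writes it, rather than relying on metadata added by hand afterwards.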

urinieto avatar Jun 16 '16 04:06 urinieto

Ok, the parser now includes the metadata information, extracted from this new tsv file. However, after I used @bmcfee's notebook to correct the annotations, I get a series of errors for some files (eg, the famous SALAMI_114) and it doesn't really include any multisegment annotations for the ones that fail.

@bmcfee, did you do anything else to the JAMS files after using the annotations parser and before using your notebook to obtain the references you included in #2?

urinieto avatar Jun 16 '16 06:06 urinieto

> I get a series of errors for some files (eg, the famous SALAMI_114) and it doesn't really include any multisegment annotations for the ones that fail.

That's correct. The notebook does not produce multi-segment annotations if the upper and lower annotations cannot be resolved.

> did you do anything else

nope!

bmcfee avatar Jun 16 '16 13:06 bmcfee

So, how come the files you uploaded and fail contain multi_segment annotations (e.g. SALAMI_114 from your last commit) and mine don't (e.g., the same from my last commit)?

urinieto avatar Jun 16 '16 16:06 urinieto

O_o really? do you have the most recent version of the notebook?

bmcfee avatar Jun 16 '16 17:06 bmcfee

Yeah, I just tried it again to confirm with this version and I get the same results. Not sure what's going on... can you try it yourself? This is what I did from the SPAM/scripts folder:

python parse_annotations.py ../original_references out_dir ../metadata.tsv

And then run the notebook on the out_dir folder.

urinieto avatar Jun 17 '16 02:06 urinieto

> So, how come the files you uploaded and fail contain multi_segment annotations

Derp, sorry, I was unclear before. Failure is determined on an annotator-basis, not track basis. I still want the multi-level annotations for the annotators that can be resolved together, even for 114. There should be four multi-level annotations in 114, five in all other tracks.
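
A rough sketch of per-annotator failure handling. Everything here is an assumption for illustration: segments are `(start, end, label)` tuples, and `resolve_levels` is a placeholder check, not the rule the notebook actually applies:

```python
def resolve_levels(upper, lower):
    """Placeholder check (an assumption): both levels must exist and end together."""
    if not upper or not lower:
        raise ValueError("missing level")
    if upper[-1][1] != lower[-1][1]:
        raise ValueError("levels end at different times")
    return {"upper": upper, "lower": lower}


def multi_level_annotations(track):
    """Build one multi-level annotation per annotator, skipping only the failures."""
    resolved, failed = {}, []
    for annotator, (upper, lower) in track.items():
        try:
            resolved[annotator] = resolve_levels(upper, lower)
        except ValueError:
            # Only this annotator is dropped; the rest of the track is kept.
            failed.append(annotator)
    return resolved, failed


# Toy track with two hypothetical annotators; the second has no lower level.
track = {
    "annotator_1": ([(0.0, 10.0, "A")], [(0.0, 5.0, "a"), (5.0, 10.0, "b")]),
    "annotator_2": ([(0.0, 10.0, "A")], []),
}
resolved, failed = multi_level_annotations(track)
print(sorted(resolved), failed)
```

The point of the structure is that an exception for one annotator never discards the annotations that did resolve for the same track.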

bmcfee avatar Jun 17 '16 13:06 bmcfee

Yeah, I get that. But the notebook doesn't do that, right? How did you compute that in your last commit? I'm trying to automatize the whole thing.

urinieto avatar Jun 17 '16 15:06 urinieto

> Yeah, I get that. But the notebook doesn't do that, right? How did you compute that in your last commit?

Doesn't it? My last commit was done just as you describe: run the parser script and then the notebook.

> I'm trying to automatize the whole thing

I think the best way to do this is to restructure the parser so that it collects all the annotations for a track first and then writes out the results. Doing it piecemeal and operating in place makes it really difficult to keep synchronized.
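
A sketch of that collect-then-write structure, under stated assumptions: the parser is taken to yield `(track_id, annotator, annotation)` triples, the function names are hypothetical, and plain JSON stands in for the real JAMS output:

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_by_track(parsed):
    """Group (track_id, annotator, annotation) triples by track."""
    tracks = defaultdict(dict)
    for track_id, annotator, annotation in parsed:
        tracks[track_id][annotator] = annotation
    return tracks


def write_tracks(tracks, out_dir):
    """Write each track exactly once, after all of its annotations are known."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for track_id, annotations in sorted(tracks.items()):
        path = out_dir / f"{track_id}.json"
        payload = {"track_id": track_id, "annotations": annotations}
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written


# Toy input standing in for the parsed svl files.
parsed = [
    ("SALAMI_114", "annotator_1", [[0.0, 10.0, "A"]]),
    ("SALAMI_114", "annotator_2", [[0.0, 10.0, "B"]]),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_tracks(collect_by_track(parsed), tmp)
    print([p.name for p in paths])
```

Because nothing is written until a track is complete, there is no existing file to update in place and nothing to keep synchronized between passes.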

bmcfee avatar Jun 17 '16 15:06 bmcfee

... if you're already in the process of automating this step, how about taking a crack at the salami annotations as well?

I tried running the notebook over them, and they're a total mess. Maybe this is due to parse errors from the upstream annotations?

bmcfee avatar Jun 17 '16 19:06 bmcfee

> ... if you're already in the process of automating this step, how about taking a crack at the salami annotations as well?

I haven't started yet, since I still don't know how you actually computed the multi_segment. Can you try it again with the latest version of the parser + notebook and see if we get the same results? Once I get the same results you got, I will redo the parser such that it collects all the annotations for a track first.

> I tried running the notebook over them, and they're a total mess. Maybe this is due to parse errors from the upstream annotations?

Ugh... no idea, haven't tried the SALAMI ones

urinieto avatar Jun 17 '16 20:06 urinieto