unsupervised_topic_segmentation

Share code that takes the AMI data and formats it for your internal db please

Open. KeithYJohnson opened this issue 2 years ago • 1 comment

Hey guys,

It'd be wonderful to have a snippet of whatever code transforms your source of the AMI dataset, and to know what that source was. Did you start from https://huggingface.co/datasets/ami? Or did you just unzip and transform https://groups.inf.ed.ac.uk/ami/download/? It looks like you computed sentence start times (AMI itself only has segment- and word-level timings), but I've noticed that the AMI test set has some duplicate segment start times. So it'd be interesting to see how you handled such cases.
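To make the duplicate-start-time case concrete, here is a minimal sketch of how such segments could be detected. It assumes each segment is a dict with hypothetical `start` and `text` keys; this is not the actual AMI schema or the authors' preprocessing, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_starts(segments):
    """Return {start_time: [segments]} for start times shared by 2+ segments."""
    by_start = defaultdict(list)
    for seg in segments:
        # "start" is an assumed field name, not taken from the AMI release.
        by_start[seg["start"]].append(seg)
    return {start: segs for start, segs in by_start.items() if len(segs) > 1}


# Made-up example rows, only to show the shape of the problem.
segments = [
    {"start": 0.0, "text": "Okay."},
    {"start": 2.5, "text": "So the remote"},
    {"start": 2.5, "text": "Yeah."},
]

duplicates = find_duplicate_starts(segments)
for start, segs in duplicates.items():
    print(start, [s["text"] for s in segs])
```

How to resolve such collisions (merging the texts, keeping one segment, or ordering by speaker) is exactly the open question here.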

KeithYJohnson, Nov 23 '22 22:11

I am not one of the authors, but after a long search I found that you can use this dataset: https://github.com/Yale-LILY/QMSum

BMukhtar, Feb 24 '23 14:02