unsupervised_topic_segmentation
Please share the code that converts the AMI data into your internal DB format
Hey guys,
It'd be wonderful to have a snippet of whatever code transforms your source of the AMI dataset, along with that source itself. Did you start from https://huggingface.co/datasets/ami? Or did you just unzip and transform https://groups.inf.ed.ac.uk/ami/download/? It looks like you computed sentence start times (AMI only has segment- and word-level timings), but I've noticed that the AMI test set has some duplicate segment start times. So it'd be interesting to see how you handled such cases.
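For reference, here is one possible way to handle the duplicate start times. It is only a sketch and not necessarily what the authors did. It assumes each segment has already been parsed into a hypothetical `Segment` record with a start time, end time, speaker and text. It drops exact duplicates and breaks ties on start time by end time and then speaker, so the final order no longer depends on the input order.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record for one AMI segment; the field names are assumptions,
# not the layout of the Hugging Face dataset or the raw AMI annotation files.
@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the meeting
    end: float
    speaker: str
    text: str


def order_segments(segments: List[Segment]) -> List[Segment]:
    """Drop exact duplicates, then sort by start time with deterministic tie-breaking."""
    seen = set()
    unique = []
    for seg in segments:
        # frozen=True makes Segment hashable, so exact duplicates can be tracked in a set.
        if seg not in seen:
            seen.add(seg)
            unique.append(seg)
    # Segments sharing a start time are ordered by end time, then by speaker.
    return sorted(unique, key=lambda s: (s.start, s.end, s.speaker))
```

The tie-break rule is an arbitrary choice. Another option would be to merge segments that share a start time and speaker.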
I'm not one of the authors, but after a long search I found that you can use this dataset: https://github.com/Yale-LILY/QMSum