
transcript parser plugin

Open donrestarone opened this issue 3 years ago • 2 comments

given a JSON path that points to a string value (called the input), this plugin will scan the associated API Resources of the API Namespace and write the parsed value to a different attribute (pointed at by another JSON path)

properties

input_string_path: "api_namespace_slug.some_property.another_property"
output_string_path: "api_namespace_slug.some_property.a_different_property"

it should raise an error if the input path and output path are the same (that would overwrite the input)
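A minimal sketch of that guard, in plain Ruby. The method name `validate_paths!` and the choice of `ArgumentError` are assumptions for illustration, not existing violet_rails code:

```ruby
# Hypothetical helper: refuses to run when both paths point at the same attribute.
def validate_paths!(input_string_path, output_string_path)
  # Compare trimmed strings so stray whitespace does not hide a collision.
  if input_string_path.to_s.strip == output_string_path.to_s.strip
    raise ArgumentError,
          "input_string_path and output_string_path must differ (the input would be overwritten)"
  end
  true
end

validate_paths!(
  "api_namespace_slug.some_property.another_property",
  "api_namespace_slug.some_property.a_different_property"
)
```

Calling it with two identical paths raises `ArgumentError` before any API Resource is touched.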

given a transcript like:

1
00:00:00,300 --> 00:00:02,167
[sally]: i'm in los angeles and

2
00:00:01,920 --> 00:00:02,102
[bobby]: okay

3
00:00:03,151 --> 00:00:04,737
[sally]: maybe possibly

4
00:00:05,780 --> 00:00:13,613
[bobby]: oh maybe ye so nice an is a
supporter financial supporter of hollows

5
00:00:13,345 --> 00:00:15,510
[sally]: s yeah

output a text corpus without segments, timestamps and [names]:

i'm in los angeles and okay maybe possibly oh maybe ye so nice an is a supporter financial supporter of hollows s yeah
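One way the parsing step could look, as a rough sketch in plain Ruby. The method name `parse_transcript` and the three regexes are assumptions based only on the sample above (SRT-style segment numbers, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timestamps, and a leading `[name]:` tag); real transcripts may need looser patterns:

```ruby
# Assumed line shapes, inferred from the sample transcript in this issue.
SEGMENT_NUMBER_LINE = /\A\d+\z/
TIMESTAMP_LINE = /\A\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}\z/
SPEAKER_TAG = /\A\[[^\]]+\]:\s*/

# Hypothetical helper: turns a raw transcript into a single line of text.
def parse_transcript(raw)
  raw.to_s.each_line
     .map(&:strip)
     # Drop blank lines, segment numbers and timestamp lines.
     .reject { |line| line.empty? || line.match?(SEGMENT_NUMBER_LINE) || line.match?(TIMESTAMP_LINE) }
     # Remove a leading "[name]:" tag; continuation lines have none and pass through.
     .map { |line| line.sub(SPEAKER_TAG, "") }
     .reject(&:empty?)
     .join(" ")
end

sample = <<~SRT
  1
  00:00:00,300 --> 00:00:02,167
  [sally]: i'm in los angeles and

  2
  00:00:01,920 --> 00:00:02,102
  [bobby]: okay
SRT

puts parse_transcript(sample) # prints "i'm in los angeles and okay"
```

A known limitation of this sketch: a spoken line that consists only of digits would be mistaken for a segment number and dropped.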

donrestarone avatar Sep 21 '22 21:09 donrestarone

Do the metadata properties have to be JSON paths? Since this plugin will run against the API namespace that it's connected to, we can just specify the names of the input and output properties in the metadata.

metadata: {
  INPUT_STRING_PROPERTY: "raw_transcript",
  OUTPUT_STRING_PROPERTY: "transcript"
}
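To make the suggestion concrete, here is a rough sketch of how the plugin could use those two metadata keys. It works on plain hashes rather than real API Resource records, and `apply_transcript_parser` is a hypothetical name; the cleanup itself is passed in as a block so the sketch stays independent of any particular parser:

```ruby
# Hypothetical helper: reads the input property, runs the given block on it,
# and returns a copy of the properties hash with the output property set.
def apply_transcript_parser(properties, metadata)
  input_key = metadata.fetch("INPUT_STRING_PROPERTY")
  output_key = metadata.fetch("OUTPUT_STRING_PROPERTY")
  raise ArgumentError, "input and output properties must differ" if input_key == output_key

  raw = properties[input_key]
  # Leave the resource untouched when there is nothing to parse.
  return properties unless raw.is_a?(String)

  properties.merge(output_key => yield(raw))
end

metadata = {
  "INPUT_STRING_PROPERTY" => "raw_transcript",
  "OUTPUT_STRING_PROPERTY" => "transcript"
}
properties = { "raw_transcript" => "  [sally]: s yeah  " }

# The block stands in for the real transcript cleanup.
updated = apply_transcript_parser(properties, metadata) { |raw| raw.strip.sub(/\A\[[^\]]+\]:\s*/, "") }
puts updated["transcript"] # prints "s yeah"
```

Because the plugin already runs against its own API namespace, plain property names are enough here; full JSON paths would only be needed for nested attributes.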

Ayon95 avatar Sep 22 '22 16:09 Ayon95

Do we need to add a boolean property to API resources to indicate whether parsing is still required?

Ayon95 avatar Sep 22 '22 17:09 Ayon95