Chart2Text Confusions regarding the "Postprocessing after generation" step

Open sunnyville opened this issue 3 years ago • 1 comments

Hi,

I have successfully completed the "Generation" step with my own data, which looked like the following:

testData.txt:

Company|pepsi|x|bar_chart TweetCount|3250|y|bar_chart Company|CocaCola|x|bar_chart TweetCount|2535|y|bar_chart Company|drpepper|x|bar_chart TweetCount|2064|y|bar_chart Company|redbull|x|bar_chart TweetCount|1242|y|bar_chart Company|MountainDew|x|bar_chart TweetCount|1177|y|bar_chart Company|Sprite|x|bar_chart TweetCount|634|y|bar_chart Company|fanta|x|bar_chart TweetCount|157|y|bar_chart Company|7UP|x|bar_chart TweetCount|124|y|bar_chart

testTitle.txt:

Number of tweets distribution from 2021-04-07 until 2021-11-05

The following were the contents of the templateOutput-p80.txt file produced by the Generation step using the test data shown above:

The statistic shows the templateTitle[4] in the templateTitle[1] with the highest templateTitle[2] gaming revenues in templateTitleDate[0] . The location with the highest templateTitle[3] templateYLabel[0] in the templateYLabel[2] was unsurprisingly the templateXValue[0] Strip , templateXValue[0] , totaling templateYValue[max] templateScale templateYLabel[2] templateYLabel[3] in templateTitle[2] gaming templateYLabel[0] in templateTitleDate[0] .

Then I copied this and pasted it into the data/test/testSummary.txt file, after deleting what was in it before. Then I ran the following command:

python etc/summaryComparison.py

Finally, the following was the result I got, which was saved to the generated-p80.txt file:

The statistic shows the 2021-04-07 in the tweets with the highest distribution gaming revenues in .  The location with the highest from TweetCount in the TweetCount was unsurprisingly the pepsi Strip , pepsi totaling 3250 % TweetCount in distribution gaming TweetCount in .

This kind of looks like the expected result; however, I'm not too sure. Moreover, I also got a few errors while running the summaryComparison.py script above. This is what it looked like: [screenshot: screen]. I'm not able to figure out what these errors are.

Also, I noticed that there is a testSummaryLabel.txt file with 0s and 1s for each data point. What is the significance of this file? Should I create one for my data above, and if so, how?

Thanks.

Nov 30 '21 04:11 sunnyville

Hi,

Yeah, that output seems correct.

Those errors are expected; they occur when a predicted template placeholder refers to something that doesn't exist in your data. For example, templateYLabel[2] can't be substituted because the y label has only one token, "TweetCount", so the script output the error "ylabel index error at 2".
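To make the index error concrete, here is a minimal sketch of that kind of substitution. It is not the code in etc/summaryComparison.py: the function name fill_y_label is hypothetical, and falling back to the last available token is an assumption, chosen only because the generated output above shows "TweetCount" where templateYLabel[2] used to be.

```python
import re

# Matches placeholders such as templateYLabel[0] or templateYLabel[2].
PLACEHOLDER = re.compile(r"templateYLabel\[(\d+)\]")


def fill_y_label(template, y_label):
    """Hypothetical helper: replace templateYLabel[i] with the i-th y-label token."""
    tokens = y_label.split()
    errors = []

    def substitute(match):
        index = int(match.group(1))
        if index < len(tokens):
            return tokens[index]
        # The label has no token at this index, so record the problem.
        errors.append(f"ylabel index error at {index}")
        # Assumed fallback: reuse the last token, or keep the placeholder
        # unchanged if the label is empty.
        return tokens[-1] if tokens else match.group(0)

    return PLACEHOLDER.sub(substitute, template), errors


filled, errors = fill_y_label(
    "highest templateYLabel[0] in the templateYLabel[2]", "TweetCount"
)
print(filled)
print(errors)
```

With a one-token label such as "TweetCount", only index 0 is valid, so any higher index is reported and then filled by the fallback.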

The testSummaryLabel.txt file was used during training, so unless you plan on re-training the model, you can ignore it. The 1s represent data points that are found in the chart's original summary, and the 0s represent data points that are not.
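If you do want to build such a file for re-training, a rough sketch of the idea might look like the following. The function label_records, the exact-token matching rule, and the example summary are all assumptions for illustration; the original preprocessing may match values differently.

```python
def label_records(data_line, summary):
    """Hypothetical helper: one label per record in a testData.txt-style line."""
    # Compare against lower-cased, whitespace-separated summary tokens.
    summary_tokens = set(summary.lower().split())
    labels = []
    for record in data_line.split():
        # Each record looks like: label|value|axis|chart_type
        value = record.split("|")[1]
        labels.append("1" if value.lower() in summary_tokens else "0")
    return labels


data_line = (
    "Company|pepsi|x|bar_chart TweetCount|3250|y|bar_chart "
    "Company|fanta|x|bar_chart TweetCount|157|y|bar_chart"
)
# Made-up summary, used only to exercise the function.
summary = "pepsi had the most tweets with 3250 in total"

labels = label_records(data_line, summary)
print(" ".join(labels))
```

Here the pepsi and 3250 records are labelled 1 because both values occur in the made-up summary, while the fanta and 157 records are labelled 0.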

Nov 30 '21 13:11 JasonObeid
