Chart2Text
Confusions regarding the "Postprocessing after generation" step
Hi,
I have successfully completed the "Generation" step with my own data, which looked like the following:
testData.txt:
Company|pepsi|x|bar_chart TweetCount|3250|y|bar_chart Company|CocaCola|x|bar_chart TweetCount|2535|y|bar_chart Company|drpepper|x|bar_chart TweetCount|2064|y|bar_chart Company|redbull|x|bar_chart TweetCount|1242|y|bar_chart Company|MountainDew|x|bar_chart TweetCount|1177|y|bar_chart Company|Sprite|x|bar_chart TweetCount|634|y|bar_chart Company|fanta|x|bar_chart TweetCount|157|y|bar_chart Company|7UP|x|bar_chart TweetCount|124|y|bar_chart
testTitle.txt:
Number of tweets distribution from 2021-04-07 until 2021-11-05
The following were the contents of the templateOutput-p80.txt file that resulted from the Generation step using the test data shown above:
The statistic shows the templateTitle[4] in the templateTitle[1] with the highest templateTitle[2] gaming revenues in templateTitleDate[0] . The location with the highest templateTitle[3] templateYLabel[0] in the templateYLabel[2] was unsurprisingly the templateXValue[0] Strip , templateXValue[0] , totaling templateYValue[max] templateScale templateYLabel[2] templateYLabel[3] in templateTitle[2] gaming templateYLabel[0] in templateTitleDate[0] .
Then I copied this and pasted it into the data/test/testSummary.txt file, after deleting what was in it before. Then I ran the following command:
python etc/summaryComparison.py
Finally, the following result was saved to the generated-p80.txt file:
The statistic shows the 2021-04-07 in the tweets with the highest distribution gaming revenues in . The location with the highest from TweetCount in the TweetCount was unsurprisingly the pepsi Strip , pepsi totaling 3250 % TweetCount in distribution gaming TweetCount in .
This looks roughly like the expected result; however, I'm not too sure. Moreover, I also got a few errors while running the summaryComparison.py script above. This is what it looked like:
I'm not able to figure out what these errors are.
Also, I noticed that there is a testSummaryLabel.txt file with 0s and 1s for each data point. What is the significance of this file, and should I (and if so, how can I) create one for my data above?
Thanks.
Hi,
Yeah, that output seems correct.
Those errors are expected; they occur when a predicted template token has nothing to substitute. For example, templateYLabel[2] can't be substituted since the y label has only one token, "TweetCount", so the script outputs the error "ylabel index error at 2".
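To make the failure mode concrete, here is a minimal sketch of index-based placeholder substitution. It is not the code in etc/summaryComparison.py: the function name fill_ylabel is hypothetical, and the fallback to the last available token is only an assumption chosen because the generated text above shows "TweetCount" where templateYLabel[2] stood.

```python
import re

# Hypothetical helper for illustration; not taken from the Chart2Text repository.
TOKEN = re.compile(r"templateYLabel\[(\d+)\]")


def fill_ylabel(template, y_label):
    """Replace each templateYLabel[i] with the i-th whitespace token of y_label."""
    tokens = y_label.split()
    errors = []

    def substitute(match):
        index = int(match.group(1))
        if index < len(tokens):
            return tokens[index]
        # The label has no token at this index: record the problem.
        errors.append(f"ylabel index error at {index}")
        # Assumed fallback: reuse the last available token (empty if none).
        return tokens[-1] if tokens else ""

    return TOKEN.sub(substitute, template), errors


text, errors = fill_ylabel(
    "highest templateYLabel[0] in the templateYLabel[2]", "TweetCount"
)
print(text)
print(errors)
```

With a one-token y label such as "TweetCount", index 0 resolves normally while index 2 triggers the recorded error, which is why the message is harmless for data like yours.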
The testSummaryLabel.txt file was used during training, so unless you plan on retraining the model you can ignore it. The 1s mark data points that are found in the chart's original summary, and the 0s mark data points that are not.
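If you ever do want to retrain, the labelling idea can be sketched roughly as follows. This is only an illustration under stated assumptions: label_data_points is a hypothetical name, the matching is a plain case-insensitive comparison against whitespace-separated summary tokens, and the summary sentence is made up. The repository's own preprocessing may match values differently.

```python
def label_data_points(values, summary):
    """Return 1 for each value that appears as a token in the summary, else 0."""
    # Assumption: exact, case-insensitive match on whitespace tokens.
    summary_tokens = set(summary.lower().split())
    return [1 if str(value).lower() in summary_tokens else 0 for value in values]


# Data values from the chart, in the order they appear in testData.txt.
values = ["pepsi", "3250", "CocaCola", "2535"]
# Made-up summary text, used only to demonstrate the labelling.
summary = "pepsi had the most tweets , totaling 3250 ."

labels = label_data_points(values, summary)
print(labels)
```

Here "pepsi" and "3250" occur in the summary and get a 1, while "CocaCola" and "2535" do not and get a 0.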