gpt-2-simple
How to make gpt-2-simple generate custom text based on structured data?
Hi, I am trying to build a practical application of GPT-2 that generates text commentary from structured data. For example, in the weather data below, each row of numerical data has an associated text that describes the weather in plain English, such as "Showers late. Mostly cloudy." or "Isolated tstorms late. Morning clouds." Assuming the commentary can be derived entirely from the numerical data, how do I fine-tune GPT-2 so that given an input like `30 / 22 °C | 30 °C | 17 km/h | ↑ | 54% | 57% | 0.9 mm | -2147483648 (Low) | 5:52 | 18:44` it produces something like "Maximum temperature is 30 degrees, with partial clouds."?
Sample Data

| Day | Temperature | Feels Like | Wind | Humidity | Chance | Amount | UV | Sunrise | Sunset | Weather Details |
|---|---|---|---|---|---|---|---|---|---|---|
| Mon 8-Jun | 30 / 22 °C | 30 °C | 17 km/h ↑ | 54% | 57% | 0.9 mm | -2147483648 (Low) | 5:52 | 18:44 | |
| Tue 9-Jun | 30 / 21 °C | 30 °C | 16 km/h ↑ | 53% | 57% | 1.5 mm | -2147483648 (Low) | 5:52 | 18:45 | |
| Wed 10-Jun | 30 / 22 °C | 30 °C | 16 km/h ↑ | 52% | 57% | 2.2 mm | -2147483648 (Low) | 5:53 | 18:45 | |
| Thu 11-Jun | 28 / 22 °C | 28 °C | 22 km/h ↑ | 63% | 62% | 5.2 mm | -2147483648 (Low) | 5:53 | 18:45 | |
| Fri 12-Jun | 26 / 22 °C | 26 °C | 22 km/h ↑ | 68% | 98% | 3.4 mm | -2147483648 (Low) | 5:53 | 18:46 | |
This advice is anecdotal rather than technical, but in my experiments I've found that simply giving GPT-2 enough consistently formatted data in a plain old .txt file is enough for it to learn to produce the desired results. You can do this with inline tags: separate each data/result pair with markers like [DATA] and [RESULTS], and GPT-2 will do a pretty good job of keeping everything in place, and even correlating the data with the text. Because GPT-2 is forward-predictive (it generates left to right), put the structured data before the commentary so the model conditions on the data when it generates.
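As a minimal sketch of that approach: the snippet below builds a tagged training file from (data, commentary) pairs. The tag names `[DATA]`/`[RESULTS]`, the filename `weather.txt`, and the sample rows are illustrative choices, not anything prescribed by gpt-2-simple; the fine-tuning calls at the end use the real gpt-2-simple API but are commented out because they require TensorFlow and a model download.

```python
# Build a gpt-2-simple training file where each line pairs structured
# weather data with its plain-English commentary, separated by inline tags.

def make_sample(data_row: str, commentary: str) -> str:
    """Format one structured-data row and its commentary as a tagged sample."""
    return f"[DATA] {data_row} [RESULTS] {commentary}\n"

# Illustrative pairs; in practice you would use hundreds or thousands of rows.
pairs = [
    ("30 / 22 °C | 30 °C | 17 km/h | ↑ | 54% | 57% | 0.9 mm | 5:52 | 18:44",
     "Showers late. Mostly cloudy."),
    ("30 / 21 °C | 30 °C | 16 km/h | ↑ | 53% | 57% | 1.5 mm | 5:52 | 18:45",
     "Isolated tstorms late. Morning clouds."),
]

with open("weather.txt", "w", encoding="utf-8") as f:
    for data_row, commentary in pairs:
        f.write(make_sample(data_row, commentary))

# Fine-tune and generate with gpt-2-simple (needs TensorFlow + model download):
#
# import gpt_2_simple as gpt2
# gpt2.download_gpt2(model_name="124M")
# sess = gpt2.start_tf_sess()
# gpt2.finetune(sess, "weather.txt", model_name="124M", steps=1000)
# gpt2.generate(sess,
#               prefix="[DATA] 28 / 22 °C | 28 °C | 22 km/h | ↑ | 63% [RESULTS]",
#               truncate="[DATA]")  # stop once the model starts a new sample
```

Prompting with `prefix="[DATA] ... [RESULTS]"` and cutting the output at the next `[DATA]` tag is what turns the fine-tuned model into a data-to-text generator: it completes only the commentary portion of a sample.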
Was there any more headway made on the above question? This article takes a very elaborate approach, but I am looking for something closer to the question posed above: https://medium.com/analytics-vidhya/natural-language-generation-from-structured-data-9d70f3f224af