byt5-geotagging
`[Challenge]` 12 months of data
The objective of this challenge is to train a deep learning model that captures the correlation between the time and date of a post, its content, and its location. Time zone differences, as well as the seasonality of events, should be analyzed and used to predict the location.
For example, snow is more likely to appear in the Northern Hemisphere, especially in December. Rock concerts are more likely to happen in the evening and in bigger cities, so the posting time of a text about a concert can be used to infer the author's time zone and narrow down the list of potential locations.
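One possible way to expose time-of-day and seasonality to a model is to encode the post timestamp cyclically, so that 23:00 and 00:00 (or December and January) end up close together. The sketch below is only an illustration: the function name `cyclical_time_features` is hypothetical, and it assumes timestamps are available as Python `datetime` objects, with naive values treated as UTC. The actual field names and timestamp format of the data set are not specified here.

```python
import math
from datetime import datetime, timezone
from typing import List


def cyclical_time_features(ts: datetime) -> List[float]:
    """Encode hour-of-day and month-of-year as points on the unit circle.

    Assumption: a naive datetime is interpreted as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)

    # Fractional hour in [0, 24) and zero-based month in [0, 12).
    hour = ts.hour + ts.minute / 60.0
    hour_angle = 2.0 * math.pi * hour / 24.0
    month_angle = 2.0 * math.pi * (ts.month - 1) / 12.0

    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(month_angle),
        math.cos(month_angle),
    ]
```

These four values could be concatenated with text features or rendered into the input string; which option works better for this task is an open question.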
The data set provided is:
- a .json file of more than 600,000 texts
- collected over a span of 12 months
- covering 15 different time zones
- from 6 countries (Cuba, Iran, Russia, North Korea, Syria, Venezuela).
The data set is here
Deliverable
- A model that takes a text as input and returns coordinates as output
- Evaluation metrics obtained on the development dataset, including Mean Absolute Error in kilometers.
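As a rough sketch of how Mean Absolute Error in kilometers could be computed, the snippet below averages the great-circle (haversine) distance between predicted and true coordinates. It assumes coordinates are `(latitude, longitude)` pairs in decimal degrees and approximates the Earth as a sphere. The function names are illustrative and not part of any official evaluation script.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points in kilometers."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_absolute_error_km(pred: Sequence[Coord], true: Sequence[Coord]) -> float:
    """Mean haversine distance between predicted and true coordinates."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(haversine_km(p, t) for p, t in zip(pred, true)) / len(pred)
```

For example, `mean_absolute_error_km(model_outputs, dev_labels)` would give a single number in kilometers for the development set, where `model_outputs` and `dev_labels` are placeholder names for lists of coordinate pairs.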
We will evaluate the model on a held-out test dataset that is not shared here.
Additional notes
Contact us at [email protected] for any questions or additional requests.
Thank you for contributing to Open Source and making a difference! ʕ•́ᴥ•̀ʔ
Is this challenge still open and currently looking for contributions? @alinapark @ingakaspar
I see that this challenge is open. I will take a crack at it and provide some updates as well. @alinapark