byt5-geotagging

`[Challenge]` 12 months of data

Open alinapark opened this issue 2 years ago • 3 comments

The objective of this challenge is to train a deep learning model that learns the correlation between the time and date of a post, its content, and its location. Time zone differences, as well as the seasonality of events, should be analyzed and used to predict the location.

For example: snow is more likely to appear in the Northern Hemisphere, especially in December. Rock concerts are more likely to happen in the evening and in bigger cities, so the time of a post about a concert can be used to infer the author's time zone and narrow down the list of potential locations.
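The concert example can be sketched in a few lines. This is only an illustration of the reasoning, not part of the challenge materials: the function name `candidate_utc_offsets`, the 19:00 to 23:00 "evening" window, and the restriction to whole-hour offsets are all assumptions (some real time zones use half-hour offsets).

```python
from datetime import datetime, timezone


def candidate_utc_offsets(posted_at_utc, local_start_hour=19, local_end_hour=23):
    """Return whole-hour UTC offsets that place a post inside an assumed
    local evening window (hypothetical default: 19:00 to 23:59 local time)."""
    offsets = []
    for offset in range(-12, 15):  # UTC-12 through UTC+14
        local_hour = (posted_at_utc.hour + offset) % 24
        if local_start_hour <= local_hour <= local_end_hour:
            offsets.append(offset)
    return offsets


# A post about a concert published at 18:30 UTC.
post_time = datetime(2022, 12, 14, 18, 30, tzinfo=timezone.utc)
print(candidate_utc_offsets(post_time))  # [1, 2, 3, 4, 5]
```

Under these assumptions, a single timestamp already rules out most of the 27 whole-hour offsets. A trained model would be expected to learn such patterns from the data rather than from a hand-written rule like this one.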

The data set provided:

  • a .json of more than 600,000 texts
  • collected over the span of 12 months
  • covering 15 different time zones
  • from 6 countries (Cuba, Iran, Russia, North Korea, Syria, Venezuela)

The data set is here

Deliverable

  • A model that takes a text as input and returns coordinates as output
  • Evaluation metrics obtained on the development dataset, including Mean Absolute Error in kilometers.
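The issue does not say how the error in kilometers should be computed. One plausible reading, shown here as a minimal sketch, is the mean great-circle (haversine) distance between predicted and true coordinates. The function names, the `(latitude, longitude)` tuple layout in degrees, and the spherical Earth radius of 6371 km are assumptions, not part of the challenge specification.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_distance_error_km(predicted, actual):
    """Mean haversine distance over paired (lat, lon) predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    distances = [haversine_km(p[0], p[1], a[0], a[1])
                 for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


# Hypothetical predictions and targets: one exact hit, one point on the
# opposite side of the equator (half the Earth's circumference away).
preds = [(0.0, 0.0), (0.0, 0.0)]
truth = [(0.0, 0.0), (0.0, 180.0)]
print(round(mean_distance_error_km(preds, truth), 1))
```

Because a mean is sensitive to a few very distant misses, reporting the median distance alongside it can make results on the development dataset easier to interpret.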

We will evaluate the model using the test dataset that is not shared here.

Additional notes

Contact us at [email protected] for any questions or additional requests.

Thank you for contributing to Open Source and making a difference! ʕ•́ᴥ•̀ʔ

alinapark, Dec 14 '22 01:12

Is this challenge still open and looking for contributions? @alinapark @ingakaspar

AnuravModak, Aug 19 '23 14:08

I see that this challenge is open. I will take a crack at it and post updates as I go. @alinapark

smore88, Dec 14 '23 23:12