byt5-geotagging

`[Challenge]` Metadata and Clusters

Open alinapark opened this issue 2 years ago • 5 comments

The objective of this challenge is to train a deep learning model that predicts the coordinates, or the region cluster, of a text while improving on Yachay’s original infrastructure.

We offer an annotated dataset for training and testing, comprising texts and their region cluster IDs, coordinates, post metadata, and more. We recommend considering the post metadata field, but you are free to include or exclude any of the provided dataset fields if doing so improves your validation metrics. Regression, classification, multi-task, or anything else: all solutions and suggestions are welcome!

The Yachay team will evaluate the model on a test dataset that is not shared here.

Note: the metadata and clusters challenge allows for a larger number and variety of experiments. There are no hard MSE or EER requirements; we're looking for innovative ideas for infrastructure development.

The provided dataset is available here. It includes:

  • an annotated corpus of 600k+ texts with their respective regions (clusters), timestamps, and over 40k user_ids
  • a median of 415 texts per region (cluster)
  • at least 6 texts per user
  • an additional list of cluster_ids with the coordinates of each cluster, for mapping texts to coordinates
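For contributors taking the classification route, the cluster list above is what turns a predicted cluster ID back into coordinates. The following is only a minimal sketch of that lookup: the table contents, the name `CLUSTER_COORDS`, and the helper `clusters_to_coords` are hypothetical, since the issue does not describe the actual file layout or column names.

```python
# Hypothetical cluster table: cluster_id -> (latitude, longitude) in degrees.
# The real dataset's file format and values are NOT given in the issue;
# these three entries are placeholders for illustration only.
CLUSTER_COORDS = {
    0: (40.7128, -74.0060),
    1: (51.5074, -0.1278),
    2: (-33.8688, 151.2093),
}


def clusters_to_coords(cluster_ids, table=CLUSTER_COORDS):
    """Map predicted cluster IDs to (lat, lon) pairs.

    IDs missing from the table map to None, so the caller can decide
    how to handle them (skip, fall back to a default, raise, ...).
    """
    return [table.get(cluster_id) for cluster_id in cluster_ids]


# Example: output of a hypothetical classifier, including one unknown ID.
predicted = [2, 0, 7]
coords = clusters_to_coords(predicted)
```

Returning `None` for unknown IDs is a design choice for this sketch; a real pipeline might prefer to fail loudly instead.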

As for the deliverables, we are looking for:

  • a model that takes a text as input and returns coordinates as output
  • evaluation metrics obtained on the development dataset, including Mean Absolute Error in Haversine Distance
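The Haversine-based metric mentioned above could be computed roughly as follows. This is a sketch, not Yachay's evaluation code: the function names, the choice of kilometres, and the mean Earth radius constant are all assumptions, and the issue does not say which units or radius the team uses.

```python
import math

# Assumption: mean Earth radius in kilometres. The issue does not specify
# the radius or the unit used in the official evaluation.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before sqrt/asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_haversine_error_km(true_coords, pred_coords):
    """Mean Haversine distance between paired (lat, lon) tuples."""
    if len(true_coords) != len(pred_coords) or not true_coords:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        haversine_km(t_lat, t_lon, p_lat, p_lon)
        for (t_lat, t_lon), (p_lat, p_lon) in zip(true_coords, pred_coords)
    )
    return total / len(true_coords)
```

Because a great-circle distance is never negative, the mean of these distances is already an absolute error; no extra `abs` is needed.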

Send a Pull Request with your results, comment here for questions, or ping on Discord for requests!

Thank you for contributing to Open Source and making a difference! ʕ•́ᴥ•̀ʔ

alinapark, Jul 12 '23 04:07

Is this challenge still open and looking for contributions? @alinapark @ingakaspar

AnuravModak, Aug 19 '23 14:08

@AnuravModak it is, as long as the issue is here

alinapark, Nov 03 '23 21:11