byt5-geotagging
`[Challenge]` Metadata and Clusters
The objective of this challenge is to train a deep learning model that predicts the coordinates, or the region cluster coordinates, of texts while improving on Yachay’s original infrastructure.
We offer an annotated dataset for training and testing, comprising texts and their region cluster IDs, coordinates, post metadata, and more. We recommend considering the post metadata field, but you are free to include or exclude any of the provided dataset fields if doing so improves validation metrics on your end. Regression, classification, multi-task, or something else: all solutions and suggestions are welcome!
The Yachay team will evaluate the model on a test dataset that is not shared here.
Note:
The `metadata and clusters` issue-challenge allows for a greater number and variety of experiments. There are no hard MSE or EER requirements; we're looking for innovative ideas for infrastructure development.
The provided dataset is here. It contains:
- an annotated corpus of 600k+ texts, with their respective regions (clusters), timestamps, and over 40k user_id-s
- a median of 415 texts per region (cluster)
- each user has at least 6 texts
- an additional list of cluster_ids with the coordinates of each cluster, for mapping texts to coordinates.
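For a classification-style solution, the cluster list can serve as a lookup from a predicted cluster ID back to coordinates. Below is a minimal sketch of that mapping. The field names `cluster_id`, `lat`, and `lng` are assumptions made for illustration, so check them against the actual dataset files before use.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def build_cluster_lookup(rows: Iterable[Mapping[str, object]]) -> Dict[int, Coord]:
    """Build a cluster_id -> (lat, lng) table.

    The keys "cluster_id", "lat" and "lng" are hypothetical names,
    not confirmed against the provided dataset.
    """
    return {
        int(row["cluster_id"]): (float(row["lat"]), float(row["lng"]))
        for row in rows
    }


def clusters_to_coordinates(cluster_ids: Iterable[int],
                            lookup: Mapping[int, Coord]) -> List[Coord]:
    """Map predicted cluster IDs to the coordinates of their clusters."""
    return [lookup[int(cid)] for cid in cluster_ids]


# Toy rows standing in for the cluster list (values are made up).
toy_rows = [
    {"cluster_id": 0, "lat": 40.0, "lng": -74.0},
    {"cluster_id": 1, "lat": 51.5, "lng": -0.1},
]
toy_lookup = build_cluster_lookup(toy_rows)
toy_coords = clusters_to_coordinates([1, 0, 1], toy_lookup)
```

A classifier that outputs cluster IDs can then be scored with the same coordinate-based metrics as a regression model.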
As for the deliverables, we are looking for:
- a model that takes a text as input and returns coordinates as output
- evaluation metrics obtained on the development dataset, including Mean Absolute Error in Haversine Distance
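As a reference point for the requested metric, here is a minimal sketch of Mean Absolute Error in Haversine distance. It assumes coordinates are (latitude, longitude) pairs in degrees and reports kilometres using a mean Earth radius of 6371.0088 km. Both the units and the function names are assumptions, not the Yachay team's evaluation code.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def mean_haversine_error_km(pred: Sequence[Coord], true: Sequence[Coord]) -> float:
    """Average Haversine distance between predicted and true coordinates."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(haversine_km(p, t) for p, t in zip(pred, true)) / len(pred)
```

Calling `mean_haversine_error_km(predicted_coords, true_coords)` on the development split gives a single number in kilometres that can be reported alongside any other metrics.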
Send a Pull Request with your results, comment here with questions, or ping us on Discord with requests!
Thank you for contributing to Open Source and making a difference! ʕ•́ᴥ•̀ʔ
is this challenge still open and currently looking for contributions???? @alinapark @ingakaspar
@AnuravModak it is, as long as the issue is here