
Revise Community/Country Ranking Algorithm

Open dadofsambonzuki opened this issue 6 months ago • 8 comments

Currently, the Web App implements its own ranking system which is defined here by this simple algo:

const score = (report: Report): number => {
	return Math.max(report.tags.total_elements - report.tags.outdated_elements * 5, 0);
};

There is a previous discussion here, although it's now a little outdated and contains some inaccurate info.

The criticisms of the current algo are:

  1. It doesn't account for area size (either population or km^2).

  2. It weights by up-to-date locations as a percentage, not the absolute number.

  3. It includes 'noisy' ATM elements.

It seems like the separation of relative quality (which is handled in the Grade star rating anyways), the use of per capita weighting and the exclusion of ATMs would address these concerns.

So the algo would simply be:

(report.up_to_date_merchants) / area.population

Considerations:

  1. Currently, report.up_to_date_merchants is not in reports. If we crudely use (report.up_to_date_elements - total_atms) / area.population the algo could return a negative number if there is a large number of ATMs and a small number of verified locations. So we either:

    a) Add up_to_date_merchants to the API reports.

    b) Accept we could get a negative score using (report.up_to_date_elements - total_atms) / area.population.

    c) Include ATMs and just have report.up_to_date_elements / area.population

    My preference is a) > c) > b). We could start with c) and then move to a) when the API is ready.

  2. Handle absence of area.population. Should result in a null score so we can address the area data issue.
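As a rough illustration of option c) together with the null handling from consideration 2, here is a minimal sketch. The `Area` shape and the `perCapitaScore` name are assumptions made for this example, not the web app's actual types; only `up_to_date_elements` is taken from the discussion.

```typescript
// Hypothetical shapes for illustration only.
interface ReportTags {
  up_to_date_elements: number;
}

interface Area {
  population?: number | null; // assumed field name; may be missing
}

// Option c): up-to-date elements per capita, ATMs included.
// Returns null when the population is missing or not a positive number,
// so the missing area data stays visible instead of becoming a fake score.
function perCapitaScore(tags: ReportTags, area: Area): number | null {
  const population = area.population;
  if (population == null || !Number.isFinite(population) || population <= 0) {
    return null;
  }
  return tags.up_to_date_elements / population;
}
```

A client could sort areas with a null score to the bottom of the leaderboard and flag them for a data fix.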

People can use the recently added sorting to rank on absolute numbers if they want. The Grade star rating becomes a pure measure of overall data quality.

Lest we forget Trending Communities/Countries 🫠

dadofsambonzuki avatar Jun 09 '25 12:06 dadofsambonzuki

  1. Currently, report.up_to_date_merchants is not in reports

It is:

https://api.btcmap.org/v2/reports/153215

{
  "id": 153215,
  "area_id": "breizh-bitcoin",
  "date": "2024-01-29",
  "tags": {
    "avg_verification_date": "2023-06-14T02:00:00.000000000Z",
    "legacy_elements": 0,
    "outdated_elements": 3,
    "total_atms": 0,
    "total_elements": 13,
    "total_elements_lightning": 9,
    "total_elements_lightning_contactless": 5,
    "total_elements_onchain": 11,
    "up_to_date_elements": 10,
    "up_to_date_percent": 76
  },
  "created_at": "2024-01-29T00:00:01.505Z",
  "updated_at": "2024-01-29T00:00:01.505Z",
  "deleted_at": ""
}

bubelov avatar Jun 10 '25 03:06 bubelov

It is:

https://api.btcmap.org/v2/reports/153215

I make a distinction between elements and merchants: the former includes ATMs and the latter does not. Whenever I present stats at conferences I exclude ATMs, but this currently gets tricky when also assessing up-to-dateness.

The API deals with elements and atms and there is no way to get up_to_date_merchants without either having that tag directly or specifically having up_to_date_atms so clients can work out up_to_date_merchants themselves.

dadofsambonzuki avatar Jun 10 '25 07:06 dadofsambonzuki

Everything on OSM is an element, we just need to live with that fact.

I'd avoid inventing a new category of "merchants" since it's tricky to define. I'm pretty sure we can find plenty of other non-merchants besides ATMs if we look deep enough, so it's more like a matter of taste.

Since we don't know what is a merchant but we do know what is an ATM, it makes sense to add another field called up_to_date_atms or something, which can be supplementary to the existing total_atms, so we can avoid creating new arbitrary categories.

up_to_date_elements - up_to_date_atms should give you the number of up-to-date elements which aren't ATMs. Would that address your issue?
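A minimal client-side sketch of that subtraction, assuming the proposed `up_to_date_atms` tag gets added to reports (it is not in the report shown earlier in this thread). The function name is made up for this example.

```typescript
interface ReportTags {
  up_to_date_elements: number;
  // Proposed field, assumed here; absent from older reports.
  up_to_date_atms?: number;
}

// Up-to-date elements that are not ATMs.
// Returns null when the report predates the proposed field.
function upToDateNonAtms(tags: ReportTags): number | null {
  if (tags.up_to_date_atms === undefined) {
    return null;
  }
  // Up-to-date ATMs are a subset of up-to-date elements, so this should never
  // go negative; the clamp only guards against inconsistent data.
  return Math.max(tags.up_to_date_elements - tags.up_to_date_atms, 0);
}
```

Unlike subtracting total_atms, this compares like with like, which is why the negative-score problem from the original post should not arise.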

bubelov avatar Jun 10 '25 11:06 bubelov

Yep, that's a nice and clean option d)!

The count includes money changers too, right, as per the recent API PR?

dadofsambonzuki avatar Jun 10 '25 12:06 dadofsambonzuki

let atms: Vec<_> = elements
    .iter()
    .filter(|it| it.overpass_data.tag("amenity") == "atm")
    .collect();

No, it only includes OSM-sourced places with an amenity tag set to atm. Those are proper OSM ATMs. Our category tag diverges from that, but I approved that PR because it's only used in the web app and the category tag is on the way out anyway.

Is there a good English word to describe both ATMs and exchange kiosks? If we have to define something new just for the purpose of processing reports, it's better to define an exception instead of redefining the main thing. Calling something which is not an ATM an ATM is too hacky and adding another field called up_to_date_bureau_de_changes also feels excessive.

Another concern is this thing being contentious. What if a book shop with 1,000 locations starts accepting bitcoins? We will have a bunch of guys screaming that our algo is unfair because they don't like books and therefore book shops are less valuable than McDonald's.

bubelov avatar Jun 11 '25 04:06 bubelov

I don't see a good generic solution without introducing a diversity multiplier, nothing else would address your concern number 3:

It includes 'noisy' ATM elements.

Too much of anything can be noisy, it doesn't apply to ATMs and exchange kiosks alone.

I'm afraid we'll need to define fiat stuff like basket of goods, etc, and try to model what an average pleb consumes, assigning different weights to different things =)

bubelov avatar Jun 11 '25 04:06 bubelov

Another important thing everyone in this discussion needs to understand is that reports were never intended for a leaderboard, and the leaderboard is just a thing the web app decided to implement (no other client apps show that stuff since it's broken in many ways). Reports are just some counts used to draw charts.

We can simply move the score function to the server and come up with a number in a range 1..100, for example. Server doesn't only have counts, it has actual elements so it can interrogate them and extract more context.

If you have the elements themselves, many things can be done. We can simply exclude all elements of the most common category X once they exceed twice the number of elements of the second most common category Y, or something like that.
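To make that last idea concrete, here is a rough sketch of the "drop the dominant category" rule. It is written in TypeScript for consistency with the web app snippet, although the server itself is Rust. The `categoryOf` callback is a placeholder for however a category would be derived from an element, and the 2x threshold is just the example figure from above.

```typescript
// Drops every element of the most common category when that category has more
// than twice as many elements as the runner-up. Otherwise returns the input as-is.
function dropDominantCategory<T>(
  elements: T[],
  categoryOf: (element: T) => string, // hypothetical category lookup
): T[] {
  const counts = new Map<string, number>();
  for (const element of elements) {
    const category = categoryOf(element);
    counts.set(category, (counts.get(category) || 0) + 1);
  }

  // Sort categories by count, largest first.
  const ranked = Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
  if (ranked.length < 2) {
    return elements; // no runner-up to compare against
  }

  const top = ranked[0];
  const second = ranked[1];
  if (top[1] <= 2 * second[1]) {
    return elements;
  }
  return elements.filter((element) => categoryOf(element) !== top[0]);
}
```

A softer variant could cap the dominant category at twice the runner-up instead of removing it entirely; which behaviour is fairer is a judgment call.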

bubelov avatar Jun 11 '25 05:06 bubelov

Is there a good English word to describe both ATMs and exchange kiosks? If we have to define something new just for the purpose of processing reports, it's better to define an exception instead of redefining the main thing. Calling something which is not an ATM an ATM is too hacky and adding another field called up_to_date_bureau_de_changes also feels excessive.

Agree. Let's just stick to ATMs for now for this.

We can leave the wider category discussion to this thread.

Too much of anything can be noisy, it doesn't apply to ATMs and exchange kiosks alone.

Agreed. But ATMs - particularly in the US (our largest userbase) - are an annoyance now. We can stay flexible here. e.g. 4M Square merchants would be noise!

We can simply move the score function to the server and come up with a number in a range 1..100, for example. Server doesn't only have counts, it has actual elements so it can interrogate them and extract more context.

I would much prefer this and have been of that opinion for years.

If the API can generate a normalised score between 0 and 1 based on (up_to_date_elements - up_to_date_atms) / area.population then that would be great. Bonus points for adding that to each report so that we can chart it over time.

That means we can change the algo as the areas mature in terms of adoption and still have that 0-1 normalised score based on what we deemed as being 'good' at that time.
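One possible shape for such a normalised score, as a sketch only. The thread does not say how the normalisation would work, so this assumes a reference density, `goodPerCapita`, that represents what is deemed 'good' at the time. Both that parameter and the function name are hypothetical.

```typescript
// Normalised 0..1 score: non-ATM up-to-date elements per capita, divided by a
// reference density and capped at 1. Returns null when inputs are unusable.
function normalisedScore(
  upToDateElements: number,
  upToDateAtms: number,
  population: number | null | undefined,
  goodPerCapita: number, // assumed tuning constant, e.g. revised as adoption matures
): number | null {
  if (population == null || !Number.isFinite(population) || population <= 0) {
    return null;
  }
  if (!Number.isFinite(goodPerCapita) || goodPerCapita <= 0) {
    return null;
  }
  const nonAtms = Math.max(upToDateElements - upToDateAtms, 0);
  const density = nonAtms / population;
  return Math.min(density / goodPerCapita, 1);
}
```

Storing the resulting value on each report would keep historical scores tied to the threshold in force when they were computed.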

I like this!

dadofsambonzuki avatar Jun 11 '25 15:06 dadofsambonzuki