suggestions for mastiff

Open ctb opened this issue 2 years ago • 3 comments

A few thoughts on mastiff/SRA-search-as-a-service moving forward -

  • we are likely to want to provide multiple databases / versions of databases down the road - in particular, real-time search of GTDB and GenBank genomes would be ideal. it would be good if mastiff could allow for this now, before we start getting regular users.
  • it would be good to track heavy users and Web sites - not 100% sure how to do this. perhaps require API keys but make them easy to get? otherwise we're going to be stuck figuring things out from referrer logs and/or banning people who decide to overwhelm the server.

note the "we" here is really @luizirber :). although the pressure to learn Rust continues to increase.

ctb, Sep 03 '22 15:09

> we are likely to want to provide multiple databases / versions of databases down the road - in particular, real-time search of GTDB and GenBank genomes would be ideal. it would be good if mastiff could allow for this now, before we start getting regular users.

I can version the API call (so /v1/search instead of /search), and more parameters can be passed in (like which DB/version to search). I should have done that for mastiff, but time crunch :upside_down_face: (wort already does that, and its API is described in the OpenAPI format).
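To make the versioning idea concrete, here is a minimal Rust sketch of how a server could map both the legacy `/search` path and a versioned `/v1/search` path onto one request type, with the database and database version passed as query parameters. The struct, the `db`/`db_version` parameter names, and the default database name `sra` are hypothetical, not mastiff's actual API, and percent-decoding is deliberately skipped.

```rust
use std::collections::HashMap;

/// A parsed search request. Field names and defaults are hypothetical;
/// mastiff's real API may differ.
#[derive(Debug, PartialEq)]
pub struct SearchRequest {
    pub api_version: u32,
    pub database: String,
    pub db_version: Option<String>,
}

/// Parse a request target such as "/v1/search?db=gtdb&db_version=rs214".
/// The unversioned "/search" is treated as v1 so existing clients keep working.
pub fn parse_search_target(target: &str) -> Result<SearchRequest, String> {
    let (path, query) = match target.split_once('?') {
        Some((p, q)) => (p, q),
        None => (target, ""),
    };

    let api_version = match path {
        "/search" | "/v1/search" => 1,
        other => return Err(format!("unknown endpoint: {other}")),
    };

    // Collect key=value pairs (no percent-decoding in this sketch).
    let params: HashMap<&str, &str> = query
        .split('&')
        .filter(|kv| !kv.is_empty())
        .filter_map(|kv| kv.split_once('='))
        .collect();

    Ok(SearchRequest {
        api_version,
        // Hypothetical default: the index mastiff serves today.
        database: params.get("db").unwrap_or(&"sra").to_string(),
        db_version: params.get("db_version").map(|v| v.to_string()),
    })
}
```

Keeping `/search` as an alias for `/v1/search` means current clients don't break when the versioned route appears, and a later `/v2/search` can be added as another match arm.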

> it would be good to track heavy users and Web sites - not 100% sure how to do this. perhaps require API keys but make them easy to get? otherwise we're going to be stuck figuring things out from referrer logs and/or banning people who decide to overwhelm the server.

I was thinking about using a rate limiter in Caddy and adding logic to handle HTTP 429 (Too Many Requests) responses in the mastiff CLI client.
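A minimal sketch of the client-side 429 handling, assuming the client can see the status code and the raw value of the standard `Retry-After` header (which a rate limiter may or may not set). The function name, the zero-based attempt counter, and the 60-second cap are illustrative choices, not part of the mastiff CLI.

```rust
use std::time::Duration;

/// Decide how long to wait before retrying; None means "do not retry".
/// `retry_after` is the raw Retry-After header value, if present. Only the
/// delay-seconds form is handled here, not the HTTP-date form.
/// `attempt` counts retries already made, starting at 0.
pub fn retry_delay(
    status: u16,
    retry_after: Option<&str>,
    attempt: u32,
    max_attempts: u32,
) -> Option<Duration> {
    if status != 429 || attempt >= max_attempts {
        return None;
    }
    // Prefer the server's hint when it is a plain number of seconds.
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Some(Duration::from_secs(secs));
    }
    // Otherwise exponential backoff: 1s, 2s, 4s, ... capped at 60s.
    let backoff = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(60);
    Some(Duration::from_secs(backoff))
}
```

Honoring `Retry-After` when it is present keeps the client in step with whatever limit the proxy enforces; the backoff only applies when the server gives no hint.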

For monitoring I use datadog for wort because it is easy to deploy.

API keys are a good idea, but they involve storing more information about users (creating accounts and so on), which complicates the service quite a bit. So I would avoid that for now =]

luizirber, Sep 05 '22 15:09

> API keys are a good idea, but they involve storing more information about users (creating accounts and so on), which complicates the service quite a bit. So I would avoid that for now =]

agreed, but on the other hand -

  • it helps figure out who is using the thing, which can help justify support
  • it helps identify people who are using old APIs, etc.
  • it helps prevent abuse (although so does making it really cheap to run, which is a better strategy)
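If requests did carry some client identifier (an API key, or a contact e-mail), spotting heavy users in the logs reduces to counting requests per identifier. A minimal sketch, assuming the identifiers have already been extracted from the log lines; the function name and the threshold parameter are hypothetical.

```rust
use std::collections::HashMap;

/// Count requests per client identifier and return those at or above
/// `threshold`, busiest first (ties broken by identifier).
pub fn heavy_users<'a>(
    client_ids: impl IntoIterator<Item = &'a str>,
    threshold: usize,
) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for id in client_ids {
        *counts.entry(id).or_insert(0) += 1;
    }
    let mut heavy: Vec<(&'a str, usize)> = counts
        .into_iter()
        .filter(|&(_, n)| n >= threshold)
        .collect();
    // Sort by count descending, then by id so the output is deterministic.
    heavy.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    heavy
}
```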

maybe support or require a contact e-mail (or something) somewhere so that we can backtrack from logs?
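One low-friction way to get a contact address without accounts is the standard HTTP `From` request header, which is defined to carry an e-mail address for the human user, paired with a `User-Agent` that includes the client version (which also makes users of old clients visible in the logs). A minimal sketch, assuming the client builds its header list by hand; the `mastiff-client` product token and the function name are hypothetical.

```rust
/// Build identifying headers for requests from a (hypothetical) mastiff client.
/// `User-Agent` carries the client version; `From` carries an optional
/// contact e-mail that server logs can record.
pub fn identifying_headers(
    client_version: &str,
    contact_email: Option<&str>,
) -> Vec<(String, String)> {
    let mut headers = vec![(
        "User-Agent".to_string(),
        format!("mastiff-client/{client_version}"),
    )];
    if let Some(email) = contact_email {
        // Minimal sanity check only; real address validation is out of scope.
        if email.contains('@') {
            headers.push(("From".to_string(), email.to_string()));
        }
    }
    headers
}
```

Making the e-mail optional keeps anonymous use possible while still giving a way to backtrack from the logs for clients that opt in.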

I'm only suggesting this all because I've seen what happens when people run services that become popular :)

while I'm suggesting random features - it would be cool to support a manifest-style output, although I think the current format already does a fine job as a picklist, so maybe we don't need it.

ctb, Sep 05 '22 16:09

oh, and a request for further reporting information - could we get the number of overlapping hashes in addition to the containment? :)
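Both numbers fall out of the same set intersection, so reporting the overlap count costs nothing extra. A minimal sketch, assuming each sketch is available as a set of hash values computed with the same k-size and scaled value, and that containment means the fraction of query hashes found in the match; `Overlap` and `overlap` are hypothetical names, not sourmash or mastiff API.

```rust
use std::collections::HashSet;

/// Overlap statistics between a query and a match, both given as sets of
/// hash values. Containment of the query in the match is
/// |query ∩ match| / |query|.
#[derive(Debug, PartialEq)]
pub struct Overlap {
    pub intersect_hashes: usize,
    pub containment: f64,
}

pub fn overlap(query: &HashSet<u64>, matched: &HashSet<u64>) -> Overlap {
    let intersect_hashes = query.intersection(matched).count();
    // Guard against an empty query to avoid dividing by zero.
    let containment = if query.is_empty() {
        0.0
    } else {
        intersect_hashes as f64 / query.len() as f64
    };
    Overlap { intersect_hashes, containment }
}
```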

ctb, Sep 15 '22 17:09