Pulling a specific subset of PGCRs in a non-abusive way

Open Java-12-Aero opened this issue 3 years ago • 2 comments

I'm trying to collect information on Trials of Osiris matches as the basis for a personal study of trends in matches over time. My current method of going through individual clans would take significant manual labor to cover a meaningful portion of the matches. Someone familiar with Charlemagne's backend told me that it works by requesting every instanceId individually, which would give me what I need with some appropriate filtering. However, my current estimate for the total number of PGCRs in the game is in the ballpark of 11.2 billion, which at 20 requests a second would take ~17.8 years to get through. That's a little longer than I can hope to keep a computer turned on and connected to the internet. Obviously there has to be a more efficient method for getting PGCRs, otherwise Charlemagne would not be able to have the data it does. I also believe both ends of the API would prefer to avoid an excessive/abusive number of requests, especially considering Charlemagne estimates there are only 53.3 million Trials matches.
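For reference, the estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `crawl_years` is made up for illustration, and it assumes a 365-day year and a constant request rate with no retries or downtime.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year, no leap days


def crawl_years(total_reports: float, requests_per_second: float) -> float:
    """Years needed to request `total_reports` PGCRs one at a time at a fixed rate."""
    return total_reports / requests_per_second / SECONDS_PER_YEAR


# Figures taken from the post: ~11.2 billion PGCRs overall, ~53.3 million Trials matches.
full_crawl = crawl_years(11.2e9, 20)
trials_only = crawl_years(53.3e6, 20)

print(f"every PGCR:  {full_crawl:.1f} years")
print(f"Trials only: {trials_only * 365:.0f} days")
```

The gap between the two numbers is the whole problem: if only the Trials IDs could be targeted directly, the same request rate would finish in about a month rather than many years.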

Java-12-Aero avatar Jul 11 '22 07:07 Java-12-Aero

Charlemagne has probably been collecting the PGCRs in real time since launch. At this point, the only realistic way to catch up would be to get the past 4 years of data from someone who has it, or to start collecting Trials data from now and not worry about historical matches (just future ones).

If you just want the previous weekend, then you could get a PGCR number at Tuesday reset, and then go backwards until you get to Friday reset (just decrement the number, which I think will work). Just discard any PGCRs that are not Trials.
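A rough sketch of that backwards walk, in Python. Several things here are assumptions rather than anything confirmed in this thread: `fetch_pgcr` is a hypothetical caller-supplied function that wraps the PGCR endpoint (with its own throttling) and returns the report as a dict or `None` if the ID does not resolve; the report is assumed to expose a `period` timestamp and an `activityDetails.modes` list; and `84` is assumed to be the activity mode value for Trials of Osiris. Because IDs may be only roughly ordered by time, the walk stops after `max_stale` consecutive reports older than the Friday reset instead of at the first one.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional

TRIALS_MODE = 84  # assumption: activity mode value for Trials of Osiris


def parse_period(pgcr: dict) -> datetime:
    """Parse the report's ISO-8601 `period` field into an aware datetime."""
    return datetime.fromisoformat(pgcr["period"].replace("Z", "+00:00"))


def trials_ids_backwards(
    start_id: int,
    friday_reset: datetime,
    fetch_pgcr: Callable[[int], Optional[dict]],  # hypothetical fetcher, supplied by the caller
    max_stale: int = 1000,
) -> Iterator[int]:
    """Walk PGCR IDs downwards from `start_id`, yielding the Trials ones."""
    stale = 0  # consecutive reports seen that are older than Friday reset
    for instance_id in range(start_id, 0, -1):
        pgcr = fetch_pgcr(instance_id)
        if pgcr is None:
            continue  # ID did not resolve; skip it
        if parse_period(pgcr) < friday_reset:
            stale += 1
            if stale >= max_stale:
                return  # well past Friday reset, stop walking
            continue
        stale = 0
        if TRIALS_MODE in pgcr["activityDetails"]["modes"]:
            yield instance_id
```

Only the IDs are yielded, so non-Trials reports are discarded as soon as they are checked and nothing large has to be stored.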

The Destiny API discord is a good place to discuss stuff like this: https://discord.gg/tNpvva8kRE

mikechambers avatar Aug 02 '22 21:08 mikechambers

> Charlemagne has probably been collecting the PGCRs in real time since launch. At this point, the only realistic way to catch up would be to get the past 4 years of data from someone who has it, or to start collecting Trials data from now and not worry about historical matches (just future ones).

I'm actually on that discord, lol. I've spoken with some people from Charlemagne and that's about what they've said, though they do know of somebody else who's doing a similar project to mine in terms of getting PGCRs. I also borrowed the Trials week info from Vlakafakata (of Trials.report) and am using a binary search to find the start/end PGCR ID numbers for each weekend, which I then iterate over to grab the ID numbers of just the Trials matches. I don't have the space to store the entire PGCR for all of the estimated 100 million or so matches, and I don't really need to keep all of the info long-term, so I'm storing only the IDs. When I go back to finally do my data analysis, I'll be able to grab just the ones I need (estimated ~25 GB of storage needed for just IDs in a worst-case scenario). I've also done some work ahead of time and cut out 5 billion PGCRs from before Trials release, and I'm estimating another 2-3 billion cut by my search method, which should get my run time down to a more manageable level.
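For anyone curious, the binary search for a weekend boundary looks roughly like the sketch below. It is a simplification, not my exact code: `period_of` is a hypothetical function that returns the timestamp of the PGCR with a given ID, and the search assumes every ID in the range resolves and that timestamps never decrease as IDs increase. In practice the ordering is only approximate, so the result is best treated as a starting point with some padding on either side.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def first_id_at_or_after(
    target: datetime,
    low: int,
    high: int,
    period_of: Callable[[int], datetime],  # hypothetical lookup: PGCR ID -> timestamp
) -> int:
    """Return the smallest ID in [low, high] whose timestamp is >= target.

    Returns high + 1 if every report in the range is older than target.
    Assumes timestamps are non-decreasing in ID.
    """
    while low <= high:
        mid = (low + high) // 2
        if period_of(mid) < target:
            low = mid + 1   # boundary is to the right of mid
        else:
            high = mid - 1  # mid qualifies; look for an earlier one
    return low


# Toy usage with a fake lookup where report N happened N minutes after a base time.
base = datetime(2022, 7, 8, 17, 0, tzinfo=timezone.utc)
fake_period = lambda instance_id: base + timedelta(minutes=instance_id)
boundary = first_id_at_or_after(base + timedelta(minutes=42, seconds=30), 0, 100, fake_period)
```

Each boundary costs about log2(range) requests, so even a range of several billion IDs needs only a few dozen lookups per weekend. Running it once for Friday reset and once for Tuesday reset gives the ID window to iterate over.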

Java-12-Aero avatar Aug 02 '22 22:08 Java-12-Aero