FEATURE: user_agent: random
Description
While working my way through the new PyFunceble.yaml, I stumbled on the fact that we are hard-coding the browser and OS. Somehow this just seems wrong: as we know, the big 5 and their wannabes log the IP/browser/OS etc. and share that data to catch bots, which gives us false-positive responses in the form of false 4xx and 5xx HTTP status codes.
Possible Solution
One of two
- Randomize browser and OS per -n records (Optimal)
- Randomize browser and OS at each test start
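
To make the first option concrete, here is a rough sketch of what "per -n records" could look like. Everything in it is hypothetical: `rotate_identity`, the `BROWSERS` and `PLATFORMS` lists, and the `every` parameter are illustration only and are not part of PyFunceble.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical values for illustration; not the real keys of the dataset.
BROWSERS = ["chrome", "firefox", "safari"]
PLATFORMS = ["linux", "macosx", "win10"]


def rotate_identity(
    records: Iterable[str],
    every: int,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Yield (record, (browser, platform)), redrawing the pair every `every` records."""
    if every < 1:
        raise ValueError("every must be >= 1")
    rng = rng or random.Random()
    identity = ("", "")
    for index, record in enumerate(records):
        # Draw a fresh browser/platform pair at the start of each batch.
        if index % every == 0:
            identity = (rng.choice(BROWSERS), rng.choice(PLATFORMS))
        yield record, identity


if __name__ == "__main__":
    for record, (browser, platform) in rotate_identity(["a.example", "b.example", "c.example"], 2):
        print(record, browser, platform)
```

With `every=1` the pair would change on every record; the second option (once per test start) is the same idea with a single draw before the loop.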
Considered Alternative
Manual labor, bad, the system is MY slave, not the other way around.
Additional context
UPDATE: typos
Yeah, but what does random mean to you?
Our user_agents are rotated every week, if not every day... And we choose a random one from one of the latest / most used ones:
https://github.com/funilrys/PyFunceble/blob/fd2ce92208336300df07b00ce3995956b81ad62e/PyFunceble/dataset/user_agent.py#L225-L226
> And we choose a random one from one of the latest / most used ones:
I see three questions here:
- Selection among browser + OS
- How to control it
- Why you (me) ask

In the same order:
- Automatically rotate among all the available ones from the list (I forgot the URL, but you probably remember it), added to the list of options.
- By a switch between `Random` and `chrome`, `Linux` in the `pyf.owerwrite.yml`.
- Because as of now: 1. I didn't know it was rotated by default; 2. in `pyf.owerwrite.yml` you can only set a fixed value.
Come to think of it... and YES, it is only a thought: can you spoof the MAC addresses used for querying?
IF == True
Would it then be an idea to periodically, with hourly randomness, generate a set of "clients", each running its own MAC + browser + OS? ... IF == TRUE, could this help make it look like a random number of users on the same network vs. one client from the same network?
Well, look here under the @modern datasets. What you give PyFunceble are the keys to a list of user-agents.
I'll have to think of a way to implement randomness. That won't be the default though.
Let's clarify ...
- The reference file is rotated every few days: https://github.com/PyFunceble/user_agents/blob/master/user_agents.json
- Every PyFunceble instance in the world tries to fetch that file if its local version is older than 1 hour.
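
The "older than 1 hour" rule can be pictured with a small sketch like the one below. It is an assumption, not the actual PyFunceble code: `needs_refresh` is a hypothetical helper, and using the file's modification time as its age is my guess at one reasonable way to do it.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 60 * 60  # one hour, as stated in this thread


def needs_refresh(
    path: str,
    now: Optional[float] = None,
    max_age: float = MAX_AGE_SECONDS,
) -> bool:
    """Return True when the local copy is missing or older than `max_age` seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    # Age is approximated by the file's last modification time (an assumption).
    return (now - os.path.getmtime(path)) > max_age


if __name__ == "__main__":
    print(needs_refresh("user_agents.json"))
```

Only when this returns `True` would an instance go back to the reference file; otherwise it keeps using its local copy.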
In the end, users have two choices:
- Set their own user agent through the `user_agent.custom` key:
https://github.com/funilrys/PyFunceble/blob/296a613f2e4149f8e790cd92ef4b964694edcac6/PyFunceble/data/infrastructure/.PyFunceble_production.yaml#L630
- Set a preferred `user_agent.platform` and `user_agent.browser`:
https://github.com/funilrys/PyFunceble/blob/296a613f2e4149f8e790cd92ef4b964694edcac6/PyFunceble/data/infrastructure/.PyFunceble_production.yaml#L610-L621
When the end user chooses the latter, the engine takes the given `user_agent.platform` as a subkey of the `@modern` key, and the `user_agent.browser` as a subkey of the dataset returned by that first lookup. The result is a list of user-agents. From that list, the engine always selects a random one.
https://github.com/funilrys/PyFunceble/blob/296a613f2e4149f8e790cd92ef4b964694edcac6/PyFunceble/dataset/user_agent.py#L255
That's how it works - right now.
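
As a rough illustration of that lookup (not the real implementation): the nesting below follows the description in this comment, but `DATASET` is a toy stand-in with made-up placeholder strings, and `pick_user_agent` is a hypothetical name.

```python
import random
from typing import Optional

# Toy stand-in for user_agents.json; the real layout and values may differ.
DATASET = {
    "@modern": {
        "linux": {
            "chrome": ["UA-linux-chrome-1", "UA-linux-chrome-2"],
            "firefox": ["UA-linux-firefox-1"],
        },
        "win10": {
            "chrome": ["UA-win10-chrome-1"],
        },
    }
}


def pick_user_agent(
    dataset: dict,
    platform: str,
    browser: str,
    custom: Optional[str] = None,
) -> str:
    """Return the custom user agent if set, else a random one for platform + browser."""
    if custom:
        return custom
    # First the platform under "@modern", then the browser under that platform.
    candidates = dataset["@modern"][platform][browser]
    return random.choice(candidates)


if __name__ == "__main__":
    print(pick_user_agent(DATASET, "linux", "chrome"))
```

So even with a fixed platform and browser, two runs can send different user-agent strings, as long as the list for that pair has more than one entry.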
So, you are suggesting that we should add an extra random layer / option?
Let me see if I understood this, for dummies.
Even if I set a browser.agentX and platform.OXY, the engine still mixes any of the @modern browsers, versions, etc.?
Then what about the OS.platform and screenSize?
We don't simulate screens. We are not Selenium or a browser (- yet ??).
The OS platform is always linux, unless specified otherwise.
cf: https://github.com/funilrys/PyFunceble/blob/a05f2af03a0d951e6aac998e7dcb7416abfa8294/PyFunceble/data/infrastructure/.PyFunceble_production.yaml#L621