
brainstorming limitations and features

Open • whilei opened this issue 7 years ago • 5 comments

which may or may not already exist, need refining, or be in the works:

  • randomized (but not TOO randomized) intervals. As below, mimicking general usage patterns would be ideal; uniformly random gaps of 1-10 seconds are not. Humans are not just gravel; there are also rocks and boulders.
  • customizable word lists
    • as crazy as it sounds, a Chrome plugin that records actual searches, and thereby provides real starting data for mimicry, might be effective (again, obfuscation vs. privacy)
  • a variety of request types, i.e. POST, PATCH, DELETE. This is trickier, but separating GETs from everything else would be the first thing I'd do if I were looking for real human logs
  • controlled variety of 'quest' depth. A Google search plus one click, followed by a Google search for something completely unrelated plus one click, is not convincing.
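One way to sketch controlled 'quest' depth, assuming a hypothetical `RELATED` map from each page to the pages one click away (a real version would need actual link data): the number of clicks is drawn from a geometric distribution, so most quests stay shallow while a few go deep, and every click stays on topic instead of jumping to something unrelated. The page names, `quest()` and `mean_depth` are illustrative only.

```python
import random

# Hypothetical link graph: each page lists the pages reachable by one click.
RELATED = {
    "search:knitting needles": ["wiki:Knitting_needle", "shop:needles"],
    "wiki:Knitting_needle": ["wiki:Knitting", "wiki:Yarn"],
    "wiki:Knitting": ["wiki:Yarn", "wiki:Crochet"],
    "wiki:Yarn": ["wiki:Wool"],
    "shop:needles": ["shop:yarn"],
}

def quest(start, mean_depth=3.0, rng=random):
    """Follow related links from `start`. Before each click the quest ends
    with probability 1/mean_depth, so the path length (start page plus
    clicks) averages `mean_depth` unless a dead end cuts it short.
    `mean_depth` must be >= 1."""
    p_stop = 1.0 / mean_depth
    path = [start]
    while rng.random() > p_stop:
        options = RELATED.get(path[-1], [])
        if not options:
            break  # dead end: nothing related left to click
        path.append(rng.choice(options))  # stay on topic
    return path
```

For example, `quest("search:knitting needles", rng=random.Random(1))` returns one on-topic trail of pages; calling it repeatedly gives a mix of short and occasionally deeper trails.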

E.g., my computer visiting 1000 random websites per day at 5 pages per minute is not going to be anywhere near convincing, given that I normally visit a handful of sites in bursts (and that pattern has already been logged).
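A minimal sketch of the "handful of sites in bursts" pattern, assuming a hard-coded, hypothetical `FAVOURITES` list ranked by how often the user visits each site: sites are picked with Zipf-like weights so one or two dominate, and each burst is several page views on a single site. `pick_site()`, `burst()` and the exponent `s` are illustrative, not measured values.

```python
import random

# Hypothetical "usual" sites, most-visited first; a real profile would
# come from the user's own history rather than a hard-coded list.
FAVOURITES = ["news.example", "mail.example", "forum.example",
              "video.example", "shop.example"]

def pick_site(rng=random, s=1.2):
    """Pick a site with weight 1/rank**s, so the top one or two sites
    dominate instead of every site being equally likely."""
    weights = [1.0 / (rank ** s) for rank in range(1, len(FAVOURITES) + 1)]
    return rng.choices(FAVOURITES, weights=weights, k=1)[0]

def burst(rng=random, min_pages=2, max_pages=8):
    """One burst: several page views clustered on a single site."""
    site = pick_site(rng)
    return [site] * rng.randint(min_pages, max_pages)
```

Spacing such bursts out with long idle periods gives a day made of a few sites visited in clumps, rather than 1000 random ones at a steady rate.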


abstracted:

  • usage patterns that are not static randomness, but sporadic and clumpy, reasonably nonlinear
  • mimicry of actual/personalizeable trends in content
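To make "sporadic and clumpy" concrete, here is a sketch of a simple two-state timing model (a standard way to produce bursty traffic, not something the project is known to use): most gaps are short in-burst "gravel", but each request has a fixed chance of ending the burst and being followed by a long idle "boulder". `bursty_gaps()` and all of its default parameter values are assumptions.

```python
import random

def bursty_gaps(n, rng=random, burst_mean=5, short_mean=8.0, long_mean=1800.0):
    """Return n inter-request gaps in seconds. Each request ends its burst
    with probability 1/burst_mean, so bursts average `burst_mean` requests.
    In-burst gaps are exponential with mean `short_mean`; the gap after a
    burst ends is exponential with mean `long_mean`."""
    gaps = []
    p_end = 1.0 / burst_mean
    while len(gaps) < n:
        if rng.random() < p_end:
            gaps.append(rng.expovariate(1.0 / long_mean))   # idle: a boulder
        else:
            gaps.append(rng.expovariate(1.0 / short_mean))  # in-burst: gravel
    return gaps
```

Compared with uniform 1-10 second gaps, the spread of these gaps is much larger relative to their mean: most are a few seconds, punctuated by occasional half-hour silences.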

really abstracted:

  • better to make a handful of knitting needles than a busload of thumbtacks

I've said enough. Please close issue and destroy GitHub after reading. :beer:

whilei, Jun 19 '17

What about getting actual browsing data from volunteers to analyze multiple behaviours and decide what is the best option?

XayOn, Jun 26 '17

@XayOn I think that would be the best idea. I will look into my browsing history today to see what I can find.

NeuroWinter, Jul 02 '17

To add to this, some of the modules have a possibility of generating traffic that could be harmful if not outright incriminating. Without some kind of "safe mode", users could be putting themselves at real risk.

The project could take advantage of services like Google SafeSearch or MyWOT, but this would probably also make real traffic easier to spot.

t-mullen, Jul 05 '17

@XayOn @NeuroWinter sounds great! NirSoft have a free tool to extract history at http://www.nirsoft.net/utils/browsing_history_view.html; if you anonymise your history and share it, we can start parsing it, finding patterns, etc. For Chrome, this could be pretty helpful: https://chrome.google.com/webstore/detail/web-historian-web-history/chpcblajbmmlbhecpnnadmjmlbhkloji
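A rough sketch of anonymising a history dump before sharing it, assuming the visits have already been loaded as `(url, datetime)` pairs (the export formats of the tools above are not assumed here): it keeps only the visit time and a salted hash of the hostname, then splits the visits into sessions at pauses longer than 30 minutes, an arbitrary cut-off. `anonymise()`, `sessions()` and the cut-off are hypothetical. Keep the salt private, since anyone who knows it can hash common hostnames and match them.

```python
import hashlib
from datetime import datetime, timedelta
from urllib.parse import urlsplit

def anonymise(visits, salt):
    """Keep only the visit time and a salted hash of the hostname, dropping
    paths and query strings (search terms, tokens, etc.)."""
    out = []
    for url, when in visits:
        host = urlsplit(url).hostname or ""
        digest = hashlib.sha256((salt + host).encode()).hexdigest()[:12]
        out.append((digest, when))
    return out

def sessions(visits, gap=timedelta(minutes=30)):
    """Split visits (sorted by time first) into sessions wherever the pause
    between consecutive visits exceeds `gap`."""
    result = []
    for host, when in sorted(visits, key=lambda v: v[1]):
        if result and when - result[-1][-1][1] <= gap:
            result[-1].append((host, when))  # continue the current session
        else:
            result.append([(host, when)])    # long pause: start a new one
    return result
```

Session lengths, sites per session and the pauses between sessions are exactly the kind of clumpy pattern the generator would need to reproduce.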

@RationalCoding yes, I agree. Would you mind creating an issue and taking ownership? Creating a list of English profanity words and cross-referencing it with the chosen words should do the trick for the majority of cases. I'm not sure how to tackle Alexa's top 1M, as it contains a lot of porn sites.
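The cross-referencing step could look something like this minimal sketch, assuming the blocklist is just an iterable of lowercase words (the placeholder word "heck" stands in for a real profanity list): a candidate search term is dropped if any of its whole words appears in the blocklist. `safe_words()` is a hypothetical name.

```python
import re

def safe_words(candidates, blocklist):
    """Drop candidate search terms that contain any blocklisted word.
    Matching is case-insensitive and on whole words, so blocking 'heck'
    removes 'what the heck' but keeps 'check my email'."""
    blocked = {w.lower() for w in blocklist}
    kept = []
    for term in candidates:
        tokens = re.findall(r"[a-z']+", term.lower())  # split into words
        if not blocked.intersection(tokens):
            kept.append(term)
    return kept
```

Whole-word matching avoids false positives on innocent words that merely contain a blocked string, but it misses blocked words glued into longer strings, which is how many domain names are written, so a site list like Alexa's would need a different check.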

eth0izzle, Jul 06 '17

I have just got back from holiday and I am willing to work on this a bit now.

What sort of information are we looking for from a history dump?

NeuroWinter, Jul 30 '17