Invalid char in username for some sites
I have checked issue #55 and issue #430, but it looks like invalid chars such as "." in usernames are handled only for subdomains, not in the URL path.
One example: https://wanelo.co/john is valid, but https://wanelo.co/john.doe in fact resolves to the john account, so john.doe is a false positive. The JSON data file should have an allowed/forbidden char list (both could be useful). I'd be glad to contribute and add the changes to wmn-data.json if this trivial change is approved.
Hmm. Thanks for pointing this out. Since it is only an issue on some entries and not others, it would probably need to be a per-entry parameter, because globally escaping/replacing . in all usernames across all sites would most likely cause false negatives.
Thoughts?
Yes I was thinking about a per site / entry option. Options to discuss:
- a good variable (key) name
- regex or just plain chars
For the false positives I have seen so far, it would be enough to have a simple option like:
badchars: '.'
on a per site basis.
Example:
{
"name" : "Wanelo",
"uri_check" : "https://wanelo.co/{account}",
"badchars" : ".",
"post_body" : "",
"e_code" : 200,
"e_string" : "on Wanelo</title>",
"m_string" : "Hmm, that's embarrassing",
"m_code" : 404,
"known" : ["lisandrareyes"],
"cat" : "shopping",
"valid" : true
},
and the code logic, run inside the per-site loop, would be as simple as (Python):
badchars = set(site["badchars"])
if any((c in account) for c in badchars): continue
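To make the two lines above concrete, here is a minimal runnable sketch of the skip logic. The function name `sites_to_check`, the `ExampleSite` entry, and treating `badchars` as an optional key are assumptions for illustration, not code from the project.

```python
def sites_to_check(sites, account):
    """Return the entries whose "badchars" do not appear in the account name."""
    selected = []
    for site in sites:
        # "badchars" is the proposed per-entry key; assume it may be absent.
        badchars = set(site.get("badchars", ""))
        if any(c in account for c in badchars):
            continue  # checking this site would likely give a false positive
        selected.append(site)
    return selected


# Hypothetical, trimmed-down entries (only the keys needed here).
sites = [
    {"name": "Wanelo", "uri_check": "https://wanelo.co/{account}", "badchars": "."},
    {"name": "ExampleSite", "uri_check": "https://example.com/{account}"},
]

print([s["name"] for s in sites_to_check(sites, "john.doe")])
print([s["name"] for s in sites_to_check(sites, "john")])
```

With this approach a site is skipped entirely for a username containing one of its bad chars, rather than the username being altered.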
So, the . is really the only character that I've noticed causes us problems since usernames can be in the subdomains or as a parameter. I'm wondering if the parameter could just be a Boolean strip_bad_char with values of True or False. If true, then remove anything non-[a-zA-Z0-9].
Thoughts?
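A rough sketch of what the Boolean variant could look like, assuming a hypothetical helper named `clean_account` (the name and signature are not from the project):

```python
import re

# Matches every character that is not an ASCII letter or digit.
_NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def clean_account(account, strip_bad_char=False):
    """Remove non-[a-zA-Z0-9] characters when strip_bad_char is true."""
    if strip_bad_char:
        return _NON_ALNUM.sub("", account)
    return account


print(clean_account("john.doe", strip_bad_char=True))
print(clean_account("john.doe", strip_bad_char=False))
```

One thing to keep in mind: stripping changes the name that actually gets checked ("john.doe" becomes "johndoe"), so a hit then refers to the stripped name, not the original input.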
Seems totally good!
OK @enodr. I'll take this on to insert the strip_bad_char boolean
@WebBreacher , we can make the field optional, such that if it is specified/provided the strip_bad_char method is called. That will save you the time of editing each entry in the wmn-data.json file.
Agreed @yooper and, at the same time, we can do the same for the "post_body" : "", which is rarely used.
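For illustration, optional fields could be handled by filling in defaults when an entry is loaded. This is only a sketch: `normalize_entry` is a hypothetical helper, and the defaults (`False` and an empty string) are assumptions based on the discussion above.

```python
# Assumed defaults for fields that an entry is allowed to omit.
OPTIONAL_DEFAULTS = {"strip_bad_char": False, "post_body": ""}


def normalize_entry(entry):
    """Return a copy of the entry with missing optional fields filled in."""
    # Values present in the entry override the defaults.
    return {**OPTIONAL_DEFAULTS, **entry}


minimal = {"name": "Wanelo", "uri_check": "https://wanelo.co/{account}"}
explicit = {**minimal, "strip_bad_char": True}

print(normalize_entry(minimal)["strip_bad_char"])
print(normalize_entry(explicit)["strip_bad_char"])
```

This way existing entries in wmn-data.json would not need to be edited; only the sites that need the behaviour would set the field.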
I will put in some time this week and make it happen.
Hey thanks! I can mod the JSON if you wanna add the feature to the script.
Going to close this as:
- We have the strip_bad_char schema addition
- The Python code @yooper was going to mod has been moved out of this project into its own project.