Improper address usage by multiple instantiated objects
In the frontingEngine.py file in the frontingEngine library, we use the detectCDN library to instantiate a list of objects which are of type domain.
Issue
When instantiating a series of Domain objects with letting the Domain class set default values for all local variables for the object as defined in detectCDN.cdn_lookup.py, it is observed that all Domain objects in the pot list for DomainPot all share the same address for their local variables Domain.cdns, Domain.whois_data, etc except for Domain.url which is the only parameter passed when instantiating the object.
Replication
The original state of the of the DomainPot class code is the following:
class DomainPot:
def __init__(self, domains: List[str]):
self.domains = []
self.domain_to_cdn = {}
# Convert to list of type domain
for dom in domains:
domin = detectCDN.domain(dom)
self.domains.append(domin)
With the class in frontingEngine.py changed back to the above, from the src/ directory, open up a python interpreter or run a script with the following commands:
import frontingEngine
domains = ["login.gov","censys.io","asu.edu","google.com"]
pot = frontingEngine.DomainPot(domains)
# Print addresses
for domain in pot.domains:
print("[%s]\tCDN list located at\t%s" % (domain.url,hex(id(domain.cdns))))
If the issue still exists, all addresses for the CDNs list will be the same
Current work-around
Instead of depending on the detectCDN class to define defaults, I am passing default list() objects as parameters to the Domain object. The code in frontingEngine.py looks like the following:
class DomainPot:
def __init__(self, domains: List[str]):
self.domains = []
self.domain_to_cdn = {}
# Convert to list of type domain
for dom in domains:
domin = detectCDN.domain(
dom, list(), list(), list(), list(), list(), list(), list(), list()
)
self.domains.append(domin)
This of course is not ideal however, it stops the sharing of addresses between objects.
Current thoughts
I feel like this could be an issue with the namespace for which the class is being called. The library detectCDN may have improper namespace definitions which could cause any variables such as default variables which the library defines, may cause subsequent objects which are instantiated from the same class which also have default variables defined by the library, to share the same address for their local variables.
The reason passing them from the calling python script/library, such as in how frontingEngine.py calls upon detectCDN.py, works is because the passed parameter is defined in the namespace of the calling library and the address is then passed to the called library by address (pass by address, not pass by value) therefore it is always different than other objects/parameters/values which are defined in the calling library because it understands how they should exist. Letting the library it calls handle its own internal variables and in its namespace may cause the sharing of addresses between objects.
I've run into this in the past, unfortunately the '[]' interacts ~~with the param list in a strange way~~ at the class level, since it is not within the init function body. I think the weirdness may be the interaction with the typing module and the constructor, since I'm not sure this behavior happens without the typing hints.
However, there is an easy fix for this!
In the constructor, you can use argname or [] and this will always get you a fresh array, or the passed in argument.
I've setup a repl.it @ https://replit.com/@Salt5/ExpertHightechComment#main.py that works around this using the Domain's cname parameter.
The code (incase the repll.it goes away...) is
from typing import List
class Domain:
"""Domain class allows for storage of metadata on domain."""
def __init__(
self,
url: str,
ip: List[str] = [],
cnames: List[str] = None,
cdns: List[str] = [],
cdns_by_name: List[str] = [],
namsrvs: List[str] = [],
headers: List[str] = [],
whois_data: List[str] = [],
):
"""Initialize object to store metadata on domain in url."""
self.url = url
self.ip = ip
self.cnames = cnames or []
self.cdns = cdns
self.cdns_by_name = cdns_by_name
self.namesrvs = namsrvs
self.headers = headers
self.whois_data = whois_data
self.cdn_present = False
domain1 = Domain("foo.com")
domain2 = Domain("bar.com")
domain1.cnames.append("This is a test.")
domain3 = Domain("bar.com",[],["Does this work?"])
print(f"1: {domain1.url} {domain1.cnames}")
print(f"2: {domain2.url} {domain2.cnames}")
print(f"3: {domain3.url} {domain3.cnames}")
The output I receive is:
1: foo.com ['This is a test.'] 2: bar.com [] 3: bar.com ['Does this work?']
I'd like to setup a pull request + tests for this specifically, unless I'm totally off base here.
In the example which mirrors your Domain class code, reverting the cname initializer causes the first two output domains to have the same array and contents for cnames.
Edit...
I think the best way to handle it is that in the arg list you pass "None" and then in init use an or statement to replace None with the new array, since the '[]' is confusing, and does not work.
Wanted to add that I believe that this behavior happens because the = [] happens outside of the init function and indeed is evaluated at the class level.
Before I go any further on this, I'd like to make sure that someone over at this project is still maintaining this and is looking for fixes!
This is an awesome fix. I left a comment on the PR. Unfortunately I am unable to merge.