proofofconcept
Search server logs for patterns of behavior. What pages do visitors go to?
What can be learned from the logs available on the server?
~/proofofconcept/v7_pickle_web_interface/flask/logs$ ls -Shal
-rw-r--r-- 1 pdg pdg 343M Feb 18 17:54 gunicorn_error.log
-rw-r--r-- 1 pdg pdg 203M Feb 18 17:54 gunicorn_access.log
-rw-r--r-- 1 pdg pdg 202M Feb 18 17:54 nginx_access.log
-rw-r--r-- 1 pdg pdg 33M Feb 18 17:54 nginx_error.log
The format of ~/proofofconcept/v7_pickle_web_interface/flask/logs/nginx_access.log is set in https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/v7_pickle_web_interface/services/nginx/nginx.conf#L20
The format of ~/proofofconcept/v7_pickle_web_interface/flask/logs/gunicorn_access.log is set in https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/v7_pickle_web_interface/gunicorn.config.py#L38
There is https://github.com/allofphysicsgraph/proofofconcept/blob/gh-pages/v7_pickle_web_interface/flask/templates/monitoring.html but the corresponding page doesn't seem to work.
https://physicsderivationgraph.blogspot.com/2020/11/log-analysis-of-nginx-access-using.html
See https://bastian.rieck.me/blog/posts/2022/server/ and https://news.ycombinator.com/item?id=30661852 for tips on securing a web server
https://nishtahir.com/i-looked-through-attacks-in-my-access-logs-heres-what-i-found/ https://news.ycombinator.com/item?id=39165711
Filtered out bots and crawlers using the User Agent string:
    import ast

    import matplotlib.pyplot as plt
    import pandas

    # Each log line is assumed to be a Python dict literal; ast.literal_eval
    # parses it without executing arbitrary code, unlike eval.
    with open('logs_as_of_2024-09-02/nginx_access.log') as file_handle:
        list_of_dicts = [ast.literal_eval(this_line.strip())
                         for this_line in file_handle if this_line.strip()]

    df = pandas.DataFrame(list_of_dicts)
    df['time'] = pandas.to_datetime(df['time'], format='%d/%b/%Y:%H:%M:%S %z')

    # Drop requests whose User Agent identifies itself as a bot or crawler.
    df_no_bots = df[~df['ua'].str.contains("bot|crawler", case=False, na=False)]

    # For the 50 most active remaining IPs, print a summary and plot requests per day.
    for ip_value, count in df_no_bots['ip'].value_counts().head(50).items():
        times_for_ip = df[df['ip'] == ip_value]['time']
        first_seen = times_for_ip.min()
        print("IP:", ip_value, "has made", count, "requests; first observed on", first_seen, "with User Agent")
        print(df[df['ip'] == ip_value]['ua'].value_counts().head(10))
        days_since_first = (times_for_ip - first_seen) / pandas.Timedelta(days=1)
        number_of_bins = max(int(days_since_first.max()), 1)  # one bin per day, at least one
        plt.figure()
        plt.hist(days_since_first, bins=number_of_bins)
        plt.xlabel('days since first observation')
        plt.ylabel('number of requests per day')
        plt.title(str(ip_value) + "; " + str(count))
        plt.show()
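To get at the question of which pages visitors go to, a minimal sketch follows. It assumes the log records carry a request field holding strings like "GET /path HTTP/1.1"; the column name `req`, the helper `top_pages`, and the sample rows are all hypothetical and would need to match whatever the nginx log format actually emits.

```python
import pandas


def top_pages(df, request_column="req", how_many=10):
    """Count the most requested paths.

    Assumes `request_column` (a hypothetical name) holds strings such as
    "GET /index HTTP/1.1"; the path is the second space-separated token.
    """
    paths = df[request_column].str.split(" ").str[1]
    # Strip query strings so /page?a=1 and /page?a=2 count as the same page.
    paths = paths.str.split("?").str[0]
    return paths.value_counts().head(how_many)


# Made-up rows, only to show the shape of the input and the result.
example = pandas.DataFrame({
    "ua": ["Mozilla/5.0", "Mozilla/5.0", "Googlebot/2.1", "curl/8.0"],
    "req": ["GET /index HTTP/1.1", "GET /index?x=1 HTTP/1.1",
            "GET /robots.txt HTTP/1.1", "GET /faq HTTP/1.1"],
})
# Same bot filter as in the analysis above.
no_bots = example[~example["ua"].str.contains("bot|crawler", case=False, na=False)]
page_counts = top_pages(no_bots)
print(page_counts)
```

Applied to `df_no_bots` instead of the made-up frame, this would list the most visited paths among non-bot traffic, provided the request column exists under the assumed name.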