proxpi
long-term use of proxpi on a low-resource server
Dear EpicWink, I am considering using your caching proxy proxpi on my field (semi-portable) NAS. The server is solely for my use, and it has a pretty weak CPU and 1.5 GB of RAM.
I wonder whether your caching proxy is suitable for long-term cache storage in case of internet disruptions. For instance, if I set PROXPI_INDEX_TTL=6570000 (several years) and accumulate a cache over a few months (across numerous power cycles), will the server operate normally? Will I be able to use the cached files for up to a year?
I would appreciate any information or recommendations you could provide. Best regards, ~le berouque
EDIT: added remarks about "semi-portable" and "power cycles"
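A quick check of the TTL arithmetic, assuming PROXPI_INDEX_TTL is interpreted as seconds: 6570000 seconds works out to only about 76 days (it happens to equal 5 × 365 × 3600, i.e. five years counted in hours rather than seconds), so a multi-year TTL needs a much larger number. The helper `ttl_seconds` below is hypothetical, not part of proxpi.

```python
from datetime import timedelta


def ttl_seconds(days: float = 0, hours: float = 0) -> int:
    """Convert a duration into whole seconds for a TTL environment variable."""
    return int(timedelta(days=days, hours=hours).total_seconds())


# The value from the question, converted back into days (roughly 76).
requested = 6570000
print(requested / 86400)

# Five 365-day years expressed in seconds: 157680000.
five_years = ttl_seconds(days=5 * 365)
print(f"PROXPI_INDEX_TTL={five_years}")
```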
Well...
I cached some packages and rebooted my "field" NAS (with proxpi configured as a service).
Before the reboot:
- flask process: 430 MB of RAM
- proxpi-cache folder: 2.3 GB on disk
After the reboot:
- flask process: 20 MB of RAM
- proxpi-cache folder: 2.3 GB on disk
As part of the experiment, I am going to run the script "pypi_warming_up.ps1" on my Windows 10 PC, which installs the top 8000 most popular Python packages according to hugovk's list. pip is configured to query the NAS first and, after a 120-second timeout, fall back to the PyPI repository.
Stay tuned!
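The PowerShell script itself is not included in this thread. As a rough illustration of the same warming idea in Python, the sketch below reads a package list in the JSON shape used by hugovk's top-pypi-packages dumps (assumed here: a `rows` array whose items carry a `project` key) and runs `pip download` for each project against the NAS index. The NAS URL and port and all function names are hypothetical; `--index-url`, `--timeout`, `--no-deps` and `--dest` are standard pip options.

```python
import json
import subprocess
import sys
import typing as t

# Hypothetical address of proxpi on the NAS; adjust host and port.
NAS_INDEX = "http://nas.local:5000/index/"


def top_projects(dump_json: str, limit: int = 8000) -> t.List[str]:
    """Extract project names from a top-pypi-packages style JSON dump.

    Assumes the dump has a "rows" list whose items have a "project" key.
    """
    rows = json.loads(dump_json)["rows"]
    return [row["project"] for row in rows[:limit]]


def pip_download_cmd(project: str, dest: str, index_url: str = NAS_INDEX) -> t.List[str]:
    """Build a pip command that fetches one project through the proxy."""
    return [
        sys.executable, "-m", "pip", "download",
        "--index-url", index_url,
        "--timeout", "120",  # seconds pip waits on a stalled connection
        "--no-deps",  # fetch only the listed project, not its dependencies
        "--dest", dest,
        project,
    ]


def warm(projects: t.Iterable[str], dest: str) -> t.List[str]:
    """Download each project via the proxy; return the names that failed."""
    failed = []
    for project in projects:
        result = subprocess.run(pip_download_cmd(project, dest), capture_output=True)
        if result.returncode != 0:
            failed.append(project)
    return failed
```

Usage would be along the lines of `warm(top_projects(open("top-pypi-packages.json").read()), "downloads")`. Note that by default `pip download` only fetches distributions compatible with the interpreter and platform running it, so the proxy ends up caching files for the PC's platform, not necessarily for every platform.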
There are two caches:
- The index cache, containing the list of all files for all projects[^1] (and the list of all projects)
- The files cache, containing all files downloaded via proxpi
The index cache has a TTL: once an entry has expired, it is invalidated on its next access (i.e. a download attempt from a client like pip). This cache has no memory bound, so a sufficiently large TTL can cause a MemoryError. It is also not saved to disk, so it is wiped on server restart.
The files cache has a configurable max disk usage, and should use very little memory. It is also resilient to server restarts.
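To make the index-cache behaviour concrete, here is a minimal sketch of a TTL cache that is invalidated on access. It is a hypothetical illustration of the general pattern, not proxpi's `_IndexCache`: entries live in a plain dict, so nothing bounds its size and nothing survives a restart, and an expired entry is only refetched when a client next asks for it.

```python
import time
import typing as t


class TTLIndexCache:
    """Hypothetical in-memory index cache with TTL checked on access."""

    def __init__(
        self,
        ttl: float,
        fetch: t.Callable[[str], t.List[str]],
        clock: t.Callable[[], float] = time.monotonic,
    ):
        self.ttl = ttl
        self._fetch = fetch  # queries the upstream index for one project
        self._clock = clock
        # project name -> (time fetched, list of file names); unbounded
        self._entries: t.Dict[str, t.Tuple[float, t.List[str]]] = {}

    def list_files(self, project: str) -> t.List[str]:
        now = self._clock()
        entry = self._entries.get(project)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve from memory
        # Missing or expired: refetch from upstream on this access.
        # If upstream is unreachable, the fetch error propagates.
        files = self._fetch(project)
        self._entries[project] = (now, files)
        return files
```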
If you need a persistent cache that survives server restarts, proxpi is not what you need; it's designed foremost as an optimistic proxy. You could check out some of the alternatives, especially devpi.
[^1]: aka packages
The alternatives don't work for me because they are performance-oriented, whereas I need data availability; performance doesn't really bother me. When there is no internet access, slow index searches are not the biggest problem. On the other hand, NAS resource consumption does bother me.
Please tell me: are there architectural obstacles in your application to saving the index on disk and loading it when needed? I want to know this before making changes to the code.
are there architectural obstacles in your application to saving the index on disk and loading it when needed?
I'm not sure. If you keep the API of proxpi._cache._IndexCache the same, and replace the _index and _packages dict attributes with some file-based storage, it should work. You will of course need to change the eviction so that it is not just a time-to-live.
I won't merge any PR that makes this change (unless you create a subclass of _IndexCache and enable it via a configuration flag), as the in-memory cache is necessary for simplicity (and therefore reliability of the code) and performance. I'm happy to help and answer questions if you simply want to make a fork to suit your requirements.
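One direction a fork could take, sketched under assumptions: a small SQLite-backed store that could stand in for a dict attribute such as `_index` or `_packages`, recording when each entry was fetched so that a stale entry can still be served when the upstream index is unreachable. The class and function names are hypothetical and do not reflect proxpi's actual `_IndexCache` API; values are assumed to be JSON-serialisable and non-null.

```python
import json
import sqlite3
import time
import typing as t


class PersistentIndexStore:
    """Hypothetical disk-backed key/value store for index entries."""

    def __init__(self, path: str, ttl: float):
        self.ttl = ttl
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS entries "
            "(key TEXT PRIMARY KEY, fetched REAL, value TEXT)"
        )

    def put(self, key: str, value: t.Any) -> None:
        # Record the fetch time; the connection context manager commits,
        # so the entry is on disk and survives a restart or power cycle.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
                (key, time.time(), json.dumps(value)),
            )

    def get(self, key: str, allow_stale: bool = False) -> t.Optional[t.Any]:
        """Return a fresh value, or a stale one if allow_stale is set."""
        row = self._db.execute(
            "SELECT fetched, value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        fetched, value = row
        if allow_stale or time.time() - fetched < self.ttl:
            return json.loads(value)
        return None

    def close(self) -> None:
        self._db.close()


def lookup(store: PersistentIndexStore, key: str, fetch: t.Callable[[str], t.Any]) -> t.Any:
    """Prefer a fresh entry, then upstream, then a stale entry if upstream fails."""
    cached = store.get(key)
    if cached is not None:
        return cached
    try:
        value = fetch(key)
    except OSError:  # network errors such as urllib.error.URLError subclass OSError
        stale = store.get(key, allow_stale=True)
        if stale is None:
            raise
        return stale
    store.put(key, value)
    return value
```

In this sketch the TTL only decides freshness; entries are never deleted, so a real version would still need a size- or age-based purge. A threaded server would also need one connection per thread (or a lock), since `sqlite3` connections are restricted to their creating thread by default.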
Good afternoon. In the server.py file you wrote:
@app.route("/index/<package_name>/<file_name>")
def get_file(package_name: str, file_name: str):
    ...
    if scheme and scheme != "file":
        return flask.redirect(path)
However, this prevents the app from working on Windows. I rewrote this part as follows:
    if scheme in ['http', 'https', 'ftp']:
        return flask.redirect(path)
It now works on Windows as well, but I am not sure whether this could cause any problems.
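The likely reason for the Windows failure is that `urllib.parse.urlparse` reads a drive letter as a URL scheme, so a local cache path looks like a URL with scheme `"c"` and is passed to `flask.redirect`. A quick standard-library check (the example paths are made up):

```python
from urllib.parse import urlparse

windows_path = r"C:\proxpi-cache\numpy-1.26.0.tar.gz"  # hypothetical cache path
posix_path = "/var/cache/proxpi/numpy-1.26.0.tar.gz"  # hypothetical cache path
remote_url = "https://files.pythonhosted.org/packages/numpy-1.26.0.tar.gz"

# The drive letter "C:" is parsed as the scheme "c", so the check
# `scheme and scheme != "file"` treats the local file as a redirect target.
print(repr(urlparse(windows_path).scheme))  # 'c'
print(repr(urlparse(posix_path).scheme))  # ''
print(repr(urlparse(remote_url).scheme))  # 'https'
```

An allowlist such as `['http', 'https', 'ftp']` avoids the drive-letter problem, but any other scheme an upstream might return would then be treated as a local file path instead of being redirected.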
if scheme and scheme != "file":
is intended to treat every URL as a redirect target and every path as a file to be served.
I think a better solution is to return a different type (e.g. pathlib.Path) rather than requiring the server to always parse a string:
Diff:
diff --git a/src/proxpi/_cache.py b/src/proxpi/_cache.py
index 3e09f51..85049c5 100644
--- a/src/proxpi/_cache.py
+++ b/src/proxpi/_cache.py
@@ -7,6 +7,7 @@ import abc
 import time
 import shutil
 import logging
+import pathlib
 import tempfile
 import warnings
 import functools
@@ -719,13 +720,13 @@ class _FileCache:
                 return True  # default to original URL (due to timeout or HTTP error)
         return False
 
-    def _get_cached(self, url: str) -> t.Union[str, None]:
+    def _get_cached(self, url: str) -> t.Union[pathlib.Path, None]:
         """Get file from cache."""
         if url in self._files:
             file = self._files[url]
             assert isinstance(file, _CachedFile)
             file.n_hits += 1
-            return file.path
+            return pathlib.Path(file.path)
         return None
 
     def _start_downloading(self, url: str):
@@ -751,7 +752,7 @@ class _FileCache:
             os.unlink(file.path)
             existing_size -= file.size
 
-    def get(self, url: str) -> str:
+    def get(self, url: str) -> t.Union[str, pathlib.Path]:
         """Get a file using or updating cache.
 
         Args:
@@ -884,7 +885,7 @@ class Cache:
             raise exc
         return files
 
-    def get_file(self, package_name: str, file_name: str) -> str:
+    def get_file(self, package_name: str, file_name: str) -> t.Union[str, pathlib.Path]:
         """Get a file.
 
         Args:
diff --git a/src/proxpi/server.py b/src/proxpi/server.py
index 1124eca..69c754b 100644
--- a/src/proxpi/server.py
+++ b/src/proxpi/server.py
@@ -4,8 +4,8 @@ import os
 import gzip
 import zlib
 import logging
+import pathlib
 import typing as t
-import urllib.parse
 
 import flask
 import jinja2
@@ -203,8 +203,7 @@ def get_file(package_name: str, file_name: str):
     except _cache.NotFound:
         flask.abort(404)
         raise
 
-    scheme = urllib.parse.urlparse(path).scheme
-    if scheme and scheme != "file":
+    if not isinstance(path, pathlib.Path):
         return flask.redirect(path)
     return flask.send_file(path, mimetype=_file_mime_type)
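The idea behind the diff, stripped of Flask so it can run on its own: the cache returns a `pathlib.Path` for a locally cached file and a `str` for an upstream URL, and the caller dispatches on the type instead of parsing the string. `respond` is a hypothetical stand-in for the `get_file` view that returns a tuple rather than calling `flask.redirect` or `flask.send_file`.

```python
import pathlib
import typing as t


def respond(path: t.Union[str, pathlib.Path]) -> t.Tuple[str, str]:
    """Decide how to answer a file request from the cache's return type."""
    if not isinstance(path, pathlib.Path):
        # A plain str always means an upstream URL: redirect the client.
        return ("redirect", path)
    # A Path always means a local cached file, whatever the OS path syntax,
    # so a Windows drive letter can no longer be mistaken for a URL scheme.
    return ("send_file", str(path))


print(respond("https://files.pythonhosted.org/packages/numpy-1.26.0.tar.gz"))
print(respond(pathlib.Path("proxpi-cache") / "numpy-1.26.0.tar.gz"))
```

Note that `pathlib.PurePath` subclasses such as `PureWindowsPath` are not instances of `pathlib.Path`, so with this check the cache must return a concrete `Path`, as the diff does.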
See #48