ablog
Non-deterministic output for blog pages related to tags
Describe the bug
I am trying to return to an approach where I compare the output created by Sphinx against a previous build to catch changes I do not want. Unnoticed bad changes have already caused disasters a couple of times. I used to diff outputs when I was using Nikola, but I gave up because of many issues.
With Sphinx, my current config builds reproducibly, except for ablog, where I get a lot of changes related to tags. This is a diff I get when I rebuild from scratch twice in a row; it never builds the same. I did not try disabling Python hash randomization. That might work as a band-aid, but no more than that.
diff -ru output.1/blog/2010.html output/blog/2010.html
--- output.1/blog/2010.html 2024-01-27 18:12:49.779910129 +0100
+++ output/blog/2010.html 2024-01-27 18:14:41.720750591 +0100
@@ -257,13 +257,13 @@
- <a href="tag/python.html">Python</a>
+ <a href="tag/git.html">git</a>
- <a href="tag/git.html">git</a>
+ <a href="tag/python.html">Python</a>
I get many of these changes. The blog pages and their RSS feed are of course very important to my site.
I am willing to hunt this down on my own. I am using sphinx-build 7.2.6 and ablog==0.11.6, which I believe to be fairly recent.
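The rebuild-and-compare check described above can be sketched in Python as well. This is only an illustration: the helper name `differing_files` is hypothetical, and the directory names in the usage note are the ones from the diff above.

```python
import filecmp
import os


def differing_files(left, right):
    """Return sorted relative paths of files in `left` that are missing
    from `right` or whose contents differ there.

    Files that exist only in `right` are not reported by this sketch.
    """
    changed = []
    for root, _dirs, files in os.walk(left):
        rel_root = os.path.relpath(root, left)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            other = os.path.join(right, rel)
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting size and modification time.
            if not os.path.isfile(other) or not filecmp.cmp(
                os.path.join(root, name), other, shallow=False
            ):
                changed.append(rel)
    return sorted(changed)
```

Calling `differing_files("output.1", "output")` after two clean builds would list the pages that did not build identically.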
There is a devcontainer that automatically builds on install for my web site: https://github.com/Nuitka/Nuitka-website
I suspect that a set object is being used. Since I am on Python 3.10 there, dictionaries are no longer unordered, but this could also be unordered use of a file system result; I couldn't tell yet.
I am using pipenv to install. I am sure I have seen it on Python 3.9 in my WSL too, during a migration from the old Debian Python 3.9 WSL2 pipenv config to the new 3.10-based one for use in devcontainers.
I will be looking at your templates and what data is used to produce the archive (and I think other ablog pages are affected too), to see how unsorted it is.
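To illustrate why a set of tag strings is a plausible cause, here is a minimal sketch that prints the iteration order of the same set in fresh interpreters started with different PYTHONHASHSEED values. The tag names and the helper `tag_order` are made up for the example; this is not how ablog itself is invoked.

```python
import os
import subprocess
import sys

# Program run in a child interpreter: iterate a set of strings and print it.
SNIPPET = 'print(",".join({"Python", "git", "compiler"}))'


def tag_order(seed):
    """Return the set iteration order seen by a fresh interpreter
    started with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The order is stable for a fixed seed but may differ between seeds.
    for seed in range(5):
        print(seed, tag_order(seed))
```

Without a fixed seed, each interpreter start picks a random one, so the string hashes and with them the set order can change from build to build.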
To Reproduce
No response
Screenshots
No response
System Details
==============================
sunpy Installation Information
==============================
General
#######
OS: Ubuntu (22.04, Linux 5.15.90.1-microsoft-standard-WSL2)
Arch: 64bit, (x86_64)
sunpy: 5.1.1
Installation path: /home/vscode/.local/share/virtualenvs/Nuitka-website-rRTh7jbj/lib/python3.10/site-packages/sunpy-5.1.1.dist-info
Required Dependencies
#####################
astropy: 6.0.0
numpy: 1.26.3
packaging: 23.2
parfive: 2.0.2
Optional Dependencies
#####################
asdf: Missing asdf>=2.8.0; extra == "asdf" or "docs" or "tests"
asdf-astropy: Missing asdf-astropy>=0.1.1; extra == "asdf" or "docs" or "tests"
beautifulsoup4: Missing beautifulsoup4>=4.8.0; extra == "docs" or "net" or "tests"
cdflib: Missing cdflib!=0.4.0,!=1.0.0,>=0.3.20; extra == "docs" or "tests" or "timeseries"
dask: Missing dask[array]>=2021.4.0; extra == "dask" or "docs" or "tests"
drms: Missing drms<0.7.0,>=0.6.1; extra == "docs" or "net" or "tests"
glymur: Missing glymur!=0.9.5,>=0.9.1; extra == "docs" or "jpeg2000" or "tests"
h5netcdf: Missing h5netcdf>=0.11; extra == "docs" or "tests" or "timeseries"
h5py: Missing h5py>=3.1.0; extra == "docs" or "tests" or "timeseries"
lxml: 5.1.0
matplotlib: Missing matplotlib>=3.5.0; extra == "docs" or "map" or "tests" or "timeseries" or "visualization"
mpl-animators: Missing mpl-animators>=1.0.0; extra == "docs" or "map" or "tests" or "visualization"
pandas: Missing pandas>=1.2.0; extra == "docs" or "tests" or "timeseries"
python-dateutil: 2.8.2
reproject: Missing reproject; extra == "docs" or "docs-gallery" or "map" or "tests"
scikit-image: Missing scikit-image>=0.18.0; extra == "docs" or "image" or "tests"
scipy: Missing scipy!=1.10.0,>=1.7.0; extra == "docs" or "image" or "map" or "tests"
sqlalchemy: Missing sqlalchemy>=1.3.4; extra == "database" or "docs" or "tests"
tqdm: 4.66.1
zeep: Missing zeep>=3.4.0; extra == "docs" or "net" or "tests"
Installation method
pip
So, I found the culprit: post tags are indeed losing their ordering, so the postcard2 template produces a different ordering of the tags for each rendering of the page. In my case, I have three tags specified, and two of them seem to swap places easily. Using ordered-set would resolve that, but I can see how you would hate adding a dependency. Nuitka has a fallback that implements an ordered set too, which I used to test this.
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    is_fallback = True

    def __init__(self, iterable=()):
        self.end = end = []
        end += (None, end, end)  # sentinel node for doubly linked list
        self.map = {}  # key --> [key, prev, next]
        if iterable:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        if key not in self.map:
            end = self.end
            curr = end[1]
            curr[2] = end[1] = self.map[key] = [key, curr, end]

    def update(self, keys):
        for key in keys:
            self.add(key)

    def discard(self, key):
        if key in self.map:
            key, prev, next = self.map.pop(key)
            prev[2] = next
            next[1] = prev

    def __iter__(self):
        end = self.end
        curr = end[2]
        while curr is not end:
            yield curr[0]
            curr = curr[2]

    def __reversed__(self):
        end = self.end
        curr = end[1]
        while curr is not end:
            yield curr[0]
            curr = curr[1]

    def pop(self, last=True):
        if not self:
            raise KeyError("set is empty")
        key = self.end[1][0] if last else self.end[2][0]
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return "%s()" % (self.__class__.__name__,)
        return "%s(%r)" % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)

    def union(self, iterable):
        result = OrderedSet(self)
        for key in iterable:
            result.add(key)
        return result

    def index(self, key):
        if key in self.map:
            end = self.end
            curr = self.map[key]
            count = 0
            while curr is not end:
                curr = curr[1]
                count += 1
            return count - 1
        return None


def _split(a):
    return OrderedSet(s.strip() for s in (a or "").split(","))
Obviously, with ordered-set from PyPI, this becomes from ordered_set import OrderedSet. Let me know if I should make a PR out of it. It would be nice if it was accepted at least as an optional dependency. As for the fallback, I am not 100% sure it is perfect for everything. In my Python compiler Nuitka it is not causing issues, but you may not want to take the risk.
I cannot tell what other consequences having _split produce ordered sets might have; for my blog, there are no measurable ones.
On my side, until this is released, I think I can monkey patch _split to be the improved one.
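A rough sketch of what such a monkey patch could look like, for example near the top of conf.py. Two things here are assumptions: that _split is a module-level function in ablog.post that is looked up at call time, and that a dict-backed set is an acceptable stand-in. The names `DictOrderedSet` and `ordered_split` are hypothetical.

```python
from collections.abc import MutableSet


class DictOrderedSet(MutableSet):
    """Hypothetical minimal ordered set backed by an insertion-ordered dict."""

    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)

    def __contains__(self, key):
        return key in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, key):
        self._items[key] = None

    def discard(self, key):
        self._items.pop(key, None)

    def __repr__(self):
        return "%s(%r)" % (self.__class__.__name__, list(self))


def ordered_split(a):
    """Split a comma-separated option, keeping first-seen order."""
    return DictOrderedSet(s.strip() for s in (a or "").split(","))


try:
    # Assumption: the function to replace lives in ablog.post.
    import ablog.post as ablog_post
except ImportError:
    ablog_post = None

if ablog_post is not None and hasattr(ablog_post, "_split"):
    ablog_post._split = ordered_split
```

The patch only has an effect if it runs before ablog processes any posts, and it would need to be rechecked against each ablog release.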
Thanks for the report!
How would you want it ordered or would you just want the tags to not change order for each build?
We should be able to order the output (hopefully sort would be enough) before it's passed to the templates. That would avoid the need for adding orderedset?
I think it might be natural to expose the order of the tags provided by the user. That is what OrderedSet gives me now. It only removes duplicates. That of course exposes that for similar posting types, I didn't pick the same ordering: "nuitka,python,compiler" is accompanied by many uses of "compiler,nuitka,python", etc., with many permutations. I obviously didn't consider their ordering until now.
My complaint is mainly that the HTML output differs for each build and that the ordering is, in a sense, uncontrollable; anything removing that is an improvement. Asking the user how to sort the different attributes at the config level might be too much effort and not worth it. It seems natural to use the order from the page source.
In that case, maybe if I can add something like [a for a in list if a in set(list)] in the code base in the right location, I can avoid adding an optional dependency.
I will look into this hopefully soon(TM).
I didn't dare change the type away from set, but making things unique is of course far easier to do the way you describe. If you only need iteration and in tests, that's of course no issue.
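For reference, a minimal sketch of order-preserving de-duplication without any extra dependency. Taken literally, the comprehension suggested above would keep duplicates, since every element of a list is in the set built from it. Using dict.fromkeys instead relies on dictionaries preserving insertion order, which is guaranteed since Python 3.7. The function name `split_unique` is hypothetical and the result is a list rather than a set.

```python
def split_unique(value):
    """Split a comma-separated string into stripped items, dropping
    repeated items while keeping the order of first appearance."""
    items = (s.strip() for s in (value or "").split(","))
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(items))


tags = split_unique("nuitka, python,compiler,python")
```

Here `tags` ends up as `["nuitka", "python", "compiler"]`, and the result supports both iteration and in tests, although membership checks on a list are linear rather than constant time.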