caddy icon indicating copy to clipboard operation
caddy copied to clipboard

Natural sorting option for file_server file `browse` file listings

Open ChandlerSwift opened this issue 3 months ago • 6 comments

Issue Details

For files with regions of numbers and other characters, it's intuitive to have the numbers sorted by increasing numeric value rather than by ASCII code. (Jeff Atwood has a nice article here: https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/)

Current behavior:

[chandler@oscar:~/projects/caddy/cmd/caddy]$ ls test
foo1  foo10  foo100  foo101  foo199  foo2  foo200  foo201  foo3  foo99
[chandler@oscar:~/projects/caddy/cmd/caddy]$ go run . file-server --root test --browse --listen :8000
Image

Desired behavior:

[chandler@oscar:~/projects/caddy/cmd/caddy]$ ls -v test
foo1  foo2  foo3  foo10  foo99  foo100  foo101  foo199  foo200  foo201
[chandler@oscar:~/projects/caddy/cmd/caddy]$ go run . file-server --root test --browse --listen :8000
Image

With the idea that code can be more precise than description, here's a likely-not-production-ready patch that accomplishes natural sorting. I'd be happy to PR in any variant of this if that's desirable.

I stole the code without modification from https://github.com/fvbommel/sortorder/blob/main/natsort.go (MIT licensed), as a quick skim suggested it would do what we wanted and be more performant than other implementations (e.g. https://github.com/facette/natsort/blob/master/natsort.go, https://github.com/skarademir/naturalsort/blob/master/naturalsort.go) that used regular expressions for chunk splitting. https://github.com/maruel/natural/blob/main/natsort.go may also be comparable in performance though I haven't reviewed it for correctness, and I haven't done any performance testing on any of these options, though a benchmark claims the specific one I included may be within a factor of two on performance vs strings.Sort.

Potential patch:
diff --git a/modules/caddyhttp/fileserver/browsetplcontext.go b/modules/caddyhttp/fileserver/browsetplcontext.go
index b9489c6a..339eea21 100644
--- a/modules/caddyhttp/fileserver/browsetplcontext.go
+++ b/modules/caddyhttp/fileserver/browsetplcontext.go
@@ -325,11 +325,72 @@ type (
 	byTime         browseTemplateContext
 )
 
+func isDigit(b byte) bool { return '0' <= b && b <= '9' }
+
+// naturalLess compares two strings using natural ordering. This means that e.g.
+// "abc2" < "abc12".
+//
+// Non-digit sequences and numbers are compared separately. The former are
+// compared bytewise, while digits are compared numerically (except that
+// the number of leading zeros is used as a tie-breaker, so e.g. "2" < "02")
+//
+// Limitation: only ASCII digits (0-9) are considered.
+func naturalLess(str1, str2 string) bool {
+	idx1, idx2 := 0, 0
+	for idx1 < len(str1) && idx2 < len(str2) {
+		c1, c2 := str1[idx1], str2[idx2]
+		dig1, dig2 := isDigit(c1), isDigit(c2)
+		switch {
+		case dig1 != dig2: // Digits before other characters.
+			return dig1 // True if LHS is a digit, false if the RHS is one.
+		case !dig1: // && !dig2, because dig1 == dig2
+			// UTF-8 compares bytewise-lexicographically, no need to decode
+			// codepoints.
+			if c1 != c2 {
+				return c1 < c2
+			}
+			idx1++
+			idx2++
+		default: // Digits
+			// Eat zeros.
+			for ; idx1 < len(str1) && str1[idx1] == '0'; idx1++ {
+			}
+			for ; idx2 < len(str2) && str2[idx2] == '0'; idx2++ {
+			}
+			// Eat all digits.
+			nonZero1, nonZero2 := idx1, idx2
+			for ; idx1 < len(str1) && isDigit(str1[idx1]); idx1++ {
+			}
+			for ; idx2 < len(str2) && isDigit(str2[idx2]); idx2++ {
+			}
+			// If lengths of numbers with non-zero prefix differ, the shorter
+			// one is less.
+			if len1, len2 := idx1-nonZero1, idx2-nonZero2; len1 != len2 {
+				return len1 < len2
+			}
+			// If they're equally long, string comparison is correct.
+			if nr1, nr2 := str1[nonZero1:idx1], str2[nonZero2:idx2]; nr1 != nr2 {
+				return nr1 < nr2
+			}
+			// Otherwise, the one with less zeros is less.
+			// Because everything up to the number is equal, comparing the index
+			// after the zeros is sufficient.
+			if nonZero1 != nonZero2 {
+				return nonZero1 < nonZero2
+			}
+		}
+		// They're identical so far, so continue comparing.
+	}
+	// So far they are identical. At least one is ended. If the other continues,
+	// it sorts last.
+	return len(str1) < len(str2)
+}
+
 func (l byName) Len() int      { return len(l.Items) }
 func (l byName) Swap(i, j int) { l.Items[i], l.Items[j] = l.Items[j], l.Items[i] }
 
 func (l byName) Less(i, j int) bool {
-	return strings.ToLower(l.Items[i].Name) < strings.ToLower(l.Items[j].Name)
+	return naturalLess(strings.ToLower(l.Items[i].Name), strings.ToLower(l.Items[j].Name))
 }
 
 func (l byNameDirFirst) Len() int      { return len(l.Items) }
@@ -338,7 +399,7 @@ func (l byNameDirFirst) Swap(i, j int) { l.Items[i], l.Items[j] = l.Items[j], l.
 func (l byNameDirFirst) Less(i, j int) bool {
 	// sort by name if both are dir or file
 	if l.Items[i].IsDir == l.Items[j].IsDir {
-		return strings.ToLower(l.Items[i].Name) < strings.ToLower(l.Items[j].Name)
+		return naturalLess(strings.ToLower(l.Items[i].Name), strings.ToLower(l.Items[j].Name))
 	}
 	// sort dir ahead of file
 	return l.Items[i].IsDir

Assistance Disclosure

AI not used

If AI was used, describe the extent to which it was used.

No response

ChandlerSwift avatar Aug 31 '25 03:08 ChandlerSwift

Can I work on this??

eshentials avatar Aug 31 '25 16:08 eshentials

It looks like @ChandlerSwift already made an implementation?

mholt avatar Aug 31 '25 17:08 mholt

"made an implementation" may be a bit strong—I will say that I have a hack that appears to do what I need! 🙂 This is my first time making any changes to Caddy, so I figured I'd solicit some feedback before I'd consider any of this mergeable.

A few specific concerns I have:

  • Performance: What kind of testing should I be doing to make sure that this doesn't offer any (substantial) performance regressions? Assuming this does add some runtime cost, how much is tolerable? (may depend on the next point)
  • Configurability: Should this behavior be the default, or should it be an option to be enabled? If so, should the administrator configure it (say, in the Caddyfile), or should this be exposed to the person viewing the HTML output, as e.g. sort order is?
  • Licensing: The snippet I pulled in is MIT licensed; I think that means that if I include the requisite notice we're safe (since sublicensing is allowed, specifically), but I do want to confirm before pulling in additional code.
  • Code organization: I haven't read enough of Caddy's code to get a feel for how code should be organized; should I break this off into a separate file or something?
  • Testing: Should I be adding any tests along with these changes?

If this is something that looks like it might be accepted into Caddy, I'd also be happy to open up a PR if that will make discussion easier! I just figured I'd lead with an issue, per CONTRIBUTING.md: https://github.com/caddyserver/caddy/blob/806fef85bedfc3fe178560afe6212c8ae90d3ebe/.github/CONTRIBUTING.md?plain=1#L32

ChandlerSwift avatar Aug 31 '25 19:08 ChandlerSwift

Performance: What kind of testing should I be doing to make sure that this doesn't offer any (substantial) performance regressions? Assuming this does add some runtime cost, how much is tolerable? (may depend on the next point)

I actually use https://pkg.go.dev/github.com/maruel/natural, which you mentioned above, in another project already, and so far it seems to be correct and efficient.

Configurability: Should this behavior be the default, or should it be an option to be enabled? If so, should the administrator configure it (say, in the Caddyfile), or should this be exposed to the person viewing the HTML output, as e.g. sort order is?

I think it's okay if it's the default. We can add configurability if it's requested. But I don't know if anyone cares at this point.

Maybe only the HTML output should be natural sorted though.

Licensing: The snippet I pulled in is MIT licensed; I think that means that if I include the requisite notice we're safe (since sublicensing is allowed, specifically), but I do want to confirm before pulling in additional code.

Yes I think so; so is the maruel library.

Code organization: I haven't read enough of Caddy's code to get a feel for how code should be organized; should I break this off into a separate file or something?

Wouldn't hurt, yeah.

Testing: Should I be adding any tests along with these changes?

We always approve of writing tests! But, you don't need to test the library code.

Please feel free to submit a PR at this point and we'll review it!

mholt avatar Sep 02 '25 20:09 mholt

I've tested the implementation for my own web/fileserver.

With normal version number style filenames, things look good, but when numbers are real decimal numbers, it doesn't quite work. I think it is "how it should be" in that the algorithm assumes more digits means larger numbers. But in the context of a real decimal number it is not correct.

Version number sort means:

  • 1.0.1
  • 1.0.2
  • 1.0.11
  • 1.1.0
  • 1.2.0

Decimal number sort should be:

  • 0.100
  • 0.10001
  • 0.1001
  • 0.101
  • 0.123

Perhaps there ought to be a Caddyfile option for specifying what type of sort that should be applied for specific directory trees?

Good example:

https://mirrors.tnonline.net/btrfs/btrfs-progs/sources/kernel.org/ Screenshot_20251101_194358_Opera beta

Bad example:

https://mirrors.tnonline.net/misc/img-compare/001_Juvenile_leopard_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent/640x427/ Screenshot_20251101_194258_Opera beta

Forza-tng avatar Nov 01 '25 18:11 Forza-tng

Oof... yeah, this is one thing that I don't like about natural sort, and which is why I have hesitated to merge the associated PR... the algorithm is ambiguous, like it's not clear how to sort properly in certain situations.

mholt avatar Nov 01 '25 20:11 mholt