liquid icon indicating copy to clipboard operation
liquid copied to clipboard

Add sort_numeric filter

Open pushrax opened this issue 6 years ago • 13 comments

Closes #980

pushrax avatar Sep 13 '18 21:09 pushrax

Sorry for just popping in but why not introducing a more generic sort_by which accepts a block instead of another very specify sort method? Similar to what is in Enumerable? :thinking:

ChrisBr avatar Sep 21 '18 21:09 ChrisBr

Accepting a block is not yet a concept in Liquid. I'm not going to merge this yet though, because it's possible that could change.

pushrax avatar Sep 24 '18 19:09 pushrax

What happened to Travis CI? It doesn't seem to be running anymore

dylanahsmith avatar Sep 25 '18 15:09 dylanahsmith

For some reason Travis CI was disabled for this repo. I enabled it again, new commits should be built.

pushrax avatar Sep 25 '18 18:09 pushrax

@pushrax I've a couple of questions regarding the last branch of the conditional block in the current state:

elsif ary.first.respond_to?(:[]) && !ary.first[property].nil?
  ary.sort do |a, b|
    Utils.to_number(a[property]) <=> Utils.to_number(b[property])
  end
end
  • Why check only if ary.first.respond_to?(:[])? What if b[property] in the sort block fails with a NoMethodError?
  • Why check for !ary.first[property].nil? when Utils.to_number(nil) returns 0 anyways..

Refactoring to..

def sort_numeric(input, property = nil)
  ary = InputIterator.new(input)
  if property.nil?
    ary.sort { |a, b| Utils.to_number(a) <=> Utils.to_number(b) }
  elsif ary.empty?
    []
  else
    ary.sort do |a, b|
      next unless a.respond_to?(:[]) && b.respond_to?(:[])
      Utils.to_number(a[property]) <=> Utils.to_number(b[property])
    end
  end
end

ashmaroli avatar Feb 23 '19 09:02 ashmaroli

Checking just the first element for [] is a heuristic used by other filters already, and in Shopify seems to work fine while being much faster than checking in the loop. You make a totally fair point, perhaps we should reconsider.

The nil check is also reasonably questionable, also copied from sort. I think the idea is to have a sort of respond_to?(property) check, but it could also just happen to be nil. I wouldn’t be surprised if this check is causing a rare bug in a Liquid template.

pushrax avatar Feb 23 '19 13:02 pushrax

seems to work fine while being much faster than checking in the loop.

@pushrax I wrote a benchmark script to test this..:

# frozen_string_literal: true

require 'benchmark/ips'
require 'liquid'

def sort_numeric_original(input, property = nil)
  ary = Liquid::StandardFilters::InputIterator.new(input)
  if property.nil?
    ary.sort do |a, b|
      Liquid::Utils.to_number(a) <=> Liquid::Utils.to_number(b)
    end
  elsif ary.empty?
    []
  elsif ary.first.respond_to?(:[]) && !ary.first[property].nil?
    ary.sort do |a, b|
      Liquid::Utils.to_number(a[property]) <=> Liquid::Utils.to_number(b[property])
    end
  end
end

def sort_numeric_proposed(input, property = nil)
  ary = Liquid::StandardFilters::InputIterator.new(input)
  if property.nil?
    ary.sort { |a, b| Liquid::Utils.to_number(a) <=> Liquid::Utils.to_number(b) }
  elsif ary.empty?
    []
  else
    ary.sort do |a, b|
      next unless a.respond_to?(:[]) && b.respond_to?(:[])
      Liquid::Utils.to_number(a[property]) <=> Liquid::Utils.to_number(b[property])
    end
  end
end

INPUT = [{ "a" => '10' }, { "a" => '3' }, { "a" => '1' }, { "a" => '2' }]
return unless sort_numeric_original(INPUT) == sort_numeric_proposed(INPUT)
return unless sort_numeric_original(INPUT, "a") == sort_numeric_proposed(INPUT, "a")

Benchmark.ips do |x|
  x.report('original sort_numeric no-property') { sort_numeric_original(INPUT) }
  x.report('proposed sort_numeric no-property') { sort_numeric_proposed(INPUT) }
  x.compare!
end

Benchmark.ips do |x|
  x.report('original sort_numeric') { sort_numeric_original(INPUT, "a") }
  x.report('proposed sort_numeric') { sort_numeric_proposed(INPUT, "a") }
  x.compare!
end

On my system, the numbers are:

Warming up --------------------------------------
original sort_numeric no-property
                         6.001k i/100ms
proposed sort_numeric no-property
                         6.048k i/100ms
Calculating -------------------------------------
original sort_numeric no-property
                         67.734k (A± 1.2%) i/s -    342.057k in   5.050704s
proposed sort_numeric no-property
                         67.734k (A± 0.4%) i/s -    338.688k in   5.000315s

Comparison:
proposed sort_numeric no-property:    67734.4 i/s
original sort_numeric no-property:    67734.2 i/s - same-ish: difference falls within error

Warming up --------------------------------------
original sort_numeric
                         2.093k i/100ms
proposed sort_numeric
                         2.925k i/100ms
Calculating -------------------------------------
original sort_numeric
                         21.918k (A± 4.1%) i/s -    110.929k in   5.071298s
proposed sort_numeric
                         31.508k (A± 0.5%) i/s -    157.950k in   5.013203s

Comparison:
proposed sort_numeric:    31507.5 i/s
original sort_numeric:    21917.8 i/s - 1.44x  slower

ashmaroli avatar Feb 23 '19 14:02 ashmaroli

The constant overhead is dominating in that benchmark with just 4 elements. With an order of magnitude or two more elements the result should be different. Removing !ary.first[property].nil? is the big change to constant overhead. to_number is likely to be the big factor in the block anyway (looks like it could use optimization, the allocations from obj.strip =~ /\A-?\d+\.\d+\z/ are unnecessary).

In any case, as mentioned your points were fair and the perf loss of checking all the elements for [] is probably worth it. However, the implicit contract of sort filters so far is to return nil when sorting by a property and the input elements don't respond to [], so I guess we need something more like ary.all? { |x| x.respond_to?(:[]) } instead of ary.first.respond_to?(:[]) or change the contract.

I'm was going to abandon this PR since I've been thinking about a more general change to let Liquid developers pass something like a block to filters, but that idea isn't fleshed out enough to happen soon and this filter is indeed useful and generic enough to be included. I'll probably circle back to it this week some time, thanks for the ping.

pushrax avatar Feb 23 '19 18:02 pushrax

Hey @pushrax, was browsing around for resources when I found the GH Issue and then this PR. Totally understand if you're busy and aren't planning on workin on this PR, but do you know of any workarounds or alternative implementations?

Use case is that we're sorting a bunch of Docker image tags using Jekyll + Liquid, and we're pulling those values from a JSON file. So essentially we're sorting through a list of values that looks like this: ["10.12.0", "somewords", "hahaha", "3.5.21"]. We're running into the lexicographical issue where all of the 9.x.x and 8.x.x are showing up as "later" than 12.x.x and 11.x.x versions in the list when sorted. Pointers would be appreciated if/when you have a second!

mvxt avatar Jun 05 '19 22:06 mvxt

Hi @pushrax!

Are we going to see this in Liquid soon? It will be almost a year since this PR was opened, and people truly need this feature.

Thank you, and all the best!

lubeskih avatar Aug 28 '19 20:08 lubeskih

This could likely be optimized after the fact, but for now I think this should just make it through, it seems to be some valid use cases held off by this.

shopmike avatar Oct 04 '19 05:10 shopmike

What about this PR?

plewandowski avatar Jan 20 '20 11:01 plewandowski

bump? :3

pudiva avatar Jan 04 '23 01:01 pudiva