TypeScript icon indicating copy to clipboard operation
TypeScript copied to clipboard

Implement Tim sort.

Open sozelfist opened this issue 10 months ago • 1 comments

Tim sort is a sorting algorithm that combines the techniques of insertion sort and merge sort. It was designed to perform well on many kinds of real-world data. The algorithm works by dividing the input into small pieces, sorting them using insertion sort, and then merging them using merge sort. Tim sort has a time complexity of O(nlogn) in the worst-case scenario and is widely used in programming languages such as Python, Java, and C++.

sozelfist avatar Apr 15 '24 02:04 sozelfist

Can you review this PR? It's been sitting here for 3 weeks, @appgurueu?

sozelfist avatar Apr 23 '24 14:04 sozelfist

Hey, @appgurueu. Could you please have a look at this PR? It's been here for a month. I think this PR is good enough to be accepted and merged into the master branch.

sozelfist avatar Jun 09 '24 13:06 sozelfist

Can you review the updates @appgurueu, @raklaptudirm?

sozelfist avatar Jun 16 '24 08:06 sozelfist

I will break down the implementation in detail as follows:

  1. Constants:

    • MIN_MERGE: Minimum size of a run to be merged, typically 32.
    • MIN_GALLOP: A threshold for switching to galloping mode during merge, typically 7.
  2. Comparator Type:

    • Comparator<T>: Defines a comparator function type to compare elements of type T.
  3. Merge Function:

    • The merge function merges two sorted subarrays using optimized galloping mode.
    • Galloping mode quickly moves through elements when many consecutive elements from one subarray are smaller than those from the other subarray.
      const merge = <T>(
        arr: T[],
        leftIndex: number,
        middleIndex: number,
        rightIndex: number,
        compare: Comparator<T>
      ): void => {
        // ...
        // Implement the main merging logic with galloping mode
      }
      
  4. Main timSort Function:

    • Step 1: Calculate the minimum run length.

    • Step 2: Identify runs (ascending or descending) and sort them using insertion sort.

    • Step 3: Push identified runs onto a stack and ensure they maintain the size invariant by merging appropriately.

    • Step 4: Merge all remaining runs to produce the final sorted array.

      export const timSort = <T>(arr: T[], compare: Comparator<T>): void => {
        const length = arr.length
      
        // Function to identify runs and sort them
        const findRunsAndSort = (start: number, end: number): void => {
          // ...
        }
      
        // Function to calculate minimum run length
        const minRunLength = (n: number): number => {
          // ...
        }
      
        // Function to push a new run onto the stack
        const pushRun = (
          runs: { start: number; length: number }[],
          start: number,
          length: number
        ) => {
          // ...
        }
      
        // Function to merge two adjacent runs
        const mergeAt = (runs: { start: number; length: number }[], i: number) => {
          // ...
        }
      
        // Function to force collapse all remaining runs
        const mergeForceCollapse = (runs: { start: number; length: number }[]) => {
          // ...
        }
      
        // Function to ensure runs maintain the size invariant
        const mergeCollapse = (runs: { start: number; length: number }[]) => {
          // ...
        }
      
        // Determine the minimum run length
        const minRun = minRunLength(length)
        let runs: { start: number; length: number }[] = []
      
        // Find runs and sort them
        let start = 0
        while (start < length) {
          // Identify runs (ascending or descending) and sort them
          let end = start + 1
          if (end < length && compare(arr[end - 1], arr[end]) <= 0) {
            // Ascending run
            while (end < length && compare(arr[end - 1], arr[end]) <= 0) {
              end++
            }
          } else {
            // Descending run
            while (end < length && compare(arr[end - 1], arr[end]) > 0) {
              end++
            }
            // Reverse descending run to make it ascending
            arr.slice(start, end).reverse()
          }
      
          findRunsAndSort(start, end - 1)
          pushRun(runs, start, end - start)
      
          mergeCollapse(runs)
      
          start = end
        }
      
        // Merge all remaining runs
        mergeForceCollapse(runs)
      }
      

Key Techniques

  1. Hybrid Sorting:

    • Combines insertion sort for small subarrays (runs) and merge sort for larger arrays.
  2. Galloping Mode:

    • Optimizes the merge process by quickly traversing elements when many consecutive elements from one subarray are smaller than those from the other.
  3. Run Identification:

    • Identifies naturally occurring runs (ascending or descending sequences) in the data, which are then sorted and merged.
  4. Stack of Runs:

    • Maintains a stack of runs to be merged, ensuring that the size invariant is maintained for efficient merging.
  5. Insertion Sort:

    • Efficiently sorts small runs within the array.
  6. Adaptive:

    • Adapts to the existing order in the data, making it particularly effective for real-world data which often has partially ordered sequences.

The optimized TimSort provides a highly efficient and robust sorting algorithm for various applications.

sozelfist avatar Jun 18 '24 01:06 sozelfist

Can you have a look @appgurueu?

sozelfist avatar Jun 18 '24 04:06 sozelfist

Thanks for your helpful reviews and suggestions. I will close the PR because I can't go forward on this PR anymore. The algorithm is very complicated in detail and the way you need it is so technically restricted to an open-source project for education.

sozelfist avatar Jun 18 '24 15:06 sozelfist

Thanks for your helpful reviews and suggestions.

You're welcome. Thank you for your PR.

The algorithm is very complicated in detail

Indeed it is. It is probably a bit of an unfortunate choice if you're looking for a particularly "elegant" algorithm to implement.

the way you need it is so technically restricted to an open-source project for education

It is crucial for education that Timsort (as well as other algorithms) are represented properly; Timsort specifically lives from all these little observations enabling it to save some real-world time on real-world data.

These aren't arbitrary technical restrictions, these are defining characteristics of Timsort. If you call something Timsort, it should be Timsort.

If we were to misrepresent an oversimplified version of Timsort as Timsort, the issue would only get worse down the line: Each time someone makes a port of this implementation and decides to cut some corners themselves, the algorithm would stray more and more from the original.

appgurueu avatar Jun 18 '24 18:06 appgurueu

From a reviewer's perspective, I appreciate the steps you are taking to enhance the quality of the PRs. It helps the learner or anyone using these resources for their learning (whether studying or working) right away.

sozelfist avatar Jun 19 '24 01:06 sozelfist