perf-challenge6

Problem definition

Open chadbrewbaker opened this issue 3 years ago • 2 comments

Is the output of this pipeline correct? I just want to make sure I understand the problem definition.

tr -s ' \t' '\n' < data/small.data | sort | uniq -c | sort -ns
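To make the ordering this pipeline produces easier to reason about, here is a rough Python sketch of the same steps. It is an approximation under stated assumptions, not an exact emulation: `pipeline_order` is a hypothetical helper name, `str.split()` splits on all whitespace (not only spaces and tabs), and Python sorts by code point while `sort` follows the current locale. The point it illustrates is that `sort -ns` orders by count ascending and, being stable, leaves ties in the alphabetical order produced by the first `sort`.

```python
from collections import Counter


def pipeline_order(text):
    """Approximate `tr | sort | uniq -c | sort -ns` on a string."""
    # tr + sort + uniq -c: one (word, count) pair per distinct word,
    # emitted in sorted word order.
    counted = sorted(Counter(text.split()).items())
    # sort -ns: numeric sort on the count, ascending; Python's sort is
    # stable, so ties keep the word order from the previous step.
    return sorted(counted, key=lambda pair: pair[1])


print(pipeline_order("the cat the dog a cat the"))
# [('a', 1), ('dog', 1), ('cat', 2), ('the', 3)]
```

If the expected output lists the most frequent words first, this ordering would need to be reversed on the count while keeping the alphabetical tie-break.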

As a hilarious aside - I think I found some issues in /usr/bin/sort on OSX https://opensource.apple.com/tarballs/text_cmds/ 😂

-- update -- This is close, but it still needs a secondary alphabetical sort for words with equal counts:

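import sys
from collections import Counter

fpath = sys.argv[1]

# Read the whole file into memory.
with open(fpath, 'r') as f:
    data = f.read()

# split() with no arguments splits on any run of whitespace.
freq = Counter(data.split())

# Sorted by count, descending; ties keep first-seen order, not alphabetical.
result = freq.most_common()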
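A small illustration of why `most_common()` alone is only "close": it orders by count, but words with equal counts stay in the order they were first seen. The sample words below are made up for the example.

```python
from collections import Counter

freq = Counter("pear apple pear fig apple kiwi".split())

# Count-only ordering: ties stay in first-seen order.
by_count_only = freq.most_common()
print(by_count_only)
# [('pear', 2), ('apple', 2), ('fig', 1), ('kiwi', 1)]

# Count descending, then word ascending to break ties.
with_tie_break = sorted(freq.items(), key=lambda pair: (-pair[1], pair[0]))
print(with_tie_break)
# [('apple', 2), ('pear', 2), ('fig', 1), ('kiwi', 1)]
```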

chadbrewbaker, May 28 '22 19:05

For bash I think you can use this one as a reference: https://github.com/juditacs/wordcount/blob/master/bash/wordcount.sh

I checked that the baseline output matches this solution: https://github.com/juditacs/wordcount/blob/master/python/wordcount_py3.py

dendibakh, May 28 '22 22:05

Thanks. I think this is the correct Python version, using the Counter class. It runs about 2x slower than the original C++ on my M1.

import sys
from collections import Counter

fpath = sys.argv[1]
with open(fpath, 'r') as f:
    data = f.read()
freq = Counter(data.split())
# Sort by count (descending), then by word (ascending) to break ties.
result = sorted(freq.most_common(), key=lambda x: (-x[1], x[0]))
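For larger inputs, one possible variant is to count line by line instead of reading the whole file at once. This is only a sketch: `count_words` is a hypothetical helper, and whether it is faster or slower than the version above has not been measured here. It also sorts `freq.items()` directly, since the custom key already fixes the full order and the extra `most_common()` pass is not needed.

```python
import io
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words from an iterable of lines."""
    freq = Counter()
    for line in lines:
        # Counter.update adds one to the count of each word in the line.
        freq.update(line.split())
    # Count descending, then word ascending to break ties.
    return sorted(freq.items(), key=lambda pair: (-pair[1], pair[0]))


# In-memory stand-in for an open text file.
sample = io.StringIO("b a\nb c\n")
print(count_words(sample))
# [('b', 2), ('a', 1), ('c', 1)]
```

With a real file, the call would look like `with open(fpath) as f: result = count_words(f)`.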

chadbrewbaker, May 28 '22 22:05