exmeso

Distinct option doesn't work with a small input file

Open MamoruAsagami opened this issue 9 years ago • 0 comments

When an input file is small enough to fit in a single chunk, the distinct option doesn't remove duplicate items.

The reason seems to be that ExternalMergeSort#mergeSortedChunksNoPartialMerge(List<File> sortedChunks) doesn't honor config.distinct in its single-chunk branch, as you can see below.

    if (sortedChunks.size() == 1) {
        File sortedChunk = sortedChunks.get(0);
        return new ChunkFile<T>(sortedChunk, serializer, comparator, config.cleanup);
    } else {
        List<ChunkFile<T>> cfs = new ArrayList<ChunkFile<T>>(sortedChunks.size());
        for (File file : sortedChunks) {
            cfs.add(new ChunkFile<T>(file, serializer, comparator, config.cleanup));
        }
        return new MergeSortedIterator<T,ChunkFile<T>>(cfs, comparator, config.distinct);
    }

It works fine if you remove the single-chunk shortcut above (the if-true branch), so that a lone chunk also goes through MergeSortedIterator with config.distinct.
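An alternative to dropping the shortcut might be to keep it and wrap the single chunk in a de-duplicating iterator whenever distinct is set. The following is only a self-contained sketch of that idea, not exmeso code: DistinctIterator, singleChunk, and drain are hypothetical names, and a plain Iterator stands in for ChunkFile. It assumes the chunk is already sorted, so duplicates are adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class DistinctSketch {

    /** Hypothetical wrapper: skips consecutive elements that compare equal in an already-sorted source. */
    static final class DistinctIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Comparator<? super T> comparator;
        private T next;          // next element to hand out, valid only if hasNext is true
        private boolean hasNext;
        private T last;          // last element handed out, valid only if hasLast is true
        private boolean hasLast;

        DistinctIterator(Iterator<T> source, Comparator<? super T> comparator) {
            this.source = source;
            this.comparator = comparator;
            advance();
        }

        /** Moves to the next element that differs from the last one returned. */
        private void advance() {
            hasNext = false;
            while (source.hasNext()) {
                T candidate = source.next();
                if (!hasLast || comparator.compare(last, candidate) != 0) {
                    next = candidate;
                    hasNext = true;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return hasNext;
        }

        @Override
        public T next() {
            if (!hasNext) {
                throw new NoSuchElementException();
            }
            T result = next;
            last = result;
            hasLast = true;
            advance();
            return result;
        }
    }

    /** Stand-in for the single-chunk branch: honors the distinct flag instead of ignoring it. */
    static <T> Iterator<T> singleChunk(Iterator<T> sortedChunk,
                                       Comparator<? super T> comparator,
                                       boolean distinct) {
        return distinct ? new DistinctIterator<T>(sortedChunk, comparator) : sortedChunk;
    }

    /** Collects the remaining elements of an iterator into a list. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<T>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> chunk = Arrays.asList("a", "a", "b", "c", "c");
        Comparator<String> cmp = Comparator.<String>naturalOrder();
        System.out.println(drain(singleChunk(chunk.iterator(), cmp, true)));  // prints [a, b, c]
        System.out.println(drain(singleChunk(chunk.iterator(), cmp, false))); // prints [a, a, b, c, c]
    }
}
```

Whether this is preferable to simply deleting the if-true branch depends on how much the shortcut actually saves; removing the branch is the smaller change.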

MamoruAsagami, Jan 23 '17 13:01