Add options for case-insensitivity, blank lines, output file, counts, and backup to anew

Open NullifiedSec opened this issue 8 months ago • 6 comments

Hi Maintainers,

This PR enhances the anew uniqueness tool by adding several commonly requested features to improve its flexibility and usability in various scenarios. The goal was to add practical options without significantly increasing the tool's complexity.

Motivation:

The base anew tool is useful for ensuring unique lines, but real-world use cases often require more nuanced handling:

  • Ignoring case differences (e.g., 'apple' vs 'Apple').
  • Skipping blank lines from input.
  • Directing unique output to a separate file instead of modifying the input file.
  • Getting feedback on how many lines were processed/added.
  • Safeguarding the original file when modifying it in place.

Changes Introduced:

This PR adds the following command-line options to anew:

  1. -i (Ignore Case): Performs case-insensitive comparisons when checking for existing lines and duplicates from stdin.
    # 'Apple' from stdin won't be added if 'apple' exists in existing.txt
    echo "Apple" | anew -i existing.txt
    
  2. -B (Ignore Blank Lines): Skips blank lines received from standard input, so they are neither compared nor added.
    printf "line1\n\nline2\n" | anew -B existing.txt
    
  3. -o <outfile> (Output File): Specifies a different file to append the new unique lines to. If omitted, behavior remains the same (appends to the [input_filename] if provided). This allows merging unique lines into a new destination.
    # Read existing lines from check.txt, append new unique lines from stdin to unique_lines.txt
    cat new_stuff.txt | anew -o unique_lines.txt check.txt
    
    # Read only stdin, append unique lines to a new file
    cat new_stuff.txt | anew -o unique_only_from_stdin.txt
    
  4. -c (Counts): Prints statistics (lines read, duplicates found, blanks skipped, lines output/written) to stderr upon completion.
    cat new_stuff.txt | anew -c existing.txt
    
  5. --backup[=<SUFFIX>] (Backup): Creates a backup copy of the [input_filename] before modification. This only takes effect if output is being written back to the same file specified as [input_filename] (i.e., -o is not used or -o points to the same file).
    • If --backup is used without a value, the suffix .bak is used.
    • If --backup=<SUFFIX> is used, the specified SUFFIX is appended to the filename (e.g., --backup=.orig).
    # Creates existing.txt.bak before appending
    cat new_stuff.txt | anew --backup existing.txt
    
    # Creates existing.txt.timestamp before appending
    cat new_stuff.txt | anew --backup=.timestamp existing.txt
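
The rule for when --backup takes effect can be sketched in Go as follows. This is a minimal illustration, not the PR's code: the helper name resolveBackup and its signature are hypothetical, and it assumes the -o path and the input path are compared after lexical cleaning only (symlinks and absolute-vs-relative forms are not resolved).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveBackup is a hypothetical helper (not taken from the PR).
// inputPath is the positional file, outputPath is the -o value ("" when -o
// is absent), backupSet reports whether --backup was passed, and suffix is
// its optional value. It returns the backup path and whether to create it.
func resolveBackup(inputPath, outputPath string, backupSet bool, suffix string) (string, bool) {
	if !backupSet || inputPath == "" {
		return "", false
	}
	// Without -o, new lines are appended to the input file itself.
	if outputPath == "" {
		outputPath = inputPath
	}
	// Assumption: a lexical comparison is enough to decide "same file".
	if filepath.Clean(outputPath) != filepath.Clean(inputPath) {
		return "", false
	}
	// --backup without a value falls back to the .bak suffix.
	if suffix == "" {
		suffix = ".bak"
	}
	return inputPath + suffix, true
}

func main() {
	cases := []struct {
		in, out, suffix string
		backup          bool
	}{
		{"existing.txt", "", "", true},
		{"existing.txt", "", ".timestamp", true},
		{"check.txt", "unique_lines.txt", "", true},
		{"existing.txt", "", "", false},
	}
	for _, c := range cases {
		path, ok := resolveBackup(c.in, c.out, c.backup, c.suffix)
		fmt.Printf("input=%q -o=%q backup=%v -> %q (create=%v)\n", c.in, c.out, c.backup, path, ok)
	}
}
```

A stricter variant could compare the two files with os.SameFile after os.Stat, which would also catch symlinks and hard links pointing at the input file.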
    

Internal Improvements:

  • Refactored flag handling into a Config struct.
  • Added a Stats struct for collecting counts.
  • Introduced a normalizeLine helper function to handle trimming and case-folding consistently.
  • Improved error handling around file operations (distinguishing ErrNotExist, checking scanner errors).
  • Used bufio.Writer for potentially more efficient file appends.
  • Added basic argument count validation.
  • Updated usage information.

Testing:

Manual testing was performed with various combinations of flags, input files (existing, non-existing), stdin content (with duplicates, blanks, case variations), and output scenarios (in-place, -o, and dry-run via the existing -d flag).

Request for Review:

Please review the changes for correctness, adherence to project style, and potential edge cases. Particular attention to the logic for -o and --backup, and to the interaction between -i, -B, and the existing -t (trim) flag, would be appreciated.

Thanks for considering this contribution!

NullifiedSec avatar Apr 17 '25 19:04 NullifiedSec

Good stuff, thanks. Hoping it gets merged soon.

rasheedmhd avatar May 04 '25 09:05 rasheedmhd

> good stuff. thanks. hoping it merged soon.

Thanks!

Yeah, I also hope it gets merged. Maybe he is busy. I am thinking of continuing in a separate fork if it doesn't get merged within 4 months.

NullifiedSec avatar May 26 '25 21:05 NullifiedSec

@NullifiedSec are you planning a separate fork?

noob6t5 avatar Jul 30 '25 10:07 noob6t5

@noob6t5, I'm thinking of forking this project since tomnomnom seems inactive. I plan to keep it updated and add new features. Would you like to contribute? I'd also appreciate your opinion on something: given that the code is now almost unrecognizable from the original due to a complete restructure, should I create a new repository or stick with a fork?

NullifiedSec avatar Aug 07 '25 17:08 NullifiedSec

@NullifiedSec I have some updated features for it that I use personally. If that's okay with everyone, I will open a PR against your fork or new repo. I think both are fine, but forking carries more weight in showing respect to the original author than creating a new tool.

But fully dedicating myself to maintaining it is nearly impossible right now, as I'm swamped with some tools of my own 😅

noob6t5 avatar Aug 10 '25 03:08 noob6t5

@noob6t5 I continued this as a separate repository, since maintaining a fork is kind of complex for me, though I mentioned the original repo.

Here is my repo if you want to contribute:

https://github.com/NullifiedSec/onew/

NullifiedSec avatar Aug 12 '25 06:08 NullifiedSec