
Prevent forwarding duplicate content

Open tissole opened this issue 5 years ago • 6 comments

I realize now that I have a lot of duplicates in my private channel, God knows how many. Is it possible to prevent future files from being forwarded from the source channel if they are already in the destination channel?

Is there a way to identify all duplicates that already exist in a channel and remove them, either automatically or interactively?

tissole avatar Apr 20 '21 12:04 tissole

If files are already forwarded, nothing can be done.

But for future messages, it can be done.

A plugin can be implemented to prevent duplicate files. This plugin will store the hash of every file. If the same hash is found again, the file will not be forwarded.
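The idea of storing every file's hash and skipping repeats could look roughly like the sketch below. It is only an illustration: `DuplicateFilter` and `should_forward` are hypothetical names, not part of tgcf's plugin API, and the choice of SHA-256 over the raw file bytes is an assumption.

```python
import hashlib


class DuplicateFilter:
    """Hypothetical helper: remembers the SHA-256 digest of every file seen."""

    def __init__(self):
        # Digests of files that have already been forwarded.
        self._seen = set()

    def should_forward(self, file_bytes: bytes) -> bool:
        """Return True the first time a file is seen, False for repeats."""
        digest = hashlib.sha256(file_bytes).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

The set lives in memory only, so a real plugin would probably need to persist the digests between runs. Files that differ by even one byte (for example, a re-encoded video) produce different hashes and would not be caught.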

aahnik avatar Apr 23 '21 18:04 aahnik

Great! Actually, with this enhancement, I can find a workaround: I'll make a new private channel and forward all messages there from my old channel, and use a bot in case TG gets nervous :) Just out of curiosity, what prevents already forwarded messages from being deleted if duplicates exist? I imagine that a script could index a channel's files and their hashes in a database and start deleting files with identical hashes. But I'm not a programmer and I don't understand how TG works, so maybe I'm wrong.
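The indexing idea described above (hash every file into a database, then delete the repeats) might be sketched as follows. This is an assumption-laden illustration, not how tgcf works: `build_index` and `duplicate_message_ids` are hypothetical names, the `(message_id, file_bytes)` pairs are assumed to have been downloaded already, and actually fetching or deleting Telegram messages is left out.

```python
import hashlib
import sqlite3


def build_index(conn, files):
    """Store one row per message: its ID and the SHA-256 digest of its file.

    `files` is an iterable of (message_id, file_bytes) pairs (hypothetical input).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "message_id INTEGER PRIMARY KEY, digest TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO files (message_id, digest) VALUES (?, ?)",
        ((mid, hashlib.sha256(data).hexdigest()) for mid, data in files),
    )
    conn.commit()


def duplicate_message_ids(conn):
    """Return IDs of messages whose file repeats an earlier message's file."""
    rows = conn.execute(
        """
        SELECT message_id FROM files
        WHERE message_id NOT IN (
            SELECT MIN(message_id) FROM files GROUP BY digest
        )
        ORDER BY message_id
        """
    )
    return [row[0] for row in rows]
```

For each digest, the message with the lowest ID is kept and every later one is reported as a candidate for deletion. Building the index requires downloading every file in the channel once, which may be slow for large channels.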

P.S. Congrats on 100 stars! This script is becoming more powerful day by day.

tissole avatar Apr 24 '21 11:04 tissole

waiting for this plugin ❤️🔥

Guru-25 avatar Jul 22 '21 19:07 Guru-25

I'm a n00b at Python and could not write a full-blown plugin, but I managed to hack the OCR plugin to repurpose it for deduplication. I'm aware it is very badly written and could have bugs, as I only spent a couple of hours on it, but it seems to work at first sight. This is the file that must replace the OCR plugin, if somebody is interested: https://pastebin.com/Am8yVtvE

vtmocanu avatar Dec 12 '22 16:12 vtmocanu

My code does not seem to work very well; it probably removes only some duplicates. But I found a Python app that works fine, maybe you can check the implementation: https://github.com/mayiprint/tg-remove-duplicate-file

This one also seems to get the job done; tg-remove-duplicate-file did not catch all of them: https://github.com/OxMohsen/uniquify-bot

vtmocanu avatar Jan 04 '23 13:01 vtmocanu

Please add this feature. It will prevent the unnecessary overhead of forwarding the same file multiple times.

cleanerspam avatar Jan 08 '24 11:01 cleanerspam