
[Feature] Allow par2 without --compress

Open AssRap3r opened this issue 4 years ago • 45 comments

Issue: @animetosho has declared jihad against rar (https://github.com/animetosho/Nyuu/wiki/Stop-RAR-Uploads) and urged his usenet uploaders to follow his lead and become mujahid...

Request: ngPost should allow you to generate par2 files without having to compress first

Problems:

  1. --gen_name relies on --compress; without it the nzb subject is not obfuscated
  2. some releases (e.g. extracted Blu-ray/DVDs) have a folder structure which needs to be kept intact; compression should be forced if folders like BDMV or VIDEO_TS are present (configurable? see the sketch below)
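
Something along these lines is what I have in mind for that check (untested sketch; BDMV/VIDEO_TS are just the example folder names from point 2 and the list would be configurable):

# untested sketch: force compression when a disc-style folder structure is found
if find "/path/to/release" -maxdepth 2 -type d \( -iname BDMV -o -iname VIDEO_TS \) | grep -q .; then
    echo "disc structure detected - forcing --compress"
fi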

AssRap3r avatar Mar 05 '21 11:03 AssRap3r

well I've made par2 only usable when compressing because on Linux, par2cmdline didn't support generating them in another folder than where the files are. So I'd have had to copy the source files into the temporary folder to generate them, which I found stupid... I know there isn't this issue with parpar. Maybe we should just fix par2cmdline, or maybe it has been done already, I don't know...

I might disagree with @animetosho: I believe cutting files is way more efficient in terms of memory usage client side. I think most downloaders have to download all the articles before doing the yEnc decoding. If you don't cut your 25GB post, this means you're going to need 50GB of space. That's kind of an issue... Cutting your post into, let's say, 250MB archives means you only need a spare 250MB (or maybe 500MB) on top of the 25GB. Am I wrong? I believe nzbget was doing that... it would need to be confirmed.

Apart from that, you don't need to use rar, it could just be splitting the source, but what if you want to post several files at once in your post?

Edit: if you don't like rar, ngPost also supports 7z...

mbruel avatar Mar 05 '21 23:03 mbruel

well I've made par2 only usable when compressing because on Linux, par2cmdline didn't support generating them in another folder than where the files are. So I'd have had to copy the source files into the temporary folder to generate them, which I found stupid...

it looks like par2cmdline can do that if you specify the basepath:

me@instance-1:/mnt/disks/disk1/test2$ ls
100MB.bin

Attempting to create test-new2.par2 in the /tmp/files directory from files in the disk1/test2 directory (fails):

me@instance-1:/tmp/files$ par2 c -s640000 -r8  test-new2.par2 /mnt/disks/disk1/test2/*
Ignoring out of basepath source file: /mnt/disks/disk1/test2/100MB.bin
You must specify a list of files when creating.

The same command but with the basepath specified (succeeds):

me@instance-1:/tmp/files$ par2 c -s640000 -r8 -B/mnt/disks/disk1/test2/ test-new2.par2 /mnt/disks/disk1/test2/*

me@instance-1:/tmp/files$ ls /tmp/files
test-new2.par2          test-new2.vol01+2.par2  test-new2.vol07+6.par2
test-new2.vol00+1.par2  test-new2.vol03+4.par2

I think most downloaders have to download all the articles before doing the yEnc decoding. If you don't cut your 25GB post, this means you're going to need 50GB of space.

I don't know about this, I thought programs like sab/nzbget would download x amount of data and then output that into the destination file, then repeat until it's finished. I'll do a test upload over the weekend and see how it works out.

Apart from that, you don't need to use rar, it could just be splitting the source, but what if you want to post several files at once in your post?

for a grouped post like a season pack I'd want a par2 set for each file, but understandably there's not going to be a huge amount of configurable options because there are so many scenarios.

AssRap3r avatar Mar 06 '21 01:03 AssRap3r

I think most downloaders have to download all the articles before doing the yEnc decoding

No, they decode whilst they're downloading. There's no reason to preserve yEnc encoded articles. It's the same with uploading - you don't need to yEnc everything before you start posting articles - you do it as you go.

If you don't cut your 25GB post, this means you're going to need 50GB of space

You've got it the wrong way around. If you cut your 25GB post, you need 50GB of space. This is true for both uploading and downloading.

In the upload case, you need your original 25GB on disk, then you need to create 25GB set of RARs, meaning you're consuming 50GB of space until the upload completes and the RARs can be deleted.
In the download case, you need to download the 25GB set of RARs, then need an extra 25GB of space to hold the extracted file, before you can remove the RAR parts. Modern downloaders have a "direct unpack" option which can lessen the impact of this, assuming a repair never needs to be performed.

Of course, it's much simpler if no unpacking needs to be performed at all. Basically, if you're downloading 25GB, you just write 25GB to disk with no requirement for temporary unpacking storage - nice and simple.

but what if you want to post several files at once in your post?

You can just upload them as separate files, but if you want to force them together, archiving it may be your only option.

The key thing I'm trying to point out is that RAR should not be considered a requirement. There are cases to use it, but there are many more where it makes no sense.

animetosho avatar Mar 06 '21 01:03 animetosho

The key thing I'm trying to point out is that RAR should not be considered a requirement. There are cases to use it, but there are many more where it makes no sense.

You are right, maybe it shouldn't be a requirement, and from a purely technical point of view I even agree with almost all of your arguments against it. However, let's be honest and acknowledge what kind of binaries probably 99% of Usenet uploads consist of. And these uploads require at least header obfuscation, but nowadays even encryption, in order to prevent them from being taken down immediately by DMCA. As user igadjeed pointed out in the reddit discussion, encrypted uploads are increasing and will probably become the de facto standard soon.

So, split/encrypted rar archives are here to stay. I don't think that will ever change...

Tensai75 avatar Mar 06 '21 06:03 Tensai75

I have my own thoughts on whether encryption makes sense most of the time, but I'll leave that for another discussion.

Regardless, the fact that people use encryption is not a reason to keep using RAR for unencrypted uploads.

Having seen how Usenet operates, I agree that split RARs will likely stay. My point is that it'll be due to misunderstanding/ignorance/laziness reasons more than anything technical.
Nevertheless, I think the discussion is worth having. The world can't improve if everyone just accepts things the way they are.

animetosho avatar Mar 06 '21 10:03 animetosho

@AssRap3r hum, I didn't know about this basepath argument. Alright, I may add it back when I have some time, probably for v5.

@animetosho Are you sure downloaders decode as they go? I didn't check the code but I doubt it. In my old patched version of nzbget (version 16.3) you define a directory to store temporary files, where it seems all the articles of the current file are stored. I believe it can't assume that the articles will arrive in the right order, so you can't really fill the destination file directly on the fly; it would be completely inefficient to use a mutex (multi-threaded download) and seek to the position in the shared file... So I'm quite convinced they decode all the articles on the fly (yEnc decoding) but store them all in temporary files and only join them together in the right order in the destination file at the end. Did you check that they don't do it like that?

In the upload case, you need your original 25GB on disk, then you need to create 25GB set of RARs, meaning you're consuming 50GB of space until the upload completes and the RARs can be deleted.

true, but this is not really problematic; we're more interested in the user's case (so the downloader's)

In the download case, you need to download the 25GB set of RARs, then need an extra 25GB of space to hold the extracted file, before you can remove the RAR parts. Modern downloaders have a "direct unpack" option which can lessen the impact of this, assuming a repair never needs to be performed.

well, true, BUT you don't necessarily need to unpack straight away. You could choose to not unpack and forward the files. You could also extract later, somewhere else, using another process or even another machine. This is not possible if it is the grabber that does the job with the articles. You then have more flexibility in my opinion.

The key thing I'm trying to point out is that RAR should not be considered a requirement. There are cases to use it, but there are many more where it makes no sense.

True, rar shouldn't be a "requirement"... I started using ngPost with article obfuscation (the articles' message-id becoming a UUID) and thus posting my files without cutting them, but the downside is that you can't find the posts on public indexers and thus you need to make sure not to lose the nzb files...

Also cutting files with rar is still used by the scene, as they don't use par2 redundancy volumes but only sfv files with CRC checks during FTP transfers and automatic re-transfer in case of errors on an archive. So there is real interest in cutting files into small pieces, as you don't need to re-send the whole file. It's true that this is not needed on Usenet where in general we use par2. But as we generally need a password to avoid DMCA...

So we could develop an opensource low cpu usage encryption cutting app, but rar does the job perfectly and 7z too. Why bother and force all downloading software to make a new version supporting it?... It's a shame tar doesn't have a password option...

mbruel avatar Mar 07 '21 13:03 mbruel

Are you sure downloaders decode as they go?

Yes, the NzbGet case is documented here.

Thanks for raising this though - it's likely another common misconception I can add to the article.

I believe it can't assume that the articles will arrive in the right order, so you can't really fill the destination file directly on the fly

The downloader completely controls the order in which articles are requested. It doesn't control the order in which they arrive, though this is easily handled with a re-ordering buffer/window.
This sort of behaviour is done throughout computing in many ways. For example, TCP, which NNTP runs on top of, needs to reassemble a sequential stream from packets which can arrive out of order. In other words, dealing with this is well established and has been widely studied, deployed and isn't unusual.

it would be completely inefficient to use a mutex (multi-threaded download) and seek to the position in the shared file

Quite the opposite. Mutexes have costs, but they're insignificant compared to any I/O cost.

But even if we assume that reassembling downloads into a sequential write isn't possible and we must handle it in a random fashion: random seeking isn't nice, but it's much better than writing each article to a separate file.
Every file you create invokes filesystem overhead (which includes syscall overhead, file handle management, ACL checks etc). And if you thought that at least allows you to avoid seeking, well... no. Creating a file means changing the filesystem structures, meaning a seek to write there. NTFS, for example, keeps a copy of the MFT in two locations (to cater for bad sectors), so that's two additional seeks just to create a file. And that's not even going into the cost of journaling all these changes.

And even if we assume that filesystem operations are completely free, there's still no benefit to writing it to separate files. If your writing process writes randomly ordered articles to disk sequentially, then you're invoking random seek behaviour when you construct the full file during the read process (and adding additional I/O writes on top of that). In other words, you're just changing when the random access occurs, whilst being much less performant.

To give hard evidence of the point, consider torrent clients. Torrent clients generally retrieve pieces in random order, and even they don't write pieces to separate files, because it simply doesn't make sense.

Of course, realistically, even if you invoke a seek for every write, articles are going to arrive mostly in order, so the OS's write buffers and elevator scheduling (on top of NCQ) will do a good job at avoiding most physical seeks anyway.

Also, if you're worried about mutexes (which you shouldn't), there's an easy workaround - just mmap the file and write directly. As a bonus, writing directly to mapped kernel pages saves the user->kernel buffer copy.
(this strategy may need some care if cachelines are shared across threads, but it's doable if you're determined enough)

BUT you don't necessarily need to unpack straight away. You could choose to not unpack

Let's be frank: 99% of people are going to unpack straight after the download.

not unpack and forward the files

Posting unpacked files doesn't hinder this.

you could also extract later, somewhere else, using another process or even another machine

You could, but you wouldn't need to if the files weren't packed in the first place.

This is not possible if it is the grabber that does the job with the articles. You then have more flexibility in my opinion.

It's not possible, because it's completely unnecessary. But I totally don't understand your point on how forcing users to choose how they do unpacking offers more flexibility over not having to do it in the first place. It's like saying "we're flexible with our breakfast toppings as we offer our guests the option of cow poop or chicken poop" as if it's some moral victory over places that don't force guests to eat poop.

the downside is that you can't find the posts on public indexers

Public indexers find and list them fine. I've been posting unobfuscated non-RAR'd files for years, and they're all shown.

Also cutting files with rar is still used by the scene

The scene is stuck on 90s technologies which make no technical sense today other than legacy reasons. Scene distribution occurs without Usenet, so this "real interest" you talk about only matters if you're distributing over FTP, not Usenet.

But as we generally need a password to avoid DMCA...

As hinted earlier, I question that too, but I'll leave the discussion for another time.
If you're going to use a password, then yes, using RAR makes sense. I specifically listed this as a reason as to why you may want to use RAR.

So we could develop an opensource low cpu usage encryption cutting app

If we were to design it properly, I'd probably implement encryption as a yEnc extension or similar. But as you say, such a thing doesn't exist and RAR encryption is supported and works well.

The point I'm making is that for many cases, such as when a password is not used, RAR is unnecessary.
Note that I'm not saying that you should change anything in your uploader (that's @AssRap3r's request) - I'm just pointing out that RAR makes no sense a lot of the time.

animetosho avatar Mar 07 '21 23:03 animetosho

As hinted earlier, I question that too, but I'll leave the discussion for another time.

Well, if you actually do have a better idea of how to protect Usenet uploads from DMCA takedowns without encryption, while still having them indexed correctly by the popular indexers (nzbindex.com, binsearch.info, etc.), I would gladly hear your thoughts on this topic.

Tensai75 avatar Mar 08 '21 07:03 Tensai75

@animetosho you made a clear point. It was indeed a bad idea to believe it could be faster to write articles to different files and then join them all together into the final file.

I had a memory that nzbget was doing it ages ago, having several tmp files left in the temporary directory when I killed it but maybe I'm saying crap…

Sure, locking threads with a mutex is not the end of the world and should indeed be faster than initiating I/O on a new file. It's true that TCP is definitely doing that kind of thing, using a buffer and controlling its window. I've never looked at those algorithms in detail.

Torrent is indeed a great example. I suppose contiguous parts are written into a buffer and then flushed directly to the destination, seeking to the right position. Am I right?

I had never heard of elevator scheduling, NCQ or mmap either. That's too low level for me… I'll have a look when I have some time, thanks for sharing.

I guess we agree rar is not needed when we don't want to encrypt. But when we do, and we still wish to have the post visible on public indexers, I guess we have no choice… Like @Tensai75 asked, do you see another solution?

About having encryption at the yEnc level, I think it could be too much; there's no need to encrypt all the articles… maybe at the nzb level, some algorithms that would just permute and/or alter yEnc articles could do the job no?

mbruel avatar Mar 08 '21 19:03 mbruel

I had a memory that nzbget was doing it ages ago, having several tmp files left in the temporary directory when I killed it

That may have been the case in the past - unfortunately I've found a lot of Usenet software is stuck with outdated techniques/ideas, so it doesn't sound that unusual.

I suppose contiguous parts are written into a buffer and then flushed directly to the destination, seeking to the right position. Am I right?

Depends on application. Many do use a combining buffer to sequentialize writes, though for torrents, data generally arrives in random order so its effect may not be as pronounced as it would for a mostly sequential download.
libtorrent's strategy is documented here.

maybe at the nzb level, some algorithms that would just permute and/or alter yEnc articles could do the job no?

That doesn't really sound like encryption to me. Though it could be as effective.

Consider this: with your knowledge of how Usenet works, if you were tasked with issuing takedowns for a particular set of works, how would you go about it?

animetosho avatar Mar 09 '21 09:03 animetosho

Hey,

Are there examples of uncompressed but obfuscated NZBs that also have multiple files and a directory structure? I don't think I've seen one (someone correct me if I'm wrong), but as @animetosho says in his article, let's look to the future.


I propose a middle ground between no RAR and fully encrypted; here are possible solutions to the problems presented by @AssRap3r:

For point 1, a random subject could be generated (perhaps even omitting the yEnc subject format?); then, inside each <file> tag, add a <name> element that contains the original name of the file, including its directory structure.

E.g.:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nzb PUBLIC "-//newzBin//DTD NZB 1.1//EN" "http://www.newzbin.com/DTD/nzb/nzb-1.1.dtd">
<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="[email protected]" date="1015256171" subject="[1/1] - &quot;RANDOMNAME.RANDOMEXT?&quot; yEnc (1/5) ...">
    <name>my-original-filename-that-can-use-any-charset.iso</name>
    <groups>
      <group>alt.binaries.boneless</group>
      ...
    </groups>
    <segments>
      <segment bytes="716800" number="1">A@AAA</segment>
      ...
    </segments>
  </file>
  <file poster="[email protected]" date="1015256171" subject="[1/1] - &quot;RANDOMNAME.RANDOMEXT?&quot; yEnc (1/5) ...">
    <name>myfolder/my-original-filename-that-can-use-any-charset.txt</name>
    <groups>
      <group>alt.binaries.boneless</group>
      ...
    </groups>
    <segments>
      <segment bytes="716800" number="1">A@AAA</segment>
      ...
    </segments>
  </file>
  <file poster="[email protected]" date="1015256171" subject="[1/1] - &quot;RANDOMNAME.RANDOMEXT?&quot; yEnc (1/5) ...">
    <name>my_recovery.par2</name>
    <groups>
      <group>alt.binaries.boneless</group>
      ...
    </groups>
    <segments>
      <segment bytes="716800" number="1">A@AAA</segment>
      ...
    </segments>
  </file>
  <file poster="[email protected]" date="1015256171" subject="[1/1] - &quot;RANDOMNAME.RANDOMEXT?&quot; yEnc (1/5) ...">
    <name>my_recovery.vol00+01.par2</name>
    <groups>
      <group>alt.binaries.boneless</group>
      ...
    </groups>
    <segments>
      <segment bytes="716800" number="1">A@AAA</segment>
      ...
    </segments>
  </file>
</nzb>

Adoption of this implementation would then be up to each downloader; for those with extension support, we could implement it ourselves.

Advantages:

  • It allows downloading only some files
  • It allows creating a directory structure
  • It allows completely omitting the standard subject format
  • File names are not disclosed on Usenet
  • It can be combined with other existing obfuscation methods

Disadvantages:

  • The data is not encrypted, so anyone can read the first X bytes of an article and try to deduce the content or compare it against a signature database (so it comes down to probability?)

For point 2, an ISO image could be used, which avoids damaging the particular properties of files/folders.

BakasuraRCE avatar Mar 09 '21 14:03 BakasuraRCE

For point 1, a random subject could be generated (perhaps even omitting the yEnc subject format?); then, inside each <file> tag, add a <name> element that contains the original name of the file, including its directory structure.

Well, this only works if you have the original NZB file created by the upload tool. The NZB files created by the indexers will be missing the <name> information, because they can only index the files based on the subject information.

However, you don't even need this when using par2. After generating the par2 files and before starting to upload, you can randomly change all the file names (even those of the recovery files, and including the file extensions). Only the actual par2 file needs to keep its par2 extension for the automatic renaming with par2 repair to work, because after the download the renamed files will automatically be restored to their original filenames during the par2 repair step, based on the information stored in the par2 file.
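
A rough sketch of that workflow (untested; the file names and the way the random name is generated are just placeholders, and only the data file is shown being renamed):

# 1. create the par2 set against the original file name
par2 c -r8 my_recovery.par2 my-original-filename.iso
# 2. rename the data file before uploading; par2 repair restores the original name after download
mv my-original-filename.iso "$(head -c 16 /dev/urandom | md5sum | cut -c 1-22).bin"
# 3. the .par2 files keep their extension so the repair (and renaming) step gets triggered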

However, in order to have the randomly named files indexed by the indexers together as one upload, you of course need to prefix all <file> subjects with an identical and long enough identifier (random or descriptive) for the upload (I honestly don't know how similar the subjects have to be for them to be indexed together as one upload).

And this of course does not work for folder structures. But for folder structures I still think it is much better and safer to keep them in a container.

And although this will obfuscate the real file names and extensions of the uploaded files while the uploads can nevertheless be indexed by the indexers as desired, it still does not "protect" the uploads from DMCA takedowns.

Tensai75 avatar Mar 09 '21 16:03 Tensai75

Depends on application. Many do use a combining buffer to sequentialize writes, though for torrents, data generally arrives in random order so its effect may not be as pronounced as it would for a mostly sequential download. libtorrent's strategy is documented here.

well, it's not really detailed how they optimize their flushing. I imagine that even for torrents there is a minimum size of block that can be downloaded from any source.

Consider this: with your knowledge of how Usenet works, if you were tasked with issuing takedowns for a particular set of works, how would you go about it?

Well, to issue takedowns for a post you need to know what it is to prove it is violating copyrights. So the question is the other way around: how could you make sure that a post is not scanned? Nowadays I think the only thing that is done is just using bots that index Usenet (like public indexers) and check the name of each post, and if posts are compressed, check whether the header can be read to get the filenames. Nobody has the computing power to try to bruteforce every encrypted compressed post; there are just too many of them and you don't know what you're looking for. I believe nobody can check the real content of the files either, even text files, or music, movies... So for those who don't want their posts scanned, you just need minimalist encryption that would stop anyone from taking the time to try to bruteforce it. That is why I'm thinking encrypting all articles is unnecessary. You could encrypt only one and it could be enough, as the attacker would not know which one. Otherwise some permutations, combinations or reversible alterations of articles could do the job.

@BakasuraRCE well, I think you don't get the use case. What most people are interested in is keeping a database with this triplet: (original name of the post, obfuscated name of the post, encryption password). This way you can generate the nzb from any public indexer, but only people having the password can see the content. Adding a name field in the nzb produced by the poster would break this: I mean, you wouldn't be able to generate the same nzb from a public indexer. In that case, you can just use ngPost article obfuscation and not care about encrypting, as nobody would be able to find the post if they don't have the original nzb... but again, that is not the use case...

So what we're talking about is a way to protect posts that are visible on public indexers. And if we don't wish to use rar (or 7z) to add an encryption layer, how could we do it?!

mbruel avatar Mar 09 '21 19:03 mbruel

@BakasuraRCE as others have pointed out, unfortunately, indexers generally don't do a good job with handling multiple files which should belong together.
If you don't want to rely on PAR2 based renaming, you could perhaps fudge the subject whilst keeping the correct name in yEnc headers. However, whilst this might cross the line for indexers, I don't know how well this scheme is supported amongst downloaders.
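
For illustration, roughly what I mean (a made-up example; the subject is random while the yEnc header keeps the real name):

Subject: 8fK2mZq1dVt3 [1/1] - "8fK2mZq1dVt3.bin" yEnc (1/20)
=ybegin part=1 total=20 line=128 size=14336000 name=my-original-filename.iso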

@mbruel

I imagine that even for torrents there is a minimum size of block that can be downloaded from any source.

Yes, this is the selected piece size of the torrent.

you need to know what it is to prove it is violating copyrights

Small correction: takedown notices generally don't require such proof. A takedown notice is merely a claim which may or may not be accurate. Typically it's the role of the counterparty to point out inaccuracies, which, in this case, would be the Usenet provider (who often has little reason to file a counter notice, even if the claim is inaccurate).

the only thing that is done is just using bots that index Usenet

Interesting, though that sounds like a fair bit of effort as you've suggested. Is there perhaps an easier way?
It doesn't look like every Usenet downloader runs their own index - perhaps we can leverage something from what they do?
(by the way, I'm referring to posts aimed at distribution, not personal backups)

animetosho avatar Mar 09 '21 22:03 animetosho

Small correction: takedown notices generally don't require such proof. A takedown notice is merely a claim which may or may not be accurate. Typically it's the role of the counterparty to point out inaccuracies, which, in this case, would be the Usenet provider (who often has little reason to file a counter notice, even if the claim is inaccurate).

true but this is theoretical... In practice, they can't make such a claim if it is not obvious. Otherwise 99% of Usenet binaries would have been claimed. When I developed and tested ngPost, especially the integration of 7z, I got a DMCA claim on one of my posts and so got banned from posting by my provider, because I didn't encrypt the header and accidentally uploaded an episode of a TV show. So even if the archive was encrypted, they could see (in clear) what was inside and with this they raised a claim. I guess they have their own database of what they want to protect and check filenames against it, even within the archive if they can do it easily. Probably they could also extract unprotected material and have some video or music analysis. Or even pay some people to check them (for major distribution) don't know...

Interesting, though that sounds like a fair bit of effort as you've suggested. Is there perhaps an easier way? It doesn't look like every Usenet downloader runs their own index - perhaps we can leverage something from what they do?

Not sure what you mean by every Usenet downloader runs their own index. I'm talking about the major distributors, who I guess are the ones interested in raising claims. An indexer is quite easy to develop. just some regexp... then behind it some scripts that would try to extract and compare the content (filename or actual content) to a database... Easy too. You just need a few servers with storage and/or a bit of money.

I'm referring to posts aimed at distribution, not personal backups

I've stopped defending it, but Usenet shouldn't be used for personal backups... In my opinion those who do it are just fucking stingy and don't get the spirit of what is/was Usenet...

mbruel avatar Mar 09 '21 23:03 mbruel

In practice, they can't make such a claim if it is not obvious

Why not?

Otherwise 99% of Usenet binaries would have been claimed.

I can see a bunch of other reasons why this may be the case, but let's ignore this for now.

Probably they could also extract unprotected material and have some video or music analysis. Or even pay some people to check them (for major distribution) don't know...

External parties are generally used, but to keep things focused, I'll let you know that they certainly aren't doing such complex level analysis.

Not sure what you mean by every Usenet downloader runs their own index

The majority of people who use Usenet - i.e. downloaders. The vast majority aren't running their own indexer.

An indexer is quite easy to develop. just some regexp

I think it's a fair bit more complicated than that, and that would do poorly against obfuscated content.

Most downloaders use a third party indexer, which goes to the effort of handling deobfuscation. Wouldn't you think it'd be easier to just scrape these indexers instead of running one yourself?

animetosho avatar Mar 10 '21 01:03 animetosho

Why not?

it would just be an abuse... you need some sort of reason to make a claim, otherwise it is kind of censorship, especially as it would mainly be done by a certain class/lobby or influential companies.

I can see a bunch of other reasons why this may be the case, but let's ignore this for now.

I'm quite interested, please name a few. But as I said above, without a reason it would, in my opinion, be censorship, no?

External parties are generally used

yeah, external parties, cheap labour; why not even pay people for raising DMCA claims and let anyone report some?

I'll let you know that they certainly aren't doing such complex level analysis.

well they at least check filenames within unencrypted archives. as I said I got one myself during the development of ngPost although I didn't publish my nzb... Truly, you don't need many resources to be able to do it...

The majority of people who use Usenet - i.e. downloaders. The vast majority aren't running their own indexer.

yeah, downloaders don't need to index Usenet. People who want to protect copyrights by raising DMCA claims do. I suppose most providers also do it (at least the one I'm using, which provides its own great downloader with an option to search and browse Usenet like any public indexer)

I think it's a fair bit more complicated than that, and that would do poorly against obfuscated content.

There are opensource indexers like nZEDb. I've never tried it. But anyway, indexing is not complicated and if you can start from an established project it really doesn't take a big effort. As I said, there can't be enough resources to try to bruteforce encrypted posts. Not sure what kind of obfuscation you're talking about. You can't do anything on Article obfuscation. It's not hard to index stuff posted in different groups and/or with different emails (which most public indexers don't do). There is nothing you can do if the subjects of the different volumes (archive) or articles are diversified. But I don't see the interest in doing that over full Article obfuscation...

Most downloaders use a third party indexer, which goes to the effort of handling deobfuscation. Wouldn't you think it'd be easier to just scrape these indexers instead of running one yourself?

yes but in general a private indexer which is kind of a community. Some of them are well known and easy to join. Not sure how/why they haven't been infiltrated and taken down... Probably because a clone would re-open the next day?

Public indexers are not an issue for anybody, as most things nowadays are encrypted and/or obfuscated. Plus anyone can just run their own... They are just handy, as I said before to @BakasuraRCE, for those who want to keep a database of their posts with the triplet (original name of the post, obfuscated name of the post, encryption password)

PS: I quite regret the time when Usenet was not mainstream and things were neither obfuscated nor encrypted... It's quite a massive waste of storage (/power) nowadays to have each community duplicating the same data, and even in so many different qualities... Plus those using it for personal backups...

mbruel avatar Mar 10 '21 11:03 mbruel

it would just be an abuse [...] otherwise it is kind of censorship

Well, I'll just drop a few articles and let you decide whether anyone would ever use takedown notices for abuse or censorship.

https://torrentfreak.com/universal-censors-megaupload-song-gets-branded-a-rogue-label-111210/
https://torrentfreak.com/epic-games-sues-youtuber-golden-modz-over-magical-fortnite-powers-181012/
https://torrentfreak.com/github-restores-nyaa-repository-as-it-isnt-clearly-preconfigured-to-infringe-210121/
https://torrentfreak.com/dmca-takedowns-remove-perfectly-legal-plex-pages-from-google-210130/
https://torrentfreak.com/overbroad-dmca-takedown-tries-to-remove-dictionary-entries-from-google/

There's plenty more similar stories if you're interested in searching for them. Keep in mind that these are just what's been noticed and reported on - imagine what goes unnoticed...

Also remember that there's generally no legislation around private censorship, but there is around non-compliance with a takedown, so it should be obvious which side hosts are likely to lean towards.

yeah, downloaders don't need to index Usenet. People who want to protect copyrights by raising DMCA claims do.

Why?

indexing is not complicated
You can't do anything on Article obfuscation

Sounds like you've answered yourself on why it is indeed complicated...

yes but in general a private indexer which is kind of a community

You can make anything a community on the internet - that's not really relevant. A multi-billion dollar industry certainly doesn't lack the resources to enter most public/private indexers if they wish.

This is how I'd approach the problem: register on a few popular indexers that downloaders tend to use. Use the conveniently supplied Newznab API to find the works being targeted, then send the NZBs to the takedown bot. You may even be able to use existing tools like Sonarr to help.
This doesn't require running or maintaining an indexer, hence, much simpler. It also gets around any obfuscation shenanigans that uploaders pull off. If encryption is used, as long as the indexers used can pick them up, I can send takedowns to them as well.
Now perhaps there's some exclusive private indexers that may be difficult to enter without significant effort, but honestly, they don't actually matter, as they'll have hardly any members to be enough of a concern. In short, if most Usenetters can see it, I can send a takedown on it.

well they at least check filenames within unencrypted archives. as I said I got one myself during the development of ngPost although I didn't publish my nzb...

I'd say it's more likely that some other indexer picked up your upload and deobfuscated it, since deobfuscation is what indexers do. The rightsholder just scraped that indexer and sent the takedown. It's unlikely they went to any effort to check the contents of archives at all.

want to keep a database of their posts with the triplet (original name of the post, obfuscated name of the post, encryption password)

Or you could just keep the NZBs.

animetosho avatar Mar 11 '21 10:03 animetosho

Also remember that there's generally no legislation around private censorship, but there is around non-compliance with a takedown, so it should be obvious which side hosts are likely to lean towards.

Indeed... but they can't just take down the whole Usenet network, or even the binary parts or some groups... well I guess they could take down groups... Probably there is still an interest in letting some stuff circulate, but not too easily.

Sounds like you've answered yourself on why it is indeed complicated...

well, it's a fact that you can't index protected posts... but you can for the rest. Some people still continue to post unprotected stuff that gets taken down a few hours or days later...

This is how I'd approach the problem: register on a few popular indexers that downloaders tend to use. Use the conveniently supplied Newznab API to find the works being targeted, then send the NZBs to the takedown bot. You may even be able to use existing tools like Sonarr to help.

true, but this means that those private indexers would shut down if all their posts were taken down. Another, more private one would then open. So maybe they just let protected posts go? I don't know...

I'd say it's more likely that some other indexer picked up your upload and deobfuscated it, since deobfuscation is what indexers do. The rightsholder just scraped that indexer and sent the takedown. It's unlikely they went to any effort to check the contents of archives at all.

maybe, yeah... Do you know a public indexer that can search inside archives?

Or you could just keep the NZBs.

yeah, that's what private indexers do nowadays. but it's more risky for the people running it as they become responsible whereas if you just index stuff you're not directly involved... There are also exclusive decentralised communities where members share their own databases and you can use SQL requests to search across all of them (everyone fetches updates from the others, the search is done locally ;))

mbruel avatar Mar 12 '21 16:03 mbruel

@AssRap3r I've just made some tests and it's kind of annoying, this basepath parameter... What happens if you want to post 2 files without compression but with par2, and if those 2 files come from different drives... you may not have a common basepath, especially on Windows. On Linux you could just always use -B /, it kind of seems to work. So I don't know what to do... What about Multipar? Does it have this shitty limitation too? I guess I could check that all the files are from the same disk... that's not hard but it sounds like a wart in the code...

for a grouped post like a season pack I'd want a par2 set for each file, but understandably there's not going to be a huge amount of configurable options because there are so many scenarios.

yeah, it is defo too much... better to just generate the par2 yourself (a for loop in bash or a batch script on Windows) and then post the whole directory with neither compression nor par2 generation
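
For example, something like this (untested sketch, par2cmdline syntax; paths and settings are placeholders):

# one par2 set per file, generated next to the sources so there is no basepath issue
for f in /path/to/season.pack/*.mkv; do
    par2 c -r8 "$f.par2" "$f"
done
# then post the whole directory with ngPost, e.g. ngPost -i /path/to/season.pack ... (no compression, no par2 generation)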

mbruel avatar Mar 12 '21 16:03 mbruel

@mbruel Multipar has the same limitation: you can specify a basepath with /d"dir:" but it won't accept multiple entries - you can only use files from the same drive. I guess the question is how much effort you want to put into working around an issue which is a limitation of MultiPar/par2cmdline and not ngPost...

personally I would focus more on trying to get users away from the incompatible software and towards what works - either by providing defaults or selectable options in the GUI. the default ngPost.conf has no Windows exe path for PAR2_PATH, for example, which might confuse people into thinking they can only use MultiPar on Windows

If the Linux basepath / works then just include it, either by default (if it doesn't break anything) or as an example in ngPost.conf. I'm not sure how often people will be posting things from 2 different drives. if it's a common occurrence I'd tell them to organize their media better :X

yeah, it is defo too much... better to just generate the par2 yourself (a for loop in bash or a batch script on Windows) and then post the whole directory with neither compression nor par2 generation

I finally got around to getting inotifywatch to work on my seedbox and cobbled together a little script to do most of what I want. finally have rutorrent auto download/move, no compression, par2 based on size & posting with ngPost . only downside is it doesn't handle folders but ngPost can do that :>

AssRap3r avatar Mar 12 '21 23:03 AssRap3r

they can't just take down the whole Usenet network, or even the binary parts or some groups

Unlikely they'd be able to do it, and besides, they have no reason to either.

true, but this means that those private indexers would shut down if all their posts were taken down

"Would" seems a little strong, but yes, an indexer could choose to voluntarily shut down if most of its content was taken down.
In reality, that exact case probably isn't likely as one would expect the indexer admins to react in some way before that occurs.

Another, more private one would then open

Some might think that increasing privatisation is the way to go, but the reality is that if that happened, 99% of users would disappear. Yeah, you could go that route, but what's left is significantly less relevant to the world (and in the long run, there may be insufficient economies of scale to keep a lot of the existing infrastructure running).

And if it happened, it'd basically be Usenet exclusively for personal backups, except replace 'personal' with 'shared with a small group of friends'.

it's more risky for the people running it as they become responsible whereas if you just index stuff you're not directly involved

At most, you're just shifting things around with no real benefit anywhere.
Someone is still responsible. And that someone could just keep an NZB.

if those 2 files come from different drives... you may not have a common basepath

That doesn't sound like a common case.
I'd say you could choose not to support it (tell users to fix it if you detect it). Or if you want, symlink them to a temporary folder and PAR2 from there.
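
For instance, something like this (untested sketch; assumes par2cmdline happily reads through symlinks):

# symlink sources from different drives into one temporary folder, then PAR2 from there
tmp=$(mktemp -d)
ln -s /mnt/disk1/fileA.mkv /mnt/disk2/fileB.mkv "$tmp"/
par2 c -r8 "$tmp/recovery.par2" "$tmp"/*.mkv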

ParPar won't complain, but the default behaviour may not be what you'd want (--filepath-format=basename might be preferable).
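
For example (sketch only; slice/recovery values are arbitrary):

parpar -s 1M -r 64 --filepath-format=basename -o recovery.par2 /mnt/disk1/fileA.mkv /mnt/disk2/fileB.mkv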

animetosho avatar Mar 13 '21 09:03 animetosho

@AssRap3r Could you give it a try? For now I've implemented it for the GUI only. I'd still have to modify the command line parsing when I'm sure everything is ok. I've only tested quickly on my env using ParPar, so not much testing... But I don't really have time or interest to do it. What I've coded is this:

  • you can't post folders if there is no compression
  • on Windows, files must be from the same drive
  • with ParPar, there are no extra options. I use absolute paths without --filepath-format=basename
  • with par2cmdline I add -B / (or -B "c:\" on Windows)
  • with multipar I'd add /d"c:\"

It would be great if you could test on Windows using par2cmdline and multipar. If you use files from different drives, you should get an error message saying that only ParPar allows generating par2 for files from different drives and that you should consider using it ;)

Also something to test is using a RAM partition (TMP_RAM with TMP_RAM_RATIO). Normally the par2 should be created there, if I didn't make any silly mistake...

Here is the win64 portable version with MultiPar, par2cmdline and a compiled version of ParPar included. Could you test it for me please? You would have to edit the config, setting PAR2_PATH to your extracted path pointing to either parpar or par2j64.exe (by default it will test par2cmdline)

For testing the RAM partition I guess you could try on Linux and compile it yourself. Or I'll do it myself with the cmd implementation.

mbruel avatar Mar 14 '21 14:03 mbruel

@mbruel auto-append drive name doesn't appear to work for par2cmdline. I didn't see any /d "c:" flags added to the MultiPar command either - but it looks like multipar doesn't need them. it will generate the file regardless of the difference between the output drive and input drive.

PAR2_PATH = C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe
PAR2_ARGS = c -l -m1024 -r8 -s768000

[17:12:50.945] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -l -m1024 -r8 -s768000 C:\ngPost\temp/EvozcNfXBZrlQmwKb7eX55/EvozcNfXBZrlQmwKb7eX55.par2 Z:/Audio Tracks/Abominable 2019 Blu-Ray Czech DD5.1 640 kbps.mka
Ignoring out of basepath source file: Z:\Audio Tracks\Abominable 2019 Blu-Ray Czech DD5.1 640 kbps.mka

You must specify a list of files when creating. 

I tried to use the bundled par2j (1.2.9.9) but it would always crash when generating; I had to copy 1.3.1.5 from the MultiPar installation to get it to work. MultiPar and ParPar both worked with TMP_RAM and the normal method. par2cmdline worked with compression but not without, because of the basepath issue.

AssRap3r avatar Mar 14 '21 17:03 AssRap3r

Hum indeed, Windows is boring, the -BC:\ (or /dC:\) should be before the par2 name... I've corrected it in this version, can you please give it a try and confirm? (I made 2 commits; I also needed to change all slashes to backslashes, fucking Windows!)

PS: the basepath is only added when you have more than one file. Not sure if it is also needed when you have only one file like in your example. From what I tested, there's no need...

PS2: I've no issues on my win7 VM with the par2j64 I bundled... Is 1.3.1.5 the latest? I will probably update the binaries anyway ;)

mbruel avatar Mar 14 '21 20:03 mbruel

@mbruel getting messy now :(

when posting one file:

ngPost-beta1: no basepath - error

ngPost-beta1: basepath in config - generates, lists created par2 files in debug, adds to upload queue. generated par2 is incorrect: "Data Files" lists the directory\subdirectory\file instead of just the file

debug output:

[21:07:40.201] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -B Z: -l -m1024 -r8 -s768000 C:\ngPost\temp/G9EzNi7bAiSoNpARUGkV9e/G9EzNi7bAiSoNpARUGkV9e.par2 Z:/Audio Tracks/Battlestar.Galactica.S01.English.Audio.Description.Pack/Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

ngPost-beta2: basepath removed from config (rely on exe) - no output file specified in par2 command - files are generated in the source folder (based on filename and not the random ngPost string) and not added to post queue

debug output:

[21:06:25.863] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -l -m1024 -r8 -s768000 Z:\Audio Tracks\Battlestar.Galactica.S01.English.Audio.Description.Pack\Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

two input files used:

ngPost-beta2: debug log:

[21:15:22.061] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -l -m1024 -r8 -s768000 -BZ:\ C:\ngPost\temp\eZPCMzEa8P6rVAdz1oyhuJ\eZPCMzEa8P6rVAdz1oyhuJ.par2 Z:\Audio Tracks\Battlestar.Galactica.S01.English.Audio.Description.Pack\Battlestar.Galactica.S01E00.The.Miniseries.Part.2.Audio.Description.AAC2.0.mka Z:\Audio Tracks\Battlestar.Galactica.S01.English.Audio.Description.Pack\Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

generates in the correct folder with the right name but the data in the files is incorrect - the folder name is appended in the par2 data.

testing par2:

C:\ngPost\ngPost_v4.15.beta1-x64>par2.exe c -B "Z:" -l -m1024 -r8 -s768000 -a C:\ngPost\temp\hqC0Cm6FIcHepruxtVbvfS\hqC0Cm6FIcHepruxtVbvfS.par2 Z:\Audio_Tracks\Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

result: bad PAR2

C:\ngPost\ngPost_v4.15.beta1-x64>par2.exe c -B "Z:\Audio_Tracks" -l -m1024 -r8 -s768000 -a C:\ngPost\temp\hqC0Cm6FIcHepruxtVbvfS\hqC0Cm6FIcHepruxtVbvfS.par2 Z:\Audio_Tracks\Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

result: good PAR2

so it looks like par2 basepath needs to include the folder structure leading to the data itself?

AssRap3r avatar Mar 14 '21 21:03 AssRap3r

hum... I didn't check the par2 generated... I was too lazy to open a cmd and I don't have QuickPar installed... crap :( So it means all files must be in the same folder? Is it only par2cmdline or is it the same with multipar? Did you check the par2 generated with parpar?

ngPost-beta2: basepath removed from config (rely on exe) - no output file specified in par2 command - files are generated in the source folder (based on filename and not the random ngPost string) and not added to post queue

debug output:

[21:06:25.863] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -l -m1024 -r8 -s768000 Z:\Audio Tracks\Battlestar.Galactica.S01.English.Audio.Description.Pack\Battlestar.Galactica.S01E00.The.Miniseries.Part.1.Audio.Description.AAC2.0.mka

ok my bad, I've corrected that I think by adding the par2 name before the file...

generates in the correct folder with the right name but the data in the files is incorrect - the folder name is appended in the par2 data.

arf I see... same issue on Linux with -B /

well, I guess par2cmdline only supports files in the same folder... it sucks. ok, I'll update the code then... What about MultiPar, can you test it for me please? I'll then add an error message if files are from different folders, suggesting to use ParPar instead... But anyway, I guess the use case for par2-only generation would 99% of the time be for a single file... but I prefer to make ngPost robust now ;)

mbruel avatar Mar 14 '21 22:03 mbruel

@AssRap3r what about this version then? please try both MultiPar and par2cmdline. (no basepath in config, it is added automatically)

mbruel avatar Mar 14 '21 22:03 mbruel

@mbruel

ngPost-beta3: MultiPar

[22:41:25.087] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2j64.exe create /rr8 /lc40 /lr /rd2 /d Z:/Audio_Tracks C:\ngPost\temp\BPGwGEXGd9cDYhi4rsuYFj\BPGwGEXGd9cDYhi4rsuYFj.par2 Z:\Audio_Tracks\Abominable.2019.Blu-Ray.Czech.DD5.1.640 kbps.copy.mka Z:\Audio_Tracks\Battlestar.Galactica.S01E00.The.Miniseries.Part.2.Audio.Description.AAC2.0.mka
Parchive 2.0 client version 1.3.1.5 by Yutaka Sawada

base-directory is invalid

for some reason my goofy PC is back to crashing when par2j runs, but I can get a pre-creation summary by running it via a cmd window. changing /d dir: to /dDIR: works (remove the space between the /d and the drive letter). maybe enclose the drive:folder name\ in quotes along with the filenames in case they have spaces?

par2cmdline:

[22:49:37.784] Generating par2: C:\ngPost\ngPost_v4.15.beta1-x64\par2.exe c -l -m1024 -r8 -s768000 -B Z:/Audio_Tracks C:\ngPost\temp\1mdDtlbtpQvkhpKy2YT050\1mdDtlbtpQvkhpKy2YT050.par2 Z:\Audio_Tracks\Abominable.2019.Blu-Ray.Czech.DD5.1.640.kbps.copy.mka Z:\Audio_Tracks\Battlestar.Galactica.S01E00.The.Miniseries.Part.2.Audio.Description.AAC2.0.mka

generates par2 in correct folder, applies base folder in the command, par2 files contain no directory structure, so all good for this :)

did you check the par2 generated with parpar?

Not initially, but having just tested it, it does have the same result - paths end up in the par2 file. They can easily be removed with -f basename, either in the config or hardcoded.

AssRap3r avatar Mar 14 '21 23:03 AssRap3r

@AssRap3r what about this one, it should be all good no?

  • I've added -f basename for Parpar (please try with 2 files from different paths)
  • I've replaced slashes by backslashes for basename paths of MultiPar and par2cmd
  • for Multipar, the option is now: /d"c:\tmp\my file.txt"

All good?

mbruel avatar Mar 15 '21 08:03 mbruel

ok try this one plz... I've tested quickly:

  • Parpar should be fixed
  • Multipar is now /dC:\path; it seems to work but doesn't support spaces in the path... I've tried adding a backslash before the space and it doesn't work either... so I'll leave it like this; we'll see if someone raises an issue for Multipar, in which case I'll suggest using ParPar or not having spaces in the path...

Please confirm if all good. Cheers

mbruel avatar Mar 15 '21 12:03 mbruel

@AssRap3r cool, I'll try to take some time this weekend to check the command line implementation, might be a bit boring, especially with the AUTO_COMPRESS parameter in the config. I might update it with different values, not sure yet... At least it should be ignored when --gen_par2 is given alone in CMD. I'll let you know when done.

I noticed there is no slice size in the default config for multipar, is that intentional?

I don't know Multipar at all, I haven't used Windows in ages... Someone gave me those default arguments, please let me know what could be added or changed.

mbruel avatar Mar 16 '21 00:03 mbruel

@AssRap3r do you compile from github? Can you give it a try? Edit your conf and replace AUTO_COMPRESS with PACK, cf. the new conf here. So on the command line you need to add --pack to use PACK from the config. Let me know what you think. Cheers

mbruel avatar Mar 19 '21 19:03 mbruel

if PACK contains GEN_NAME it doesn't apply, the par2 is still generated with the folder name. in the previous beta versions' GUI, having AUTO_COMPRESS set to true in the config would apply the random name first and keep it when you unticked compress

Is there any interest in having the par2 name different from the folder name? What did you do? -i folder to send the content of a folder? So your folder name has the real name and the files inside could have an obfuscated one?

should the par2 generation follow the MONITOR_EXTENSIONS list? in my test folder there were mkv, nfo, png and jpg files. I'd assume that anyone using extension monitor is only interested in those filetypes.

you mean when using -i with a folder on the command line, you'd like to filter the files inside using MONITOR_EXTENSIONS? Or just the files you want to generate the par2 for?

--monitor or --auto not supported, not sure if you were planning on implementing or not. more complicated than before I guess because you need to link MONITOR_EXTENSIONS and MONITOR_IGNORE_DIR and perhaps have an alternate pack command depending on the mixture?

well, I didn't check what would happen with those... did you try? Does it fail because there is no compression? Yeah, the only issue is that you can't post folders without compression (packing). Maybe I could use 2 configs, PACK_FILES and PACK_FOLDERS, but it might be confusing for the user... And I guess 99% of ngPost users would use PACK = COMPRESS, GEN_NAME, GEN_PASS, GEN_PAR2

mbruel avatar Mar 20 '21 17:03 mbruel

yes, I'd argue it's for the same reason we have gen_name for compress - to obfuscate the upload, even if only partially. although it's not as good as with compress, since the real files will be posted with their original names

well I guess something nice could be to:

  • generate the par2 on the files, with a random name for the par2 volumes
  • and also post each file under a random name (as the par2 could rename them). This way, having only the par2 wouldn't help you get the files

But there is a bit of work to implement it... I might leave it for another time if some people are interested in it. Maybe you could open a request once v4.15 is out.

ngPost -x -o /downloads/temp/nzbs/"$FILECUT".nzb -i "$TMP""$FILECUT"

@AssRap3r as you use -x, you don't need to bother renaming anything. your post is already invisible ;)

this is a fork in the road, it sounds like the options are:

yep, for this work I'll let you open other requests and see if people are interested...

To release the v4.15, here is what I'm going to do:

  1. for --auto, as I know the list of files, I'll allow GEN_PAR2 if there are no folders; otherwise I'll give an error message and do nothing
  2. for --monitor I'll allow GEN_PAR2 without COMPRESS only if MONITOR_IGNORE_DIR is set in the config.

It could also be possible to post files with --auto and --monitor without either compression or par2 generation but I guess it is better to not do that as par2 should be included. What do you think?

mbruel avatar Mar 24 '21 14:03 mbruel