larskraemer
To tackle this, we might need to rethink how we find duplicates. Currently we have to store every unique URL, since `Url` stores the whole string...
#17 reduces our maximum memory usage to 16 bytes per unique URL plus some constant amount. I don't think we can do much better, except in the _constant amount_ part.
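For illustration, here's roughly what "16 bytes per unique URL" looks like (a sketch, not the actual code from #17; `Digest128` and `hash_url` are made-up names, and the stand-in hash is just two salted 64-bit FNV-1a passes):

```cpp
// Sketch: deduplicate by keeping only a 16-byte digest per unique URL.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_set>

struct Digest128 {
    uint64_t hi, lo;                      // 16 bytes per unique URL
    bool operator==(const Digest128& o) const { return hi == o.hi && lo == o.lo; }
};

struct Digest128Hash {
    std::size_t operator()(const Digest128& d) const { return d.hi ^ d.lo; }
};

// Stand-in digest: two 64-bit FNV-1a passes with different offset bases.
// A real implementation would use a proper 128-bit hash instead.
Digest128 hash_url(const std::string& url) {
    auto fnv1a = [&](uint64_t h) {
        for (unsigned char c : url) { h ^= c; h *= 0x100000001b3ULL; }
        return h;
    };
    return { fnv1a(0xcbf29ce484222325ULL), fnv1a(0x84222325cbf29ce4ULL) };
}

int main() {
    std::unordered_set<Digest128, Digest128Hash> seen;
    std::string line;
    while (std::getline(std::cin, line)) {
        // Print a URL only the first time its digest is seen; the full
        // string only lives for this iteration, the digest is what's kept.
        if (seen.insert(hash_url(line)).second)
            std::cout << line << '\n';
    }
}
```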
I don't think that would help after #17 is merged, since at the end we need to keep all unique URLs in memory at once (or an identifier based on...
@marcelo321
`git clone https://github.com/larskraemer/urldedupe.git`
`cd urldedupe`
`git checkout store_hashes`
Then build as usual. Been a while since I looked at the code, but that version shouldn't have the memory issue,...
I updated this with a better input method which requires fewer allocations. @ameenmaali
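For context, one common way to cut per-line allocations is simply to reuse a single line buffer; I'm not claiming this is exactly what the branch does, it's just the general idea:

```cpp
// Sketch: reuse one std::string buffer for every line read from stdin.
#include <iostream>
#include <string>

int main() {
    std::ios::sync_with_stdio(false);  // skip per-call stdio synchronization
    std::string line;                  // one buffer, reused for every line
    while (std::getline(std::cin, line)) {
        // process(line) here; the buffer's capacity is recycled, so in the
        // steady state we only allocate when a longer-than-ever line shows up.
    }
}
```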
@ameenmaali I noticed :D As for not using third-party libraries: as mentioned above, we probably want a 128-bit hash. I don't think the standard library provides one, so...
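To make the 128-bit point concrete: if relying on the `__int128` extension (GCC/Clang) is acceptable, something like FNV-1a/128 can be written in a few lines with no third-party code. Sketch only; not necessarily what #17 ends up using, and for very large inputs a stronger hash might be preferable:

```cpp
// Sketch: FNV-1a with a 128-bit state, using the GCC/Clang __int128 extension.
#include <string_view>

using u128 = unsigned __int128;

// Standard FNV-1a/128 constants, split into two 64-bit halves.
constexpr u128 FNV128_OFFSET = (u128(0x6c62272e07bb0142ULL) << 64) | 0x62b821756295c58dULL;
constexpr u128 FNV128_PRIME  = (u128(0x0000000001000000ULL) << 64) | 0x000000000000013bULL;

u128 fnv1a_128(std::string_view s) {
    u128 h = FNV128_OFFSET;
    for (unsigned char c : s) {
        h ^= c;
        h *= FNV128_PRIME;
    }
    return h;
}
```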
#17 also solves this. I think the only place a ':' can occur in a hostname is before the port, so discarding everything after a ':' should work for this.
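In code, the "discard everything after ':'" idea is basically this (illustrative sketch, not the exact code in #17):

```cpp
#include <string_view>

// "example.com:8080" -> "example.com"; a host with no ':' is returned as-is.
std::string_view strip_port(std::string_view host) {
    return host.substr(0, host.find(':'));
}
```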
What OS and compiler are you running? With versions, ideally. Seems like you don't have `` yet. Just to see, try replacing `#include ` with `#include ` and see if...