larskraemer
To tackle this, we might need to rethink how we find duplicates. Currently we have to store every unique URL, since `Url` stores the whole string...
#17 reduces our maximum memory usage to 16 bytes per unique URL plus some constant amount. I don't think we can do much better, except in the _constant amount_ part.
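For illustration, here's roughly what "16 bytes per unique URL" looks like (a sketch, not the actual code from #17; `Digest128` and `hash_url` are made-up names, and the stand-in hash is just two salted 64-bit FNV-1a passes):

```cpp
// Sketch: deduplicate by keeping only a 16-byte digest per unique URL.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_set>

struct Digest128 {
    uint64_t hi, lo;                      // 16 bytes per unique URL
    bool operator==(const Digest128& o) const { return hi == o.hi && lo == o.lo; }
};

struct Digest128Hash {
    std::size_t operator()(const Digest128& d) const { return d.hi ^ d.lo; }
};

// Stand-in digest: two 64-bit FNV-1a passes with different offset bases.
// A real implementation would use a proper 128-bit hash instead.
Digest128 hash_url(const std::string& url) {
    auto fnv1a = [&](uint64_t h) {
        for (unsigned char c : url) { h ^= c; h *= 0x100000001b3ULL; }
        return h;
    };
    return { fnv1a(0xcbf29ce484222325ULL), fnv1a(0x84222325cbf29ce4ULL) };
}

int main() {
    std::unordered_set<Digest128, Digest128Hash> seen;
    std::string line;
    while (std::getline(std::cin, line)) {
        // Print a URL only the first time its digest is seen; the full
        // string only lives for this iteration, the digest is what's kept.
        if (seen.insert(hash_url(line)).second)
            std::cout << line << '\n';
    }
}
```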
I don't think that would help after #17 is merged, since at the end we need to keep all unique URLs in memory at once (or an identifier based on...
@marcelo321
`git clone https://github.com/larskraemer/urldedupe.git`
`cd urldedupe`
`git checkout store_hashes`
Then build as usual. Been a while since I looked at the code, but that version shouldn't have the memory issue,...
I updated this with a better input method which requires fewer allocations. @ameenmaali
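For context, one common way to cut per-line allocations is simply to reuse a single line buffer; I'm not claiming this is exactly what the branch does, it's just the general idea:

```cpp
// Sketch: reuse one std::string buffer for every line read from stdin.
#include <iostream>
#include <string>

int main() {
    std::ios::sync_with_stdio(false);  // skip per-call stdio synchronization
    std::string line;                  // one buffer, reused for every line
    while (std::getline(std::cin, line)) {
        // process(line) here; the buffer's capacity is recycled, so in the
        // steady state we only allocate when a longer-than-ever line shows up.
    }
}
```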
@ameenmaali I noticed :D As for not using third-party libraries: as mentioned above, we probably want a 128-bit hash. I don't think the standard library provides one, so...
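To make the 128-bit point concrete: if relying on the `__int128` extension (GCC/Clang) is acceptable, something like FNV-1a/128 can be written in a few lines with no third-party code. Sketch only; not necessarily what #17 ends up using, and for very large inputs a stronger hash might be preferable:

```cpp
// Sketch: FNV-1a with a 128-bit state, using the GCC/Clang __int128 extension.
#include <string_view>

using u128 = unsigned __int128;

// Standard FNV-1a/128 constants, split into two 64-bit halves.
constexpr u128 FNV128_OFFSET = (u128(0x6c62272e07bb0142ULL) << 64) | 0x62b821756295c58dULL;
constexpr u128 FNV128_PRIME  = (u128(0x0000000001000000ULL) << 64) | 0x000000000000013bULL;

u128 fnv1a_128(std::string_view s) {
    u128 h = FNV128_OFFSET;
    for (unsigned char c : s) {
        h ^= c;
        h *= FNV128_PRIME;
    }
    return h;
}
```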
#17 also solves this. I think the only place a ':' can occur in a hostname is before the port, so discarding everything after a ':' should work for this.
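In code, the "discard everything after ':'" idea is basically this (illustrative sketch, not the exact code in #17):

```cpp
#include <string_view>

// "example.com:8080" -> "example.com"; a host with no ':' is returned as-is.
std::string_view strip_port(std::string_view host) {
    return host.substr(0, host.find(':'));
}
```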
What OS and compiler are you running? With versions, ideally. Seems like you don't have `` yet. Just to see, try replacing `#include ` with `#include ` and see if...