LazyIO - network and local file reader/writer compatible with go-streams.

Open BigB84 opened this issue 2 years ago • 2 comments

Hi!

With Java 8+ it's possible to read a file using streams instead of a loop. This is so cool, as the file isn't read entirely into memory.

Now I'm facing the same problem in Go. I need to read hundreds of text files with millions of records and process them.

So I need a reader and a writer that are compatible with this library. I tried to write them myself, but it's hard because it means implementing the generic interface IIterator.

Java's strength in this case is that streams became part of the standard library.

Do you think it's possible to add something like an io.Reader or bufio.Scanner adapter to this library?
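For reference, here is a minimal sketch of the lazy, line-by-line reading described above, using only the standard library. The helper names `forEachLine` and `countLines` are made up for illustration and are not part of go-streams. `bufio.Scanner` reads through a bounded buffer, so the input is never loaded whole; note that by default it rejects lines longer than 64 KiB unless `Scanner.Buffer` is called.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// forEachLine is a hypothetical helper: it calls fn once per line of r.
// The scanner pulls data from r on demand, so memory use stays bounded
// regardless of how large the input is.
func forEachLine(r io.Reader, fn func(line string)) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fn(scanner.Text())
	}
	// Err returns nil when scanning stopped at a normal end of input.
	return scanner.Err()
}

// countLines demonstrates forEachLine on an in-memory string.
func countLines(input string) (int, error) {
	n := 0
	err := forEachLine(strings.NewReader(input), func(string) { n++ })
	return n, err
}

func main() {
	n, err := countLines("example.com\nsubdomain.of.example.com\n")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println("lines read:", n)
}
```

Any `io.Reader` works here, so the same helper would cover an `*os.File` as well as an HTTP response body.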

BigB84 avatar May 16 '23 17:05 BigB84

Hi @BigB84, we can definitely think about a solution for this case. How are you reading the files and how do you intend to iterate over them?

  • Is it a bunch of files and you want to iterate over each of them and the iteration would be the contents of each file?
  • Are you thinking about loading a file and iterating over the contents of that single file line by line or byte by byte?

jucardi avatar May 23 '23 00:05 jucardi

Thanks for the reply!

I think it's the second case you mentioned.

It's not a secret, so I can share the actual problem.

I maintain a DNS server with a domain blocklist. The blocklist is built from hostlists. Hostlists are just text files with domains written line by line, but there are plenty of formats. They may be read locally from disk or fetched from the network over HTTPS.
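Since a hostlist can come from disk or over HTTPS, both cases can be hidden behind a single `io.ReadCloser`. The following is only a sketch under that assumption: `openSource` is a hypothetical name, and deciding by the `https://` prefix is a simplification chosen for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// openSource is a hypothetical helper. It returns a stream for a hostlist
// location: an https:// URL is fetched over the network, anything else is
// treated as a local file path. The caller must Close the result.
func openSource(location string) (io.ReadCloser, error) {
	if strings.HasPrefix(location, "https://") {
		resp, err := http.Get(location)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			return nil, fmt.Errorf("fetching %s: unexpected status %s", location, resp.Status)
		}
		// The body is streamed; it is not downloaded up front.
		return resp.Body, nil
	}
	f, err := os.Open(location)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass a file path or an https:// URL")
		return
	}
	src, err := openSource(os.Args[1])
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer src.Close()

	// Drain the stream without keeping it in memory, just to show it works.
	n, err := io.Copy(io.Discard, src)
	fmt.Println("bytes read:", n, "error:", err)
}
```

Both branches return a plain stream, so whatever reads the lines does not need to know where the data came from.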

Consider just a few of the hostlist formats:

127.0.0.1 example.com
127.0.0.1 subdomain.of.example.com
127.0.0.1 another.subdomain.of.example.com
127.0.0.1 something.1.example.com
...

0.0.0.0 example.com
0.0.0.0 subdomain.of.example.com
0.0.0.0 another.subdomain.of.example.com
0.0.0.0 something.2.example.com
...

example.com
subdomain.of.example.com
another.subdomain.of.example.com
something.3.example.com
...

I need to process each of them so that unwanted expressions (0.0.0.0, 127.0.0.1, etc.) are removed.
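As a rough illustration of that cleaning step, here is a sketch that handles only the three formats shown above. `cleanLine` and `sinkAddrs` are hypothetical names, and the real rule set is more involved than this.

```go
package main

import (
	"fmt"
	"strings"
)

// sinkAddrs lists the leading addresses to drop. Only the two prefixes
// mentioned in this thread are included; this is an assumption, not a
// complete list.
var sinkAddrs = map[string]bool{
	"0.0.0.0":   true,
	"127.0.0.1": true,
}

// cleanLine is a hypothetical helper. It returns the domain on a hostlist
// line and false when the line holds no domain at all.
func cleanLine(line string) (string, bool) {
	fields := strings.Fields(line)
	// Drop a leading sink address if there is one.
	if len(fields) > 0 && sinkAddrs[fields[0]] {
		fields = fields[1:]
	}
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

func main() {
	samples := []string{
		"127.0.0.1 example.com",
		"0.0.0.0 subdomain.of.example.com",
		"another.subdomain.of.example.com",
		"",
	}
	for _, s := range samples {
		if domain, ok := cleanLine(s); ok {
			fmt.Println(domain)
		}
	}
}
```

A function of this shape can be used as the mapping step of a lazy pipeline, with the boolean result driving the filter.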

The full set of rules is more complicated, which is why I use lazy streams for efficient mapping and filtering.

In this case I need a reader of type string. The problem is how to implement such a generic reader.

Should we write two readers, for instance a generic LazyReader over T that implements IIterator, and a LazyLinewiseReader (with T fixed to string) for this specific case?

In the future someone may need to read integers byte by byte, and he or she could then implement a reader such as LazyIntegerByteReader.
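To make the proposal concrete, here is one possible shape for it, as a sketch only (it needs Go 1.18+ for generics). The `Iterator` interface below is a hypothetical stand-in, since the library's real IIterator may require more methods. `LazyReader`, `NewLazyLinewiseReader` and `NewLazyByteReader` are likewise illustrative names. The generic reader takes a split function and a parse function, so the line-wise and byte-wise readers become thin constructors.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Iterator is a hypothetical, minimal stand-in for the library's IIterator.
type Iterator[T any] interface {
	Next() bool
	Current() T
}

// LazyReader yields one parsed token of type T per call to Next.
type LazyReader[T any] struct {
	scanner *bufio.Scanner
	parse   func([]byte) (T, error)
	current T
	err     error
}

// NewLazyReader splits r into tokens with split and converts each with parse.
func NewLazyReader[T any](r io.Reader, split bufio.SplitFunc, parse func([]byte) (T, error)) *LazyReader[T] {
	s := bufio.NewScanner(r)
	s.Split(split)
	return &LazyReader[T]{scanner: s, parse: parse}
}

// Next reads and parses the next token; it returns false at the end of the
// input or on the first error.
func (lr *LazyReader[T]) Next() bool {
	if lr.err != nil || !lr.scanner.Scan() {
		return false
	}
	lr.current, lr.err = lr.parse(lr.scanner.Bytes())
	return lr.err == nil
}

// Current returns the value produced by the last successful Next.
func (lr *LazyReader[T]) Current() T { return lr.current }

// Err reports a parse error or, failing that, a scanner error.
func (lr *LazyReader[T]) Err() error {
	if lr.err != nil {
		return lr.err
	}
	return lr.scanner.Err()
}

// NewLazyLinewiseReader is the T = string case: one item per line.
func NewLazyLinewiseReader(r io.Reader) *LazyReader[string] {
	return NewLazyReader(r, bufio.ScanLines, func(b []byte) (string, error) {
		return string(b), nil // string(b) copies, so the token stays valid
	})
}

// NewLazyByteReader yields each byte of the input as an int.
func NewLazyByteReader(r io.Reader) *LazyReader[int] {
	return NewLazyReader(r, bufio.ScanBytes, func(b []byte) (int, error) {
		return int(b[0]), nil
	})
}

// collect drains an iterator into a slice (for demonstration only, since it
// defeats the laziness).
func collect[T any](it Iterator[T]) []T {
	var out []T
	for it.Next() {
		out = append(out, it.Current())
	}
	return out
}

// linesOf and bytesOf run the two readers over an in-memory string.
func linesOf(input string) []string {
	return collect[string](NewLazyLinewiseReader(strings.NewReader(input)))
}

func bytesOf(input string) []int {
	return collect[int](NewLazyByteReader(strings.NewReader(input)))
}

func main() {
	fmt.Println(linesOf("example.com\nsubdomain.of.example.com\n"))
	fmt.Println(bytesOf("AB"))
}
```

With this layout a new element type only needs a new parse function, so a dedicated reader type per format may not be necessary.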

What do you think?

BigB84 avatar May 28 '23 15:05 BigB84