parquet-java icon indicating copy to clipboard operation
parquet-java copied to clipboard

PARQUET-1776: NIO wrapper for Output/Input File

Open HunterL opened this issue 5 years ago • 3 comments

Make sure you have checked all steps below.

Jira

Tests

  • [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

HunterL avatar Sep 15 '20 19:09 HunterL

@HunterL

OK, so I've been recently diving into Java NIO a bit and let me add some thoughts that relate to this task:

Please change the naming convention and drop the "Local" prefix. I think Nio is probably best. As an example of why this matters:

// This exists in the current PR
return new LocalSeekableInputStream(Files.newInputStream(path));

Well that statement Files.newInputStream(path) will provide a new InputStream, but the InputStream is totally driven by whatever the path points at. Things within the JVM could be setup to allow the provided path to point at an FTP location for example and that would be a remote read and not a local read.

I also think you will have better luck storing, internal to the Input/Output streams a SeekableByteChannel instead of an OutputStream or an InputStream. This will allow you to easily track the position aspect of the underlying stream without having to do it yourself. For example, the current OutputStream implementation does not currently implement this feature:

  @Override
  public long getPos() throws IOException {
    return 0;
  }

Thanks.

belugabehr avatar Sep 22 '20 14:09 belugabehr

Also requires unit tests.

belugabehr avatar Sep 22 '20 14:09 belugabehr

Here are a couple of starters:

       private final ByteBuffer oneByteBuffer = ByteBuffer.allocate(1);

	@Override
	public int read() throws IOException {
		int read = this.channel.read(this.oneByteBuffer.rewind());
		if (read < 0) {
			return read;
		}
		return (0xFF & this.oneByteBuffer.flip().get());
	}

	@Override
	public void readFully(byte[] bytes) throws IOException {
		ByteBuffer bb = ByteBuffer.allocate(bytes.length);
					while (bb.hasRemaining()) {
						int read = this.channel.read(bb);
						if (read == -1) {
							throw new EOFException();
						}
					}
					bb.flip().get(bytes);
	}

belugabehr avatar Sep 22 '20 19:09 belugabehr