hdfs
hdfs copied to clipboard
Add an option for determine read/write timeout for namenodes
In most cases for the datanodes
, the user can set read/write timeout by using SetDeadline
to the Reader
s. But for namenodes
, we can only use a single DialContext
when initializing the namenode connection.
The issue that I faced was, I was able to connect to the namenode without any timeout issues, also works fine for write
function. But, we had an issue that namenode can't respond to the request. At this point, the client hangs.
ref: https://github.com/colinmarc/hdfs/blob/master/internal/rpc/namenode.go#L209
I thought It would be great if we add options for determining read/write timeout value for the namenodes to prevent hanging when the namenode isn't available for responding.
Please let me know if I misunderstood the concept :)
Hi @kimtree, thanks for the suggestion.
I think this may be the simplest way to solve the problem, but I've also been considering adding context
support for some methods, eg RenameContext(ctx context.Context, oldpath, newpath string)
, with support for cancellation. Would that work in your case?
Hi @colinmarc, It's good to hear that you've been considering adding context
support! I also made context
support by wrapping the library function in our project like below.
func open(ctx context.Context, client hdfs.Client, path string) (*FileReader, error) {
reader, err := client.Open(path)
if err != nil {
return nil, err
}
if deadline, ok := ctx.Deadline(); ok {
if err := reader.SetDeadline(deadline); err != nil {
return nil, fmt.Errorf("failed to set deadline: %w", err)
}
}
return reader, nil
}
This way of using context
is not complete since client.Open
function is out of control. Thus, our problem occurs when executing c.namenode.Execute
from client.Open
function. Are you going to add context
support when calling c.namenode.Execute
function? If so, It would solve our problem!
Exactly. I would change the signature of Execute
to take a context.
Yes, That's a good idea. So when do you plan to add this feature?
Currently my thinking would be to add RenameContext
etc for a point release, and then change the old Rename
etc for a 3.0 release. I'm not working on context support right now, and therefore can't give you an estimate on when it may land. If you'd like to take a stab at it, that would be more than welcome!
This way of using
context
is not complete sinceclient.Open
function is out of control. Thus, our problem occurs when executingc.namenode.Execute
fromclient.Open
function. Are you going to addcontext
support when callingc.namenode.Execute
function? If so, It would solve our problem!
That said, this example illustrates where it would be tricky. The call to Execute
(for getBlocks
) actually happens lazily, in Read
. We could move that up, but that might introduce subtle changes in behaviour.
Yeah, that's true. The reason why I specified read/write
timeout especially for the datanode
is to handle failure from the namenode, not lazily loaded Read
or the action afterward. To the users, It might be confused since the timeout
occurs both namenode
and datanode
and couldn't understand the concept of how timeout works at a glance. (Timeout includes actions both namenode
and datanode
or the only actions for each namenode
and datanode
)
I'm not sure my approach is still acceptable in the meantime waiting for v3, Is it possible to accept this PR and release a minor version for v2?
The reason why I specified read/write timeout especially for the datanode is to handle failure from the namenode, not lazily loaded Read or the action afterward.
Hm, maybe Open
is actually one of the functions that doesn't need a context parameter. We could just use the read deadline for the getBlocks
. Or is that too confusing?
My problem wasn't from the FileReader
since we can set deadline when reading the data later. In my case, our namenode
failures when executing info, err := c.getFileInfo(name)
from Open
function and that causes hang while reading the response from the namenode. So, It'll also block the later operations like creating FileReader
. That's why I add an option for read/write timeout for namenode
specifically.
Exactly. I would change the signature of
Execute
to take a context.
really excited about that.