snakebite
copyFromLocal not implemented?
I notice copyFromLocal exists in commandlineparser.py but not in client.py. Is it not implemented yet?
Thanks!
Yes, that shouldn't be there. put was commented out, but I forgot copyFromLocal. I'll submit a patch this week, because this is confusing.
Thanks.
So this means that copyFromLocal/put is not implemented? Should we use 'hadoop fs -copyFromLocal' instead?
I note that the Spotify blog [http://labs.spotify.com/2013/05/07/snakebite/] states that 'there are plans to also implement actions that involve interaction with the DataNode'.
In addition, the documentation [http://spotify.github.io/snakebite/] has a 'To Do' section that lists: 'put [paths] dst - copy sources from local file system to destination'.
What is the timeline for this 'put'/'copyFromLocal' feature?
Sorry for the late reply, but we haven't prioritized this. Would be nice to have (just like full YARN support).
+1
I want to use snakebite to replace several slow steps in our deployment automation; unfortunately, we use copyFromLocal a lot, so this is definitely a must-have feature for a lot of people.
Thanks for the good work.
Seconding sodul's comment.
Thanks for an excellent and straightforward client -- just throwing in a makeshift vote for the ability to use put/copyFromLocal to speed up a few data ingress scripts.
Great work, keep it up. Would also like to see put/copyFromLocal in the future.
Still no word on this?
If communicating through protobuf makes it hard to implement features that require direct access to datanodes (such as the put and append operations), it would be wise to have a look at WebHDFS. Using WebHDFS in Snakebite instead of Protobuf would make it trivial to implement copyFromLocal/put and other file write operations.
I think it's a shame that such a promising project gets stuck on something that is really needed, like copyFromLocal.
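For what it's worth, a put over WebHDFS is only a couple of HTTP calls. A minimal sketch using the requests library (the host, port, and user.name below are placeholders; 50070 is the pre-Hadoop-3 NameNode HTTP default, and WebHDFS has to be enabled on the cluster):

import requests

def webhdfs_put(local_path, hdfs_path, namenode='namenode.example.com',
                port=50070, user='hdfs'):
    # Step 1: ask the NameNode where to write. WebHDFS answers a CREATE
    # request with a 307 redirect whose Location header points at the
    # DataNode that should receive the data.
    url = 'http://%s:%d/webhdfs/v1%s' % (namenode, port, hdfs_path)
    params = {'op': 'CREATE', 'overwrite': 'true', 'user.name': user}
    resp = requests.put(url, params=params, allow_redirects=False)
    resp.raise_for_status()
    datanode_url = resp.headers['Location']

    # Step 2: stream the file body to the DataNode URL; a 201 Created
    # response means the file was written.
    with open(local_path, 'rb') as f:
        resp = requests.put(datanode_url, data=f)
    resp.raise_for_status()

Calling webhdfs_put('/tmp/local.txt', '/user/hdfs/remote.txt') would then create the remote file with one round trip to the NameNode and one to a DataNode.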
@ravwojdyla and I have been discussing this and currently there doesn't seem to be much time to implement this, so it's very hard to give any ETA on this feature. I don't think we want to add WebHDFS support, since that sort of defeats the purpose of snakebite and requires additional infrastructure.
I agree with @wouterdebie; WebHDFS wouldn't have the speed of snakebite. I'm working on implementing put over RPC at the moment. If anyone has thoughts or progress they can share to accelerate it, it would be great to work together.
Where can I find the RPC documentation?
Has there been progress toward implementing put? I was going to take a crack at it for a project I'm working on, and was considering contributing it upstream, but don't want to duplicate effort if someone already has a handle on this.
I'm pretty sure it has not, maybe @ravwojdyla can confirm.
I started working on this feature some time ago and can probably upload what I have right now (it's far from complete). That said, if anyone feels like working on this problem, please create issues for what you plan to work on, and if you need help, please ping me/us. Thanks!
@ravwojdyla I'd love to help. I started on it, but what ended up blocking me was that I couldn't find documentation on which RPCs to call to do something like an append, and the ones I tried didn't return what the auto-generated protobuf spec claimed. I might be able to help with this effort if you could point me to good documentation about the protocol, but I was unable to find any in sufficient detail.
The problem with Hadoop is that protocols are pretty badly documented. When I started snakebite, I spent a lot of time reading Hadoop code and tcpdumping to figure out what was going on...
Is there any ETA on when copyFromLocal/put support will be available?
+1
+1
+1 :)
In the meantime:
import subprocess
subprocess.check_call(['hdfs', 'dfs', '-put', '/path/to/src', 'path/to/dst'], shell=False)
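If you want the failure mode to be a bit friendlier, the call can be wrapped so the CLI's stderr is surfaced. A small illustrative helper (the function name and error message are made up; it assumes the hdfs binary is on PATH and your Hadoop config points at the right cluster):

import subprocess

def hdfs_put(src, dst):
    # Stopgap until a native put exists: shell out to the hdfs CLI.
    # Capturing stderr means a failed copy raises with some context.
    try:
        subprocess.check_output(['hdfs', 'dfs', '-put', src, dst],
                                stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        raise RuntimeError('hdfs dfs -put failed: %s' % err.output)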
+1
@ravwojdyla - is there a separate branch for that issue? Did you have a chance to push what you had already done? Thanks!
It looks like a go library similar to snakebite has started making progress on writing to hdfs: https://github.com/colinmarc/hdfs/pull/12
+1
An alternative that is relatively snappy is to use HttpFS, a service that provides an HTTP interface to HDFS. We actually ended up writing our own REST API in Groovy to access HDFS and the HBase shell (which has no API).
https://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html
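HttpFS exposes the same WebHDFS-compatible REST API, just through a single gateway process (port 14000 by default), so any HTTP client works against it. A minimal sketch with a made-up host and user, listing a directory:

import requests

resp = requests.get(
    'http://httpfs.example.com:14000/webhdfs/v1/user/hdfs',
    params={'op': 'LISTSTATUS', 'user.name': 'hdfs'})
resp.raise_for_status()
# The JSON mirrors WebHDFS: a FileStatuses/FileStatus array.
for entry in resp.json()['FileStatuses']['FileStatus']:
    print(entry['pathSuffix'], entry['type'])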
+1
:+1:
Because it was never implemented.