snakebite
copyFromLocal not implemented?
I notice copyFromLocal exists in commandlineparser.py but not in client.py. Is it not implemented yet?
Thanks!
Yes, that shouldn't be there. put was commented out, but I forgot copyFromLocal. I'll submit a patch this week, because this is confusing.
Thanks.
So this means that copyFromLocal/put is not implemented? Should we use 'hadoop fs -copyFromLocal' instead?
I note that the Spotify blog [http://labs.spotify.com/2013/05/07/snakebite/] states that 'there are plans to also implement actions that involve interaction with the DataNode'.
In addition, the documentation [http://spotify.github.io/snakebite/] has a 'To Do' section that lists: 'put [paths] dst - copy sources from local file system to destination'.
What is the timeline for this 'put'/'copyFromLocal' feature?
Sorry for the late reply, but we haven't prioritized this. Would be nice to have (just like full YARN support).
+1
I want to use snakebite to replace several slow steps in our deployment automation; unfortunately, we use copyFromLocal a lot, so this is definitely a must-have feature for a lot of people.
Thanks for the good work.
Seconding sodul's comment.
Thanks for an excellent and straightforward client -- just throwing in a makeshift vote for the ability to use put/copyFromLocal to speed up a few data ingress scripts.
Great work, keep it up. Would also like to see put/copyFromLocal in the future.
Still no word on this?
If communicating through protobuf makes it hard to implement features that require direct access to datanodes (such as the put and append operations), it would be wise to have a look at WebHDFS. Using WebHDFS in Snakebite instead of Protobuf would make it trivial to implement copyFromLocal/put and other file write operations.
I think it's a shame that such a promising project gets stuck on something that is really needed, like copyFromLocal.
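For what it's worth, a put over WebHDFS is only a couple of HTTP calls. A minimal sketch using the requests library (the host, port, and user.name below are placeholders; 50070 is the pre-Hadoop-3 NameNode HTTP default, and WebHDFS has to be enabled on the cluster):

import requests

def webhdfs_put(local_path, hdfs_path, namenode='namenode.example.com',
                port=50070, user='hdfs'):
    # Step 1: ask the NameNode where to write. WebHDFS answers a CREATE
    # request with a 307 redirect whose Location header points at the
    # DataNode that should receive the data.
    url = 'http://%s:%d/webhdfs/v1%s' % (namenode, port, hdfs_path)
    params = {'op': 'CREATE', 'overwrite': 'true', 'user.name': user}
    resp = requests.put(url, params=params, allow_redirects=False)
    resp.raise_for_status()
    datanode_url = resp.headers['Location']

    # Step 2: stream the file body to the DataNode URL; a 201 Created
    # response means the file was written.
    with open(local_path, 'rb') as f:
        resp = requests.put(datanode_url, data=f)
    resp.raise_for_status()

Calling webhdfs_put('/tmp/local.txt', '/user/hdfs/remote.txt') would then create the remote file with one round trip to the NameNode and one to a DataNode.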
@ravwojdyla and I have been discussing this and currently there doesn't seem to be much time to implement this, so it's very hard to give any ETA on this feature. I don't think we want to add WebHDFS support, since that sort of defeats the purpose of snakebite and requires additional infrastructure.
I agree with @wouterdebie; WebHDFS wouldn't have the speed of snakebite. I'm working on implementing put over RPC at the moment. If anyone has thoughts or progress they can share to accelerate it, it would be great to work together.
Where can I find the RPC documentation?
Has there been progress toward implementing put? I was going to take a crack at it for a project I'm working on, and was considering contributing it upstream, but don't want to duplicate effort if someone already has a handle on this.
I'm pretty sure it has not, maybe @ravwojdyla can confirm.
I started working on this feature some time ago and can probably upload what I have right now (it's far from complete). That said, if anyone feels like working on this problem, please create issues for what you plan to work on, and if you need help, please ping me/us. Thanks!
@ravwojdyla I'd love to help. I started on it, but what ended up blocking me was that I couldn't find documentation on which RPCs to call to do something like an append, and the ones I tried didn't return what the auto-generated protobuf spec claimed. I might be able to help with this effort if you could point me to good documentation about the protocol, but I was unable to find any in sufficient detail.
The problem with Hadoop is that protocols are pretty badly documented. When I started snakebite, I spent a lot of time reading Hadoop code and tcpdumping to figure out what was going on...
Is there any ETA on when copyFromLocal/put support will be available?
+1
+1
+1 :)
In the meantime:
import subprocess
subprocess.check_call(['hdfs', 'dfs', '-put', '/path/to/src', 'path/to/dst'], shell=False)
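If you want the failure mode to be a bit friendlier, the call can be wrapped so the CLI's stderr is surfaced. A small illustrative helper (the function name and error message are made up; it assumes the hdfs binary is on PATH and your Hadoop config points at the right cluster):

import subprocess

def hdfs_put(src, dst):
    # Stopgap until a native put exists: shell out to the hdfs CLI.
    # Capturing stderr means a failed copy raises with some context.
    try:
        subprocess.check_output(['hdfs', 'dfs', '-put', src, dst],
                                stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        raise RuntimeError('hdfs dfs -put failed: %s' % err.output)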
+1
@ravwojdyla - is there a separate branch for that issue? Did you have a chance to push what you had already done? Thanks!
It looks like a go library similar to snakebite has started making progress on writing to hdfs: https://github.com/colinmarc/hdfs/pull/12
+1
An alternative that is relatively snappy is to use HttpFS, a service that provides an HTTP interface to HDFS. We actually ended up writing our own REST API in Groovy to access HDFS and the HBase shell (which has no API).
https://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html
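HttpFS exposes the same WebHDFS-compatible REST API, just through a single gateway process (port 14000 by default), so any HTTP client works against it. A minimal sketch with a made-up host and user, listing a directory:

import requests

resp = requests.get(
    'http://httpfs.example.com:14000/webhdfs/v1/user/hdfs',
    params={'op': 'LISTSTATUS', 'user.name': 'hdfs'})
resp.raise_for_status()
# The JSON mirrors WebHDFS: a FileStatuses/FileStatus array.
for entry in resp.json()['FileStatuses']['FileStatus']:
    print(entry['pathSuffix'], entry['type'])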
+1
:+1:
Because it was never implemented.