hadoop-cluster-docker

How can Java access HDFS running in Docker?

Open · Xiazki opened this issue 6 years ago · 7 comments

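I want to operate on HDFS from Java:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://172.18.0.2:9000/"), configuration, "root");
System.out.println("begin copy");
fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
System.out.println("done!");

Using the IP of the hadoop-master container, I cannot create files on HDFS. Following the script, I added a port mapping 0.0.0.0:9000 -> 9000/tcp from the host to port 9000 on hadoop-master and connected to hdfs://localhost:9000/ instead. Files now get created, but their size is 0. Any advice? Thanks!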
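One plausible reading of the zero-size file, offered as a hypothesis rather than a confirmed diagnosis: creating the file entry only needs the NameNode RPC port (9000 here), but writing its contents requires the client to stream blocks directly to a DataNode at an address the NameNode hands back. If that address is a container IP or hostname the host cannot reach, the entry appears but stays empty. The minimal stdlib probe below checks which ports are reachable from the client machine. It assumes Hadoop 2.x default DataNode ports (50010 for data transfer, 50075 for HTTP); the helper names `can_connect` and `probe` are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures
        # (socket.gaierror and socket.timeout are both subclasses of OSError).
        return False


def probe(targets):
    """Map each (host, port) pair to whether it is reachable from this machine."""
    return {(host, port): can_connect(host, port) for host, port in targets}
```

For instance, running `probe([("localhost", 9000), ("hadoop-slave1", 50010)])` on the host would be expected to show the NameNode port as reachable once `-p 9000:9000` is in place, and the DataNode port as unreachable. That combination is consistent with a file that is created but never receives data. If so, the HDFS client setting `dfs.client.use.datanode.hostname=true` (connect to DataNodes by hostname instead of IP) is commonly combined with hosts-file entries and published DataNode ports. Whether that is sufficient for this image has not been verified here.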

Xiazki, Aug 01 '18

Don't modify the script; it is already configured:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
</configuration>

You only need to connect to port 9000 on the host's address.
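The `fs.defaultFS` value above names the container hostname `hadoop-master`. That name resolves on the Docker network `hadoop` used by the cluster, but normally not on the host machine, so a client outside Docker has to substitute an address it can reach. Below is a minimal stdlib sketch that reads the property from core-site.xml and swaps the host while keeping the port. The function names `read_property` and `client_default_fs` and the substitute address `localhost` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

CORE_SITE = """<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
</configuration>"""


def read_property(xml_text: str, name: str):
    """Return the <value> of the Hadoop property called `name`, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def client_default_fs(default_fs: str, reachable_host: str) -> str:
    """Replace the hostname in an hdfs:// URI with one the client can reach, keeping the port."""
    parts = urlsplit(default_fs)
    netloc = f"{reachable_host}:{parts.port}" if parts.port else reachable_host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

`client_default_fs(read_property(CORE_SITE, "fs.defaultFS"), "localhost")` yields `hdfs://localhost:9000/`. This is only useful from the host if port 9000 is actually published, for example with `-p 9000:9000`.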

hsipeng, Oct 01 '18

Same question. As far as I can see, the startup parameters only publish ports 8088 and 50070, so how is the host supposed to reach port 9000? I tried port 9000 on the host's address and the connection was refused:

Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)

xmzDesign, Nov 02 '18

Same question. On Windows I cannot access HDFS with Python's hdfs module, even though I added a mapping for port 9000. Here is start-container.sh:

#!/bin/bash

# the default node number is 3
N=${1:-3}


# start hadoop master container
docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
docker run -itd \
                --net=hadoop \
                -p 50070:50070 \
                -p 8088:8088 \
                -p 9000:9000 \
                --name hadoop-master \
                --hostname hadoop-master \
                kiwenlau/hadoop:1.0 &> /dev/null


# start hadoop slave container
i=1
while [ $i -lt $N ]
do
	docker rm -f hadoop-slave$i &> /dev/null
	echo "start hadoop-slave$i container..."
	docker run -itd \
	                --net=hadoop \
	                --name hadoop-slave$i \
	                --hostname hadoop-slave$i \
	                kiwenlau/hadoop:1.0 &> /dev/null
	i=$(( $i + 1 ))
done 

# get into hadoop master container
docker exec -it hadoop-master bash

Here is the Python test client:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/12/6 18:56
# @Author  : Trojx
# @File    : hdfs_demo.py

from hdfs import InsecureClient
import time

if __name__ == '__main__':
    root_path = "/"
    c = InsecureClient(url="http://localhost:50070", user='root', root=root_path)
    c.makedirs('/user/root/pyhdfs')
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
    c.download('/user/root/pyhdfs/1.log', '.', True)
    c.upload('/user/root/pyhdfs/', './pyhdfs_example.py', True)
    hdfs_files = c.list('/user/root/pyhdfs', True)
    for f in hdfs_files:
        print(f)
    print(c.content('/user/root/pyhdfs/pyhdfs_example.py'))
    print(c.checksum('/user/root/pyhdfs/pyhdfs_example.py'))
    c.delete('/user/root/pyhdfs/', True)

And here is the error:

D:\PycharmProjects\hadoop-cluster-docker\venv\Scripts\python.exe D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py
Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Program Files\Python37\lib\socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Program Files\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "C:\Program Files\Python37\lib\http\client.py", line 956, in send
    self.connect()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 181, in connect
    conn = self._new_conn()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py", line 20, in <module>
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 476, in write
    consumer(data)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 468, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 214, in _request
    **kwargs
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Process finished with exit code 1
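Reading the traceback: the client reaches the NameNode through WebHDFS on the published port 50070, so the 9000 mapping is not involved here. According to the WebHDFS REST API, `op=CREATE` is a two-step operation. The NameNode answers the first PUT with a 307 redirect whose `Location` points at a DataNode, and the data is then sent to that URL. Here the redirect targets `hadoop-slave2:50075`, a hostname that only exists inside the Docker network, hence `getaddrinfo failed` on Windows. One possible workaround, sketched under assumptions, is to rewrite the redirect target to an address the host can reach. The `DATANODE_MAP` table and its host ports are hypothetical; start-container.sh as shown does not publish any DataNode ports.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from DataNode addresses advertised inside the Docker
# network to host:port pairs assumed to be published on the Docker host
# (e.g. by adding -p 50077:50075 to the hadoop-slave2 container).
DATANODE_MAP = {
    ("hadoop-slave1", 50075): ("localhost", 50076),
    ("hadoop-slave2", 50075): ("localhost", 50077),
}


def rewrite_redirect(location: str, mapping=DATANODE_MAP) -> str:
    """Rewrite the host:port of a WebHDFS redirect URL using `mapping`.

    The path and query (op, namenoderpcaddress, ...) are kept verbatim;
    URLs whose (host, port) is not in the mapping are returned unchanged.
    """
    parts = urlsplit(location)
    key = (parts.hostname, parts.port)
    if key not in mapping:
        return location
    host, port = mapping[key]
    return urlunsplit((parts.scheme, f"{host}:{port}", parts.path, parts.query, parts.fragment))
```

With `requests`, one could send the first PUT with `allow_redirects=False`, pass the response's `Location` header through this helper, and send the file data to the rewritten URL. This bypasses the automatic redirect handling that fails in the traceback above.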

JianxunRao, Dec 06 '18

Same problem as the comment above.

wushuaiping, Jan 12 '19

Is this project only for testing wordcount, or can it be developed further to do other work with Hadoop?

byrChen, Apr 05 '19

Does it work after applying the changes described in this post: https://blog.csdn.net/sunrising_hill/article/details/53559398

acse-hy23, Jan 03 '20

Replying to the question above (does it work after applying the changes in https://blog.csdn.net/sunrising_hill/article/details/53559398 ): no, it doesn't help.

chankamlam, Oct 24 '20