hadoop-cluster-docker
How do I access HDFS inside Docker from Java?
I'd like to work with HDFS from Java:
FileSystem fs = FileSystem.get(new URI("hdfs://172.18.0.2:9000/"), configuration, "root");
System.out.println("begin copy");
fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
System.out.println("done!");
Using the IP of the hadoop-master container, I can't create files on HDFS at all.
Following the script's pattern, I added a 0.0.0.0:9000 -> 9000/tcp mapping from the host to port 9000 on hadoop-master. With hdfs://localhost:9000/ the file does get created, but its size is 0.
Any advice would be appreciated, thanks!
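A likely explanation for the size-0 file: the HDFS client only talks to the NameNode for metadata, so creating the file over the mapped port succeeds, but the file's bytes are streamed directly to DataNodes at the addresses the NameNode hands back — container-internal hostnames/IPs that the host machine (in particular on macOS or Windows, where Docker runs in a VM) cannot reach, so no block data ever arrives. Below is a minimal sketch of the usual client-side workaround, assuming the DataNode hostnames (hadoop-slave1, hadoop-slave2, ...) are added to the client's hosts file pointing at the Docker host and each DataNode's transfer port (50010 by default in Hadoop 2.x) is reachable — assumptions this setup does not satisfy out of the box:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the NameNode to report DataNodes by hostname rather than by their
        // container-internal IPs, so the client can remap those names in its hosts file.
        conf.set("dfs.client.use.datanode.hostname", "true");
        // localhost:9000 assumes the 0.0.0.0:9000 -> 9000/tcp mapping described above.
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf, "root");
        fs.copyFromLocalFile(new Path("/Users/xxx/apps/test/test.log"), new Path("/"));
        fs.close();
    }
}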
Don't modify the script; it is already configured there:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
</configuration>
You just need to connect to the host address:9000.
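A quick way to smoke-test that advice from outside the container: a directory listing goes through the NameNode only, so if the sketch below prints the root listing but copyFromLocalFile still yields empty files, the NameNode mapping works and the failure is on the DataNode side (see the note above). localhost is an assumption that Docker runs on the same machine as the client:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), new Configuration(), "root");
        // Metadata-only operation: succeeds as long as the NameNode RPC port is reachable.
        for (FileStatus s : fs.listStatus(new Path("/"))) {
            System.out.println(s.getPath() + "  " + s.getLen());
        }
        fs.close();
    }
}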
Same question here. The startup arguments only publish 8088 and 50070, so how is port 9000 supposed to be reachable from the host machine? I tried the host address:9000 and the connection was refused:

Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
Same question: on Windows, I can't access HDFS through Python's hdfs module, even after adding a mapping for port 9000.
Here is my start-container.sh:
#!/bin/bash

# the default node number is 3
N=${1:-3}

# start hadoop master container
docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
docker run -itd \
        --net=hadoop \
        -p 50070:50070 \
        -p 8088:8088 \
        -p 9000:9000 \
        --name hadoop-master \
        --hostname hadoop-master \
        kiwenlau/hadoop:1.0 &> /dev/null

# start hadoop slave container
i=1
while [ $i -lt $N ]
do
        docker rm -f hadoop-slave$i &> /dev/null
        echo "start hadoop-slave$i container..."
        docker run -itd \
                --net=hadoop \
                --name hadoop-slave$i \
                --hostname hadoop-slave$i \
                kiwenlau/hadoop:1.0 &> /dev/null
        i=$(( $i + 1 ))
done

# get into hadoop master container
docker exec -it hadoop-master bash
Here is the Python test client:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/12/6 18:56
# @Author : Trojx
# @File : hdfs_demo.py
from hdfs import InsecureClient
import time
if __name__ == '__main__':
    root_path = "/"
    c = InsecureClient(url="http://localhost:50070", user='root', root=root_path)
    c.makedirs('/user/root/pyhdfs')
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
    c.download('/user/root/pyhdfs/1.log', '.', True)
    c.upload('/user/root/pyhdfs/', './pyhdfs_example.py', True)
    hdfs_files = c.list('/user/root/pyhdfs', True)
    for f in hdfs_files:
        print(f)
    print(c.content('/user/root/pyhdfs/pyhdfs_example.py'))
    print(c.checksum('/user/root/pyhdfs/pyhdfs_example.py'))
    c.delete('/user/root/pyhdfs/', True)
Here is the error:
D:\PycharmProjects\hadoop-cluster-docker\venv\Scripts\python.exe D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py
Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Program Files\Python37\lib\socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Program Files\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Program Files\Python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "C:\Program Files\Python37\lib\http\client.py", line 956, in send
    self.connect()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 181, in connect
    conn = self._new_conn()
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PycharmProjects/hadoop-cluster-docker/hdfs_demo.py", line 20, in <module>
    c.write('/user/root/pyhdfs/1.log', time.asctime(time.localtime(time.time())) + '\n', True)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 476, in write
    consumer(data)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 468, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\hdfs\client.py", line 214, in _request
    **kwargs
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "D:\PycharmProjects\hadoop-cluster-docker\venv\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='hadoop-slave2', port=50075): Max retries exceeded with url: /webhdfs/v1/user/root/pyhdfs/1.log?op=CREATE&user.name=root&namenoderpcaddress=hadoop-master:9000&overwrite=true&user.name=root (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000029A4D2D6198>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Process finished with exit code 1
Same problem as the poster above.
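The decisive detail in that traceback is the host it ends up contacting: hadoop-slave2:50075. A WebHDFS write is a two-step protocol: the client sends the CREATE request to the NameNode's web port (50070), and the NameNode replies with an HTTP redirect to the DataNode that will store the data, addressed by the DataNode's hostname. hadoop-slave2 only resolves inside the Docker network, hence getaddrinfo failed on Windows; mapping port 9000 cannot help, because WebHDFS never touches the RPC port. The same redirect can be reproduced from Java via Hadoop's webhdfs:// scheme — a hedged sketch, assuming hosts-file entries that map hadoop-slave1..N to the Docker host and a reachable DataNode web port (50075 by default in Hadoop 2.x), conditions the posted script does not establish:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        // Same WebHDFS endpoint the Python hdfs module talks to (the NameNode web port).
        FileSystem fs = FileSystem.get(new URI("webhdfs://localhost:50070/"), new Configuration(), "root");
        // create() follows the NameNode's redirect to <datanode-hostname>:50075;
        // it fails unless that hostname resolves to a reachable address here.
        try (FSDataOutputStream out = fs.create(new Path("/user/root/pyhdfs/1.log"), true)) {
            out.writeBytes("hello from webhdfs\n");
        }
        fs.close();
    }
}

Note that several DataNode containers cannot all publish the same 50075 port on one host, which is why external clients and this multi-slave setup combine poorly.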
Is this project only for testing wordcount, or can it be developed further to do other kinds of work with Hadoop?
What about modifying it following https://blog.csdn.net/sunrising_hill/article/details/53559398 ?
Re: modifying it following https://blog.csdn.net/sunrising_hill/article/details/53559398 — that didn't work.