HDDS-7329. Extend ozone admin datanode usageinfo and list info to accept hostname parameter
What changes were proposed in this pull request?
In order to make the CLI and the ozone admin datanode subcommands more consistent, this change adds the hostname as a parameter to the commands that didn't accept it: usageinfo and list info.
There is an HDFS flag, dfs.datanode.use.datanode.hostname, checked by the SCMNodeManager during start-up. If it is true, we can get the datanode info by providing the hostname to the --ip parameter, while the IP will not be available anymore. In the default case the flag is false, so the IP is available but the hostname is not.
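For context, a minimal sketch of the kind of check described above, assuming the Hadoop-style OzoneConfiguration API and the literal config key; the helper class and the exact call site in SCMNodeManager are illustrative, not the real code:

import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

public final class DnAddressCheckSketch {

  // Illustrative only: pick the address used to track a datanode based on the flag.
  static String resolveAddress(OzoneConfiguration conf, DatanodeDetails dn) {
    boolean useHostname = conf.getBoolean(
        "dfs.datanode.use.datanode.hostname", false);
    // When the flag is true, the hostname takes the place of the IP, which is
    // why --ip ends up matching hostnames on such clusters.
    return useHostname ? dn.getHostName() : dn.getIpAddress();
  }
}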
If we add the --hostname parameter and keep the flag, it becomes confusing for the user: with the flag enabled, the usage info can be retrieved by passing the hostname to either parameter:
$ ozone admin datanode usageinfo --ip=<hostname>
$ ozone admin datanode usageinfo --hostname=<hostname>
We deprecated the flag and, in every place it is checked, kept the default behavior (value false). This way --ip always requires an IP, and if the user needs to get the info by hostname, they can use the new --hostname parameter.
Currently, dfs.datanode.use.datanode.hostname is only checked in RatisHelper.java. Should we remove it from there as well? It is left there because it is accessed from the external jar and not from DFSConfigKeysLegacy.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7329
How was this patch tested?
Modified the existing tests and they are all passing. Tested manually in a docker cluster.
@sokui @Xushaohong please review
Hi @xBis7, we should not deprecate dfs.datanode.use.datanode.hostname and always treat it as false. This flag is also used by users who deploy Ozone in IP-changing environments such as k8s. In those environments the IPs of the Ozone components can change, and thus we should use the hostname for communication.
This PR makes the datanodes adaptive to k8s environments by letting them use hostnames. Please check it out. Thanks. https://github.com/apache/ozone/pull/3186
Hi @sokui, I wasn't aware of this. I was told that this flag is not used by the community and that we could deprecate it. I looked at #3186 and you are right, we need to keep the existing code as it is.
The goal here is to have a consistent CLI, so we keep the code as it is but we won't expose that behavior to the user. We need --ip to accept only an IP address and a new --hostname parameter for accepting a hostname. We could create two new maps, one mapping IP to UUID and one mapping hostname to UUID, and use those maps for accessing datanode info from the CLI. @adoroszlai @sokui What do you think about this approach?
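A rough sketch of what those two maps could look like, kept separate from the existing address handling and consulted only on the CLI path; the class and method names here are hypothetical, not code from this PR:

import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class DnAddressIndexSketch {
  // IP -> datanode UUIDs and hostname -> datanode UUIDs, maintained side by side.
  private final Map<String, Set<UUID>> ipToUuids = new ConcurrentHashMap<>();
  private final Map<String, Set<UUID>> hostnameToUuids = new ConcurrentHashMap<>();

  public void register(String ip, String hostname, UUID uuid) {
    ipToUuids.computeIfAbsent(ip, k -> ConcurrentHashMap.newKeySet()).add(uuid);
    hostnameToUuids.computeIfAbsent(hostname, k -> ConcurrentHashMap.newKeySet()).add(uuid);
  }

  // Would back 'ozone admin datanode usageinfo --ip=<ip>'.
  public Set<UUID> byIp(String ip) {
    return ipToUuids.getOrDefault(ip, Collections.emptySet());
  }

  // Would back 'ozone admin datanode usageinfo --hostname=<hostname>'.
  public Set<UUID> byHostname(String hostname) {
    return hostnameToUuids.getOrDefault(hostname, Collections.emptySet());
  }
}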
This sounds feasible, but one worry is how to make sure the IP - UUID - hostname mapping is always consistent. If the hostname changes at some DN, we may need retry logic based on the IP-UUID mapping and we need to update the corresponding hostname.
@Xushaohong Can you provide an example in the code of how we are currently handling such a case?
Moved everything back to how it was and added these two methods used only by the CLI. This achieves the desired behavior and I think it doesn't affect the system in the cases mentioned above. Let me know what you all think.
I think we need to revisit this PR. Ideally, the listing of nodes should work the same for IP and hostname and should not require opening up specific APIs in SCM protocol.
@kerneltime Let me explain the purpose of this PR in more detail. The issue is that you can have either the ip or the hostname, and use only one of them to get the datanode info.
From master, with the default configuration:
bash-4.2$ ozone admin datanode usageinfo --ip=172.21.0.5
Usage Information (1 Datanodes)
UUID : 08c48611-7121-46a5-a047-129f573c78a6
IP Address : 172.21.0.5 (ozone_datanode_1.ozone_default)
Capacity : 182775984128 B (170.22 GB)
Total Used : 168680075264 B (157.10 GB)
Total Used % : 92.29%
Ozone Used : 4096 B (4 KB)
Ozone Used % : 0.00%
Remaining : 14095908864 B (13.13 GB)
Remaining % : 7.71%
bash-4.2$ ozone admin datanode usageinfo --ip=ozone_datanode_1.ozone_default
Usage Information (0 Datanodes)
From master, with dfs.datanode.use.datanode.hostname=true:
bash-4.2$ ozone admin datanode usageinfo --ip=172.19.0.5
Usage Information (0 Datanodes)
bash-4.2$ ozone admin datanode usageinfo --ip=d416d3fec041
Usage Information (1 Datanodes)
UUID : 809c1065-2594-4208-bc57-4442b71022e2
IP Address : 172.19.0.5 (d416d3fec041)
Capacity : 182775984128 B (170.22 GB)
Total Used : 168679792640 B (157.10 GB)
Total Used % : 92.29%
Ozone Used : 4096 B (4 KB)
Ozone Used % : 0.00%
Remaining : 14096191488 B (13.13 GB)
Remaining % : 7.71%
We want to have a --hostname parameter, and we also want the CLI to be more consistent, meaning that --ip accepts only an IP and the new --hostname accepts only a hostname. If we just add the --hostname parameter, then in the second case above we end up with --hostname accepting a hostname and --ip also accepting only a hostname.
As pointed out to me, we can't deprecate the flag or change the current logic in SCMNodeManager, because it's necessary for environments like Kubernetes where a datanode can restart and end up with a new IP address. That's why I created specific APIs for the CLI. The current code achieves the desired behavior.
From this branch, with dfs.datanode.use.datanode.hostname=true:
bash-4.2$ ozone admin datanode usageinfo --ip=172.22.0.5
Usage Information (1 Datanodes)
UUID : 961511cf-38ec-412c-b805-381549ce03cf
IP Address : 172.22.0.5
Hostname : 16cbb539ec92
Capacity : 182775984128 B (170.22 GB)
Total Used : 168680542208 B (157.10 GB)
Total Used % : 92.29%
Ozone Used : 4096 B (4 KB)
Ozone Used % : 0.00%
Remaining : 14095441920 B (13.13 GB)
Remaining % : 7.71%
bash-4.2$ ozone admin datanode usageinfo --ip=16cbb539ec92
Usage Information (0 Datanodes)
bash-4.2$ ozone admin datanode usageinfo --hostname=16cbb539ec92
Usage Information (1 Datanodes)
UUID : 961511cf-38ec-412c-b805-381549ce03cf
IP Address : 172.22.0.5
Hostname : 16cbb539ec92
Capacity : 182775984128 B (170.22 GB)
Total Used : 169011408896 B (157.40 GB)
Total Used % : 92.47%
Ozone Used : 4096 B (4 KB)
Ozone Used % : 0.00%
Remaining : 13764575232 B (12.82 GB)
Remaining % : 7.53%
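For illustration, a sketch of how the two mutually exclusive options could be declared with picocli, which the Ozone admin CLI is built on; the class below is a standalone toy, not the actual subcommand from the patch:

import java.util.concurrent.Callable;
import picocli.CommandLine;

@CommandLine.Command(name = "usageinfo",
    description = "Show datanode usage information by IP or hostname.")
public class UsageInfoSketch implements Callable<Void> {

  @CommandLine.Option(names = {"--ip"},
      description = "IP address of the datanode (IP only, never a hostname).")
  private String ip;

  @CommandLine.Option(names = {"--hostname"},
      description = "Hostname of the datanode.")
  private String hostname;

  @Override
  public Void call() {
    // The real command would query SCM with whichever filter was given;
    // here we only show the option split.
    System.out.println("ip=" + ip + ", hostname=" + hostname);
    return null;
  }

  public static void main(String[] args) {
    new CommandLine(new UsageInfoSketch()).execute(args);
  }
}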
Let's discuss this in the next community sync @neils-dev
@sokui @Xushaohong If this map holds both IPs and hostnames at all times, will that be an issue? In an IP-changing environment, if the IP doesn't work, the hostname will still be available to the user. What do you think? Also, it would help if you could suggest a way for me to test it.
Currently, this map is fine in an IP-changing environment since we use either the IP or the hostname, not both of them simultaneously. The prerequisite is the config dfs.datanode.use.datanode.hostname. If the map holds both IPs and hostnames at all times, then, as it maps hostname/IP to the DNs, the same UUID will occur in different keys' value sets, causing a sequence of unknown errors. This part of the structure needs careful design if you want to do so.
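To make the concern concrete, here is a toy example (not code from the PR): if a single table maps both addresses to the same UUID, an address change leaves a stale key behind until re-registration cleans it up.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public final class StaleMappingExample {
  public static void main(String[] args) {
    UUID dn = UUID.randomUUID();
    Map<String, Set<UUID>> addressToUuids = new HashMap<>();
    // Both the IP key and the hostname key point at the same datanode UUID.
    addressToUuids.computeIfAbsent("172.19.0.5", k -> new HashSet<>()).add(dn);
    addressToUuids.computeIfAbsent("d416d3fec041", k -> new HashSet<>()).add(dn);

    // The pod restarts with a new IP; unless the old key is removed on
    // re-registration, "172.19.0.5" still resolves to the datanode.
    addressToUuids.computeIfAbsent("172.19.0.9", k -> new HashSet<>()).add(dn);
    System.out.println(addressToUuids);
  }
}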
A simple test environment can be found in the kubernetes directory, e.g. ozone/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/kubernetes. There is a README.md that explains how to set up an Ozone k8s cluster. Killing/restarting one DN pod can change the IP.
Thanks @sokui for your comments. On the errors you mention that may happen when using the map to map both IPs and hostnames to DN UUIDs:
If the map holds both IPs and hostnames at all times, then, as it maps hostname/IP to the DNs, the same UUID will occur in different keys' value sets, causing a sequence of unknown errors.
Can you give some detail on this? Where can this cause problems?
@neils-dev, I did not mean that your solution will have problems in this case. I just wanted to remind you to be careful about this case. When a variable/method is named hostname, people may assume it is just the hostname and not an IP, but in some cases that is not true. So far in this PR, I do not think this is an issue.
@Xushaohong
Killing/restarting one DN pod can change the IP.
This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.
This looks much better, will complete my review this week.
Maybe you could try on a bare-metal environment and adjust the hostname manually, which covers the hostname-change case. If the patch is able to handle both the IP-changing and hostname-changing cases, that would be great~
For k8s, if you deploy datanodes as a Deployment instead of a StatefulSet, when you delete a node it will restart with a different hostname, I think.
Please add an end-to-end robot test, which should help close the loop for the feature.
@kerneltime I added the robot test you requested.
@kerneltime After some testing, it seems that making the use of hostnames the default option breaks the system everywhere except in a Kubernetes environment, due to pipeline allocation to datanodes. To make this work we would have to make a lot of changes, and this PR would grow out of hand unnecessarily. I went back to using the flag for hostnames and now everything works, and we still have the desired outcome: the IP and hostname are both available at the client. If you could take another look and let me know what you think.
@xBis7 I plan to finish this, are you OK with that?
@adoroszlai Please, feel free to take this over.
As far as I can recall, these were the unresolved issues that blocked this ticket:
- We need to use hostnames with Kubernetes because it keeps changing node IPs dynamically.
- It would be nice to migrate Ozone entirely to using hostnames.
- Ozone is too dependent on IPs for topology, and therefore moving entirely to hostnames is impossible for now.
- Based on the flag dfs.datanode.use.datanode.hostname we can have either IPs or hostnames available. SCMNodeManager needs refactoring to have both of them available (see the sketch after this list).
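For reference, one hypothetical shape of that refactoring (none of this is in the PR): key the node records by UUID and derive both lookups from the same record, so the IP and the hostname can never go out of sync.

import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class NodeAddressBook {
  // Single source of truth, keyed by the datanode UUID: [0] = IP, [1] = hostname.
  private final Map<UUID, String[]> nodes = new ConcurrentHashMap<>();

  public void put(UUID uuid, String ip, String hostname) {
    nodes.put(uuid, new String[] {ip, hostname});
  }

  // Both lookups scan the same records, so an update to a node's IP or
  // hostname is immediately visible to both.
  public Optional<UUID> findByIp(String ip) {
    return nodes.entrySet().stream()
        .filter(e -> e.getValue()[0].equals(ip))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  public Optional<UUID> findByHostname(String hostname) {
    return nodes.entrySet().stream()
        .filter(e -> e.getValue()[1].equals(hostname))
        .map(Map.Entry::getKey)
        .findFirst();
  }
}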
Thanks @xBis7, that's very useful for the future. However, here I don't intend to address all hostname-related problems, just trying to wrap it up considering your latest comment:
I went back to using the flag for hostnames and now everything works, and we still have the desired outcome: the IP and hostname are both available at the client.
@xBis7 please let me know if you are OK with the latest update being merged.
@adoroszlai Thanks for finishing this. I've gone through the changes. LGTM!
Thanks @xBis7 for the patch, @kerneltime, @sadanand48, @sokui, @Xushaohong for the review.