ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-7329. Extend ozone admin datanode usageinfo and list info to accept hostname parameter

Open xBis7 opened this issue 2 years ago • 8 comments

What changes were proposed in this pull request?

In order to make the CLI and ozone admin datanode subcommands, more consistent, added the hostname as a parameter to the commands that didn't accept it. These commands were usageinfo and list info.

There is an hdfs flag dfs.datanode.use.datanode.hostname checked by the SCMNodeManager during start up and if it's true then we can get the datanode info by providing the hostname with the --ip pamameter, while ip will not be available anymore. The default case is keeping the flag's value to false where the ip is available but not the hostname.

If we add the --hostname parameter and keep the flag, then it's confusing for the user. In such a case we could get the usage info by specifying the hostname in both cases:

$ ozone admin datanode usageinfo --ip=<hostname>
$ ozone admin datanode usageinfo --hostname=<hostname>

We deprecated the flag and in every case that it is checked, kept the default usage, which is having value false. This way, --ip will always require the ip and if the user needs to get the info with the hostname, he can use the new --hostname parameter.

Currrently, dfs.datanode.use.datanode.hostname is only checked in RatisHelper.java. Should we remove it from there as well? It is left there, because it is accessed from the external jar and not DFSConfigKeysLegacy.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7329

How was this patch tested?

Modified the existing tests and they are all passing. Tested manually in a docker cluster.

xBis7 avatar Oct 14 '22 11:10 xBis7

@sokui @Xushaohong please review

adoroszlai avatar Oct 15 '22 19:10 adoroszlai

Hi @xBis7 , we should not deprecate dfs.datanode.use.datanode.hostname and always make it as false. This flag is also used by the users who deploy ozone to ip changing environment such as k8s. In those environments, the ip of the ozone components can be changed, and thus we should use the hostname for communication.

This PR is to make the datanodes to be adaptive to k8s environments by letting it use hostname. Please check it out. Thanks. https://github.com/apache/ozone/pull/3186

sokui avatar Oct 15 '22 20:10 sokui

Hi @sokui, I wasn't aware of this. I was told that this flag is not used by the community and that we can deprecate it. I looked at #3186 and you are right we need to keep the existing code as it is.

The goal here is to have a consistent CLI, so we keep the code as it is but we won't expose that behavior to the user. We need the --ip to accept only an IP address and add the --hostname parameter for accepting a hostname. We could create two new maps, one for mapping IP to UUID and one for mapping hostname to UUID and we could use those maps for accessing datanode info from the CLI. @adoroszlai @sokui What do you think about this approach?

xBis7 avatar Oct 16 '22 21:10 xBis7

The goal here is to have a consistent CLI, so we keep the code as it is but we won't expose that behavior to the user. We need the --ip to accept only an IP address and add the --hostname parameter for accepting a hostname. We could create two new maps, one for mapping IP to UUID and one for mapping hostname to UUID and we could use those maps for accessing datanode info from the CLI. @adoroszlai @sokui What do you think about this approach?

This sounds feasible, but some worry occurs that how to make sure the IP - UUID - hostname mapping is always consistent. If the hostname changes at some DN, we may need the retry logic with the IP-UUID mapping and need to update the corresponding hostname.

Xushaohong avatar Oct 17 '22 12:10 Xushaohong

If the hostname changes at some DN, we may need the retry logic with the IP-UUID mapping and need to update the corresponding hostname.

@Xushaohong Can you provide an example in the code of how we are currently handling such a case?

xBis7 avatar Oct 17 '22 13:10 xBis7

Moved everything back to how it was and added these two methods used only by the CLI. This achieves the wanted behavior and I think it doesn't affect the system in the cases mentioned above. Let me know what you all think.

xBis7 avatar Oct 17 '22 17:10 xBis7

I think we need to revisit this PR. Ideally, the listing of nodes should work the same for IP and hostname and should not require opening up specific APIs in SCM protocol.

kerneltime avatar Oct 17 '22 23:10 kerneltime

@kerneltime Let me explain in more details the purpose of this PR. The issue is that you can either have ip or hostname and use only one of them to get the datanode info.

from master the default behavior

bash-4.2$ ozone admin datanode usageinfo --ip=172.21.0.5
Usage Information (1 Datanodes)

UUID         : 08c48611-7121-46a5-a047-129f573c78a6 
IP Address   : 172.21.0.5 (ozone_datanode_1.ozone_default) 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168680075264 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14095908864 B (13.13 GB) 
Remaining %  : 7.71% 

bash-4.2$ ozone admin datanode usageinfo --ip=ozone_datanode_1.ozone_default
Usage Information (0 Datanodes)

from master with dfs.datanode.use.datanode.hostname=true

bash-4.2$ ozone admin datanode usageinfo --ip=172.19.0.5
Usage Information (0 Datanodes)

bash-4.2$ ozone admin datanode usageinfo --ip=d416d3fec041
Usage Information (1 Datanodes)

UUID         : 809c1065-2594-4208-bc57-4442b71022e2 
IP Address   : 172.19.0.5 (d416d3fec041) 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168679792640 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14096191488 B (13.13 GB) 
Remaining %  : 7.71% 

We want to have a --hostname parameter and also we want the CLI to be more consistent, meaning that the --ip accepts only ip and the new --hostname accepts only hostname. If we just add the --hostname parameter then in the above second case we will have --hostname that accepts hostname and --ip that also accepts only hostname.

As pointed out to me, we can't deprecate the flag or change the current logic in SCMNodeManager because it's necessary for environments like Kubernetes where the datanode can restart and end up with a new Ip address. That's why I created specific APIs for the CLI.

The current code achieves the desired behavior.

from this branch with dfs.datanode.use.datanode.hostname=true

bash-4.2$ ozone admin datanode usageinfo --ip=172.22.0.5
Usage Information (1 Datanodes)

UUID         : 961511cf-38ec-412c-b805-381549ce03cf 
IP Address   : 172.22.0.5 
Hostname     : 16cbb539ec92 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 168680542208 B (157.10 GB) 
Total Used % : 92.29% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 14095441920 B (13.13 GB) 
Remaining %  : 7.71% 

bash-4.2$ ozone admin datanode usageinfo --ip=16cbb539ec92
Usage Information (0 Datanodes)

bash-4.2$ ozone admin datanode usageinfo --hostname=16cbb539ec92
Usage Information (1 Datanodes)

UUID         : 961511cf-38ec-412c-b805-381549ce03cf 
IP Address   : 172.22.0.5 
Hostname     : 16cbb539ec92 
Capacity     : 182775984128 B (170.22 GB) 
Total Used   : 169011408896 B (157.40 GB) 
Total Used % : 92.47% 
Ozone Used   : 4096 B (4 KB) 
Ozone Used % : 0.00% 
Remaining    : 13764575232 B (12.82 GB) 
Remaining %  : 7.53% 

xBis7 avatar Oct 18 '22 12:10 xBis7

Let's discuss this in the next community sync @neils-dev

kerneltime avatar Oct 18 '22 20:10 kerneltime

@sokui @Xushaohong If this Map holds both IPs and Hostnames at all times, will this be an issue? In case we are in an ip changing environment and the ip doesn't work, hostname will still be available for the user. What do you think? Also, if you could provide a way for me to test it.

xBis7 avatar Oct 24 '22 15:10 xBis7

If this Map holds both IPs and Hostnames at all times, will this be an issue? In case we are in an ip changing environment and the ip doesn't work, hostname will still be available for the user. What do you think? Also, if you could provide a way for me to test it.

Currently, this map is fine with IP changing environment since we either use IP or hostname, not both of them simultaneously. The prerequisite is the config dfs.datanode.use.datanode.hostname.
If the map holds both IPs and Hostnames at all times, as it maps hostname / IP to the DNs, there will be the same UUID occurs in different keys' value sets causing a sequence of unknown errors. This part of the structure needs to be designed if you want to do so.

The simple test environment could be found in the kubernetes directory. E.g ozone/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/kubernetes There is a README.md to teach how to set up an ozone k8s cluster. Kill/ Restart one DN pod could change the IP.

Xushaohong avatar Oct 25 '22 02:10 Xushaohong

Thanks @sokui for your comments. On the errors you mention that may happen when using the map for both ip and hostnames mapping to UUIDs of DNs,

If the map holds both IPs and Hostnames at all times, as it maps hostname / IP to the DNs, there will be the same UUID occurs in different keys' value sets causing a sequence of unknown errors.

Can you give some detail on this? Where this can cause problems?

neils-dev avatar Oct 25 '22 14:10 neils-dev

Thanks @sokui for your comments. On the errors you mention that may happen when using the map for both ip and hostnames mapping to UUIDs of DNs,

If the map holds both IPs and Hostnames at all times, as it maps hostname / IP to the DNs, there will be the same UUID occurs in different keys' value sets causing a sequence of unknown errors.

Can you give some detail on this? Where this can cause problems?

@neils-dev , I did not mean your solution will have problems in this case. I just try to reminder you to be careful of this case. When the variable/method named as hostname, people may assume it is just the hostname not IP, but in some case it is not true. So far in this PR, I do not think this is an issue.

sokui avatar Oct 26 '22 05:10 sokui

@Xushaohong

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

xBis7 avatar Oct 26 '22 13:10 xBis7

This looks much better, will complete my review this week.

kerneltime avatar Oct 26 '22 17:10 kerneltime

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

Maybe you could try on the metal environment and adjust the hostname manually which is the case for hostname change. If the patch is able to handle the IP-changing and hostname-changing cases, that would be great~

Xushaohong avatar Oct 27 '22 02:10 Xushaohong

Kill/ Restart one DN pod could change the IP.

This works fine with the current approach. Any particular scenarios or test cases? Commands to execute? I'm referring to anything more complex than getting the datanode info.

Maybe you could try on the metal environment and adjust the hostname manually which is the case for hostname change. If the patch is able to handle the IP-changing and hostname-changing cases, that would be great~

For k8s, if you deploy datanodes as deployment instead of statefulset, when you delete a node, it will restart with a different hostname, I think.

sokui avatar Oct 27 '22 03:10 sokui

Please add a end to end robot test which should help close the loop for the feature.

kerneltime avatar Oct 31 '22 16:10 kerneltime

Please add a end to end robot test which should help close the loop for the feature.

@kerneltime I added the robot test you requested.

xBis7 avatar Nov 01 '22 13:11 xBis7

@kerneltime After some testing, it seems that making the use of hostname the default option breaks the system except for a kubernetes environment due to pipeline allocation to datanodes. To make this work we would have to make a lot of changes and this PR would grow out of hand unnecessarily. Went back to using the flag for hostnames and now everything works and we still have the desired outcome, that the ip and hostname are both available at the client. If you could take another look and let me know what you think.

xBis7 avatar Nov 05 '22 10:11 xBis7

@xBis7 I plan to finish this, are you OK with that?

adoroszlai avatar Nov 21 '23 10:11 adoroszlai

@adoroszlai Please, feel free to take this over.

As far as I can recall, these were the unresolved issues that blocked this ticket

  • We need to use hostnames with Kubernetes because it keeps changing Node IPs dynamically
  • It would be nice to migrate Ozone entirely to using hostnames
  • Ozone is too dependent on IPs for topology and therefore moving entirely to hostnames is impossible for now
  • Based on this flag dfs.datanode.use.datanode.hostname we can either have IPs available or hostnames. SCMNodeManager needs refactoring to have both of them available

xBis7 avatar Nov 21 '23 11:11 xBis7

As far as I can recall, these were the unresolved issues that blocked this ticket

Thanks @xBis7, that's very useful for the future. However, here I don't intend to address all hostname-related problems, just trying to wrap it up considering your latest comment:

Went back to using the flag for hostnames and now everything works and we still have the desired outcome, that the ip and hostname are both available at the client.

adoroszlai avatar Nov 22 '23 06:11 adoroszlai

@xBis7 please let me know if you are OK with the latest update to be merged

adoroszlai avatar Nov 27 '23 16:11 adoroszlai

@adoroszlai Thanks for finishing this. I've gone through the changes. LGTM!

xBis7 avatar Nov 27 '23 17:11 xBis7

Thanks @xBis7 for the patch, @kerneltime, @sadanand48, @sokui, @Xushaohong for the review.

adoroszlai avatar Nov 27 '23 17:11 adoroszlai