Usability issues wrt managing Isolation Segments

Open bboe-pivotal opened this issue 6 years ago • 7 comments

CF version tested

  • cf 6.36.2+18ceab10f.2018-05-16
  • PCF 1.11

Commands affected

  • cf org...
  • cf space...
  • cf set-org-default-isolation-segment
  • cf enable-org-isolation

Background for request

This issue is based on a customer troubleshooting session where the user thought they had set a default isolation segment for an organization, but the segment turned out to be only enabled for the org, not set as its default. As a result, new spaces accidentally deployed applications to the wrong isolation segment.

This issue therefore includes a set of recommendations for addressing this; we ended up using the REST APIs directly to validate our configuration and get to the root cause.

Risk and implications of this issue

The risk of a badly configured isolation segment ranges from, at best, an application that won't deploy to an application that runs on the wrong infrastructure, which may affect platform stability and/or the certifications that the application is required to fulfill.

In our customer's case, we were dealing with a badly configured production environment with a CPU-intensive .NET application on Windows Server 2012R2. Under full load, this application would have made other .NET production applications unresponsive. The isolation segment is therefore used to isolate the CPU-intensive workloads from other applications.

Recommendations

  1. Visualize "default isolation segment" as a data point that's included when looking at information about an organization or space. Right now, cf org and cf space show an empty value for isolation segments if no isolation segment is explicitly assigned to the org or space. This is misleading, because the system has a default isolation segment and an application is always assigned to an isolation segment. Showing it also helps an administrator confirm whether an org or space is configured correctly. This information would have made it a lot easier for the customer to understand their current configuration, as it wasn't clear to them what the empty value really meant when running cf space.
  2. Improve details for space isolation segment configuration. The key things that an operator needs to know are which isolation segment the space is assigned to, which includes the default isolation segment, and why. The important point about why is whether the isolation segment is set explicitly for this particular space or inherited from the parent org. This information would have been extremely useful to the customer in determining why two different spaces in their case were configured differently.
  3. Include an extra field in cf org to show what the default isolation segment is. Right now cf org only has one isolation segments field, whose value can be set both through set-org-default-isolation-segment and enable-org-isolation. This information is available in the cf API and it would have made the troubleshooting significantly easier if it had been visualized.
  4. Enable PCF administrator to disable access to default isolation segment for an org. Given that cf org does not show that access to the default isolation segment is enabled, the risk remains that a space may, by accident or intentionally, be assigned to the default isolation segment when it should not be. In the customer's case, the goal was to assign the whole org and all of its spaces to a given isolation segment. This failed both because the administrator had no way to see that the default isolation segment was set wrong and because there doesn't seem to be a way to disable access to the default isolation segment; it's not even clear to the administrator that this access is still open.
  5. Show in the cf app details what isolation segment an application runs under. It's fair to assume that an application shouldn't be able to change what isolation segment it's running in, but it would also be valuable information to show, both during a cf push, as well as under cf app. The customer detected their misconfiguration by accident when analyzing application logs in LogStash and we usually don't even display this information. Displaying this as a part of the application will make it more visible to the user and make it easier to detect a misconfiguration.
  6. Improved error logging and validation of isolation segment configuration when pushing an application. The customer has also in the past run into an issue where they assigned an org and space to an isolation segment that was later deleted from the system. The org and space still kept the configuration that pointed to the deleted isolation segment. cf push would then fail with a generic message that it couldn't find a cell to deploy the application, and it took 2-3 days to get to the root cause of this basic configuration issue. The root cause was ultimately a customer configuration error, and it's not a given that just removing or changing that configuration for an org and space is the right thing to do. A good place to start, however, would be an improved error message that makes it easier to troubleshoot this kind of issue.
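
As a rough illustration of the kind of validation recommendation 6 asks for, the sketch below scans org and space configuration for references to isolation segments that no longer exist. It is plain Python over in-memory data, not cf CLI or Cloud Controller code; the function name `find_dangling_references` and the dictionary shapes are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional, Set


def find_dangling_references(
    existing_segments: Set[str],
    org_defaults: Dict[str, Optional[str]],
    space_assignments: Dict[str, Optional[str]],
) -> List[str]:
    """Report orgs and spaces that point at a missing isolation segment.

    existing_segments: names of segments that currently exist.
    org_defaults: org name -> default segment name (None if unset).
    space_assignments: space name -> assigned segment name (None if unset).
    All three shapes are assumptions made for this sketch.
    """
    findings = []
    for org, segment in sorted(org_defaults.items()):
        if segment is not None and segment not in existing_segments:
            findings.append(
                f"org {org}: default isolation segment '{segment}' does not exist"
            )
    for space, segment in sorted(space_assignments.items()):
        if segment is not None and segment not in existing_segments:
            findings.append(
                f"space {space}: isolation segment '{segment}' does not exist"
            )
    return findings
```

A message built from findings like these would point directly at the stale configuration, rather than only reporting that no compatible cell was found.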

bboe-pivotal avatar Jul 03 '18 17:07 bboe-pivotal

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/158802921

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Jul 03 '18 17:07 cf-gitbot

Thank you @bboe-pivotal for creating this issue, much appreciated. We will review and get back to you as soon as we can.

abbyachau avatar Jul 03 '18 21:07 abbyachau

Hi @bboe-pivotal, thanks for creating this issue.

  1. For cf org org-name, we could consider setting isolation segment to shared if no isolation segments have been explicitly set. It appears you could explicitly assign the shared iso seg to the org.

  2. Same as above, and I believe the default isolation segment appears in the cf space space-name information, however if it is the default (shared) isolation segment, that information will not appear unless explicitly set.

  3. The default isolation segment is shown in cf org org-name, however if the org is using the shared isolation segment, then that information does not currently appear unless explicitly set. If we added the shared information as in #1, we could also show shared as the default in cf org org-name when no default is explicitly set.

  4. cf disable-org-isolation allows you to disable an isolation segment for a given org. It appears you may also disable the shared isolation segment:

    $ ./cf disable-org-isolation dodo shared
    Removing entitlement to isolation segment shared from org dodo as admin...
    OK
    
  5. cf app should be showing the isolation segment information. I will need to check whether it will show the shared segment - hunch is it will not.

  6. We could look into adding more details to the cf push failure; looks like it currently just returns:

    Error staging application: Found no compatible cell

If I've missed out on anything, or if you have any comments, please let us know.

cc / @shalako @shubhaat for any omissions or corrections to the above

abbyachau avatar Jul 12 '18 18:07 abbyachau

Current

~/workspace $ cf org scoen
Getting info for org scoen as scoen...

name:                 scoen
domains:              apps.internal, istio-acceptance.routing.cf-app.com,
                      istio.istio-acceptance.routing.cf-app.com
quota:                default
spaces:               test
isolation segments: private-is1

~/workspace $ cf space test1
Getting info for space test1 in org scoen as scoen...

name:                      test1
org:                       scoen
apps:
services:
isolation segment:
space quota:
running security groups:   dns, public_networks
staging security groups:   dns, public_networks

~/workspace $ cf space test2
Getting info for space test2 in org scoen as scoen...

name:                      test2
org:                       scoen
apps:
services:
isolation segment: private-is1
space quota:
running security groups:   dns, public_networks
staging security groups:   dns, public_networks

Desired

~/workspace $ cf org scoen
Getting info for org scoen as scoen...

name:                 scoen
domains:              apps.internal, istio-acceptance.routing.cf-app.com,
                      istio.istio-acceptance.routing.cf-app.com
quota:                default
spaces:               test
isolation segments: shared (default), private-is1

~/workspace $ cf space test1
Getting info for space test in org scoen as scoen...

name:                      test
org:                       scoen
apps:
services:
isolation segment: shared
space quota:
running security groups:   dns, public_networks
staging security groups:   dns, public_networks

~/workspace $ cf space test2
Getting info for space test1 in org scoen as scoen...

name:                      test1
org:                       scoen
apps:
services:
isolation segment: private-is1
space quota:
running security groups:   dns, public_networks
staging security groups:   dns, public_networks

  1. Response for cf org scoen should show all isolation segments the org is entitled to, including the shared IS
  2. Response for cf space test should always show a value for isolation segment and whether it is the org default or explicitly configured
  3. Response for cf org scoen should show which isolation segment is the org default; shared is the org default IS when an OrgManager hasn't set one explicitly
  4. @abbyachau answered this
  5. cf app myapp does not currently expose the IS, nor does cf push. I am of the opinion that app developers shouldn't need to be aware of isolation segments at all, only what space to use. Assuming cf space clearly shows what IS a space is associated with, would that be sufficient?
  6. We should improve the error message. We could consider validations when an IS is deleted, such as preventing deletion of an IS when it is set as an org default or assigned to a space, or reassigning spaces to the org default, and the org default to the shared IS. But failing the push does seem like a good forcing function to have the OrgManager make a conscious decision, rather than doing something unpredictable.
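
Points 2 and 3 above amount to a small resolution rule: an explicit space assignment first, then the org default, then shared. A minimal Python sketch of that rule follows, assuming the precedence described in this thread. The names `effective_isolation_segment` and `EffectiveSegment` and the source labels are hypothetical and are not part of the cf CLI.

```python
from typing import NamedTuple, Optional

SHARED = "shared"  # name of the shared segment as it appears in this thread


class EffectiveSegment(NamedTuple):
    name: str    # isolation segment the space's apps would land in
    source: str  # "space", "org default", or "platform default"


def effective_isolation_segment(
    space_segment: Optional[str],
    org_default: Optional[str],
) -> EffectiveSegment:
    """Resolve which segment applies to a space, and why.

    Assumed precedence: explicit space assignment, then the org
    default, then the shared segment.
    """
    if space_segment:
        return EffectiveSegment(space_segment, "space")
    if org_default:
        return EffectiveSegment(org_default, "org default")
    return EffectiveSegment(SHARED, "platform default")
```

With a rule like this, cf space could always print both a segment name and where the value came from, instead of leaving the field empty.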

shalako avatar Jul 17 '18 23:07 shalako

Hi @bboe-pivotal please let us know if you would object if we removed the isolation segment information from the cf app app-name summary? If cf space provided this information, would that be sufficient? Thanks.

abbyachau avatar Jul 19 '18 00:07 abbyachau

Update on this issue:

We've fixed #2 above on the v7 beta CLI - cf space space-name should show if the iso seg is a default or if it is explicitly set.

See https://www.pivotaltracker.com/story/show/167168113

abbyachau avatar Aug 08 '19 21:08 abbyachau

+1 on the issue. We are exploring Isolation Segments, and found their relationship with orgs and spaces confusing. The lack of explicit information about the “shared” IS doesn’t help.

Allowing all orgs and spaces to access the shared IS is both confusing and a security risk. We are planning to use cfmgmt to automate and enforce the segregation, but would have no confidence if we rely on an operator to do it right.

I would be happy if an org and its associated spaces were only allowed one IS. It makes things simpler.

ywei2017 avatar Apr 04 '21 20:04 ywei2017