
[BUG] Your sign-in was successful but your admin requires the device requesting access to be managed by Microsoft Non-Production to access this resource.

ekgershg opened this issue Oct 02 '24 · 2 comments

To Reproduce

Commands you ran

az login --tenant XXXXX

Observed Behavior

Help us keep your device secure
Your sign-in was successful but your admin requires the device requesting access to be managed by Microsoft Non-Production to access this resource.


Troubleshooting details
If you contact your administrator, send this info to them.

Error Code: 530033
Request Id: b20babdf-e2a4-4fb2-a270-2b0f5b3b4a00
Correlation Id: 
Timestamp: 2024-10-02T10:35:27.070Z
App name: Microsoft Azure CLI
App id: 
IP address: 
Device identifier: 
Device platform: Windows 10
Device state: Compliant
Flag sign-in errors for review: [Enable flagging](https://login.microsoftonline.com/common/debugmode)
If you plan on getting help for this problem, enable flagging and try to reproduce the error within 20 minutes. Flagged events make diagnostics available and are raised to admin attention.

Expected behavior

A clear description of what you expected to happen instead.

I expected to be able to log in from the Azure Portal Cloud Shell to the Microsoft Non-Production subscription, or for "az login" to automatically connect to my default tenant/subscription.

Is this specific to Cloud Shell?

Please verify if the same issue can be reproduced by running the same tool outside Cloud Shell - for example, by installing it on your own computer. If so, it is likely to be a bug in that tool or in the Azure service it communicates with, not in Cloud Shell. Please file the issue with the appropriate project.

Local execution of "az login" doesn't have this issue.

Interface information

How are you accessing Cloud Shell - https://shell.azure.com, https://portal.azure.com, via Windows Terminal, or some other method? If a browser, which Operating System and browser are you using? (ex. Edge on Windows 10)

https://portal.azure.com, on Windows 10

Additional context

Add any other context about the problem here.

ekgershg · Oct 02 '24

@wyclearnpy there shouldn't be any issue with your labeled data. If you are using frame 0, for example, the code will automatically load frames [0, 0, 0, 1, 2]. If you are using the last frame (say n), the code will automatically load frames [n-2, n-1, n, n, n].

The error that you're getting appears to come from the unlabeled data - are you fitting a semi-supervised model? If so, I would first recommend fitting a supervised context model to make sure the error above does not appear (you don't need to train it fully). If that works, then the issue is that your unlabeled batch size is too small. Since the context model requires two frames before and two after the frame being processed, you'll get this error if your unlabeled batch size (in the config file under dali.context.train.batch_size) is <= 4. So it needs to be at least 5.
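For reference, a minimal sketch of the relevant config block, using only the key path named above (all other keys omitted):

```yaml
dali:
  context:
    train:
      batch_size: 5  # must be >= 5: two frames before + two after + the center frame
```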

I'll make that error message more descriptive, thanks for the flag. Let me know how it goes!

themattinthehatt · Oct 11 '24

@wyclearnpy just wanted to check in on this and see if you were able to train with an increased unlabeled batch size?

themattinthehatt · Oct 17 '24

Yes, increasing it to 5 allowed it to run successfully. I had kept the batch size small mainly because of insufficient device memory, so I'm really looking forward to using multiple GPUs for training. Additionally, the performance of the semi-supervised method seems to be slightly worse than the supervised method. What could be the reasons for this? In the semi-supervised setup my image size is 256, while in the supervised setup it is 384.

wyclearnpy · Oct 18 '24

@wyclearnpy glad you were able to get the unsupervised model training. One comment is that to really compare the semi-supervised and supervised models you should set the resizing to be the same, otherwise the comparison between the two will be confounded by that (very important) factor. So maybe you could try super vs semi-super on 256x256 frames first, just to get an idea of how they compare?

I have a few other questions:

  • how big are your original images?
  • how much GPU memory do you have?
  • how many labeled frames do you have?

themattinthehatt · Oct 18 '24

My original images are 1280x1024. For GPU memory I have two 11 GB 1080 Ti cards on the server, but training can only use a single GPU. I have a total of 420 labeled frames.

wyclearnpy · Oct 20 '24

Also, I'm trying semi-supervised training at 384x384.

wyclearnpy · Oct 20 '24

ok, so I would suggest the following type of experiment to compare supervised and semi-supervised (both context):

  • set resize dims to 256x256 (can change this later)
  • set training.train_batch_size to 8
  • set dali.context.train.batch_size to 8

I believe that should fit on an 11GB card just fine. Then you can see what the unsupervised losses are buying you.
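In config terms, that experiment would look roughly like the sketch below; the resize key names are an assumption based on the standard lightning-pose config layout (check your own config file), while the two batch-size paths are the ones named above:

```yaml
data:
  image_resize_dims:   # assumed key name; match whatever your config uses for resizing
    height: 256
    width: 256
training:
  train_batch_size: 8
dali:
  context:
    train:
      batch_size: 8
```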

Separately, you could fit a supervised context model with resize dims set to 384x384 (with batch size of 8 still) and compare that to the supervised context model above fit on 256x256 resizing. This will allow you to see how much the resize dims matter.

We're a couple days away from having multi-GPU support for semi-supervised models, so you'll soon be able to test out the semi-supervised models with 384x384 resizing! Please follow this PR to keep up to date: https://github.com/paninski-lab/lightning-pose/pull/207

themattinthehatt · Oct 21 '24

I tried the fully supervised model at 384x384 and it worked great, but the semi-supervised context model didn't work well. Below are my two configuration files; can you help me figure out the reason?

config_singleview_fish.txt config_TCN_singleview_fish.txt

[image: comparison of the two models]

wyclearnpy · Oct 31 '24

@wyclearnpy I would recommend comparing two models first: fully supervised (already done) and supervised context (i.e. model.model_type: heatmap_mhcrnn and model.losses_to_use: []). Importantly, you should make every other parameter exactly the same for these two models (including batch sizes) so that the only differences are due to the model type.

I see that you're using a ResNet-101 in the config file - you might also try going down to ResNet-50 if you want to play with context and unsupervised losses, since each of these features requires more memory (and a smaller model will use less memory).
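As a sketch, the comparison only requires flipping model_type between the two runs, with losses left empty for both; the backbone line reflects the ResNet-50 suggestion (its key name is an assumption, so check the spelling in your config file):

```yaml
model:
  backbone: resnet50   # assumed key name; smaller than resnet101 to save memory
  model_type: heatmap  # supervised baseline; switch to heatmap_mhcrnn for the context run
  losses_to_use: []    # no unsupervised losses in either of these two runs
```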

Regarding the semi-supervised model, please remind me: does your dataset contain images from several different views? If so, would you mind sharing some images from the different views (with labels) so I can see how they look?

themattinthehatt · Oct 31 '24

Yes, my data comes from three cameras with different views. Below is some of my labeled data: fish.zip

wyclearnpy · Oct 31 '24

This is labeled data from different perspectives: img030_bodypart, img009_bodypart, ![img026_bodypart](https://github.com/user-attachments/assets/df50be16-47c7-439e-9268-7c82f27edc3c), img043_bodypart, img010_bodypart, img007_bodypart

wyclearnpy · Oct 31 '24

very cool, thanks for sharing. I would suggest for now only turning on the "temporal" unsupervised loss. The other losses will not work well for this setup for the following reasons:

pca_singleview: this loss is not going to perform very well when you have different camera views. I think it would work quite well if all your data were top-down, for example, but the pca subspace is going to be hard to estimate from different views unless you have a lot of data.

pca_multiview: this loss currently only works under two conditions, neither of which are met by your dataset:

  1. you have a mirrored setup, and all views are simultaneously present in each frame
  2. you are using a multiview model, which requires a dataset where labels exist across all views for a given point in time; if I remember correctly you have labeled frames from different views but not the corresponding labels for the other views at each time point (which is required to train the multiview model, as we previously discussed)

So I would suggest comparing 4 different models (again only changing model_type/losses and keeping all other hyperparameters the same for a fair comparison; a config sketch follows below):

  1. supervised (model.model_type: heatmap; model.losses_to_use: [])
  2. context (model.model_type: heatmap_mhcrnn; model.losses_to_use: [])
  3. semi-supervised (model.model_type: heatmap; model.losses_to_use: [temporal])
  4. semi-supervised context (model.model_type: heatmap_mhcrnn; model.losses_to_use: [temporal])

Additionally, I would recommend setting

  • training.train_prob: 0.95 and training.val_prob: 0.05 in order to train your model with more data (this is now the setting in our default config file) - you could also do 90/10 if you feel you don't have enough validation data
  • training.num_workers: 8 if you have the extra cpu cores (this will make loading the labeled frames faster)
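Here is a minimal sketch of the config blocks that change across the four runs, plus the training settings above; all key paths are the ones named in this thread, and everything else stays fixed:

```yaml
model:
  model_type: heatmap  # models 1 and 3; use heatmap_mhcrnn for models 2 and 4
  losses_to_use: []    # models 1 and 2; use [temporal] for models 3 and 4
training:
  train_prob: 0.95     # or 0.90 if you prefer a 90/10 split
  val_prob: 0.05       # or 0.10
  num_workers: 8       # if you have the spare CPU cores
```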

Please let me know how it goes!

themattinthehatt · Oct 31 '24

I will also say that, generally speaking, looking at pixel error on test frames is an ok place to start when comparing models, but it really misses the nuanced differences that can exist between models. The best way to see how well the different models are doing is to plot skeletons from the predictions of two different models on top of each other on a short snippet of video, and see how they compare.

themattinthehatt · Oct 31 '24

Thank you very much for your patient reply. I will let you know the result.

wyclearnpy · Nov 01 '24