azure-sdk-for-python

Azure AI Evaluation SDK v1.13.7: start_red_team_run() produces runs with target=null, causing completed external scans to be invisible in Foundry portal

Open ctava-msft opened this issue 3 weeks ago • 3 comments

Environment

- SDK & package: Azure AI Evaluation SDK (azure-ai-evaluation v1.13.7)
- Python: 3.10.x (also observed on 3.9.x)
- Auth: DefaultAzureCredential()
- Project architecture:
  - Legacy Hub: MachineLearningServices/workspaces (primary)
  - Also tested with Foundry: CognitiveServices/accounts via project endpoint
- Region: (e.g., eastus)
- OS: Windows 11 / Ubuntu 22.04 (observed on both)
- Portal: Azure AI Foundry (Evaluations → AI red teaming)

Description

When initiating AI Red Teaming scans against non-Azure-hosted targets (external callbacks, Kong, LM Studio, etc.) using RedTeam.start_red_team_run() / scan(), the SDK uploads a RedTeamUpload that contains a displayName but leaves the target field as null.

The run completes successfully and is retrievable via az rest. The run does not appear in the Foundry portal’s AI red teaming list, which appears to filter on target. Portal‑initiated runs (GUI) against Foundry‑deployed models populate target and display correctly. All 10 SDK‑initiated external runs show target=null and are invisible despite status=Completed.

This behavior is reproducible primarily in legacy Hub projects; the Foundry architecture using AIProjectClient with model configs sets target and shows up, but the external callback path remains affected.

Impact

- Portal visibility gap: Completed external scans are invisible in Foundry UI, breaking centralized reporting, review workflows, and compliance evidence gathering.
- Operational friction: Teams must use az rest/raw APIs to find runs; portal drill-downs (risk breakdown, row-level attack/response) are unavailable.
- CI/CD & governance: Automation that expects portal surfacing of evaluation artifacts cannot rely on the SDK for external targets.

Repro steps

1. Create a RedTeam using the Evaluation SDK pointing to a Hub project (legacy ML workspace) or Foundry project endpoint:

from azure.ai.evaluation.red_team import RedTeam, RiskCategory
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": "...",
    "resource_group_name": "...",
    "project_name": "...",
}  # or Foundry project URL

red_team = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness],
)

# External (non-Azure-hosted) callback target
async def external_callback(prompt: str) -> str:
    return "external system response"

# ...

2. Confirm completion via CLI:

az rest --method get --url "https://{account}.services.ai.azure.com/api/projects/{project}/redteams/runs/{runId}"

3. Observe payload:

{
  "id": "4b4ce898-045f-48b3-9335-9b18c338a97f",
  "displayName": "scan_20251205_195954",
  "target": null,
  "status": "Completed"
  ...

4. Open Foundry portal → Project → Evaluations → AI red teaming.
   Actual: Run does not appear.
   Expected: Completed run should be listed with drill-down.
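For scripted checks, the CLI step can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: the URL template is copied from the repro step, while the function names `build_az_rest_args` and `fetch_run` are hypothetical and `fetch_run` assumes a logged-in Azure CLI on the PATH.

```python
import json
import subprocess

# URL template taken from the repro step; placeholders are filled per run.
RUN_URL_TEMPLATE = (
    "https://{account}.services.ai.azure.com"
    "/api/projects/{project}/redteams/runs/{run_id}"
)


def build_az_rest_args(account: str, project: str, run_id: str) -> list:
    """Return the argv list for an `az rest` GET on a single red team run."""
    url = RUN_URL_TEMPLATE.format(account=account, project=project, run_id=run_id)
    return ["az", "rest", "--method", "get", "--url", url]


def fetch_run(account: str, project: str, run_id: str) -> dict:
    """Run the CLI command and parse its JSON output (hypothetical helper)."""
    completed = subprocess.run(
        build_az_rest_args(account, project, run_id),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return json.loads(completed.stdout)
```

Calling `fetch_run(...)["target"]` on an affected run would then be expected to return `None`, matching the payload shown in step 3.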

Expected behavior

For any completed red team run, including external callback targets, the SDK should persist a non-null target descriptor (or the portal should not require target for visibility). The run should appear in the Foundry portal’s AI red teaming tab with full drill-down.

Actual behavior

- RedTeamUpload.target == null for SDK-initiated external scans.
- Portal hides those runs; only GUI-initiated or SDK runs with explicit target configs display.

Additional observations & references

Foundry docs show portal visibility when a target is set via AIProjectClient and model configuration; external callback examples do not demonstrate a persisted target value, aligning with our observations.

- Cloud run with AIProjectClient and target config: Learn: “Run AI Red Teaming Agent in the cloud” (payload includes target)
- Local agent & external callback pattern (no persisted target shown): Learn: “Run AI Red Teaming Agent locally”
- Portal viewing guidance (assumes run is logged with usable metadata): Learn: “View AI red teaming results” / “Run scans…”

Hypothesis

- The SDK’s serialization path for external callback targets does not construct a target object (e.g., ExternalCallback or GenericEndpoint), leaving it null.
- The Foundry portal UI filters/queries rely on target to display runs and do not include runs with target=null.

Requested fix

SDK enhancement:

- Provide a target schema for external/non-Azure targets (e.g., type: ExternalCallback, with identifier fields), and populate it when scan(target=callable) is used.
- Alternatively, allow setting a target descriptor explicitly in RedTeam.scan()/start_red_team_run() for callbacks.

Portal tolerance (optional):

Do not filter out runs where target == null; display them with a generic “External target” label, so results remain discoverable.

Backfill/patch API (nice to have):

Support updating run metadata to attach a target after creation, enabling visibility for already completed runs.
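To make the SDK enhancement request concrete, here is a minimal sketch of what an external target descriptor could look like when serialized. Everything in it is hypothetical: `ExternalCallbackTarget`, its fields, and `to_payload` are illustrative names, and no such type exists in the SDK or the REST API today. Only the `displayName` and `target` keys come from the payload shown in the repro steps.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExternalCallbackTarget:
    """Hypothetical descriptor for a non-Azure target (not an SDK type)."""

    name: str                              # human-readable identifier
    endpoint_hint: Optional[str] = None    # e.g. "localhost:1234"; optional
    type: str = "ExternalCallback"         # assumed discriminator value


def to_payload(display_name: str, target: Optional[ExternalCallbackTarget]) -> dict:
    """Build a run body with displayName and an optional target descriptor."""
    body = {"displayName": display_name, "target": None}
    if target is not None:
        # Drop unset optional fields so the serialized target stays minimal.
        body["target"] = {k: v for k, v in asdict(target).items() if v is not None}
    return body
```

With a descriptor supplied, `to_payload("scan_1", ExternalCallbackTarget(name="lm-studio"))` yields a non-null `target`; with `None`, the body matches what the SDK reportedly sends now.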

Workarounds we tested

- AIProjectClient with model configuration → visible (sets target).
- External callback with SDK → invisible (no target).
- Manual review via az rest → possible but not viable for teams relying on portal analytics.

Logs / payloads

Example run (IDs sanitized above). We can provide full request/response traces privately if needed.

Severity

High for customers using external endpoints: prevents usage of Foundry portal for red teaming result review, reporting, and governance.

ctava-msft • Dec 08 '25 16:12

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @luigiw @needuv @singankit.

github-actions[bot] • Dec 08 '25 18:12

@ctava-msft red teaming will not work in eastus region. Please use a project in a supported region: https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent?view=foundry&preserve-view=true#region-support . Once you have switched regions please confirm if the issue still occurs. The UI does not filter by target, so you should be unblocked once you are in a supported region.

slister1001 • Dec 09 '25 15:12

Hi @ctava-msft. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] • Dec 09 '25 17:12

@slister1001 @ctava-msft

Comprehensive Testing Results & SDK Code Trace

We completed extensive A/B testing across multiple dimensions and traced the issue to a specific location in the SDK code. This response provides evidence that the issue is not related to regional support or external vs internal targets.


1. Test Environment

| Component | Value |
| --- | --- |
| SDK Version | azure-ai-evaluation==1.13.7 |
| Python | 3.12.8 |
| Architecture | New Foundry (Microsoft.CognitiveServices/accounts with AIServices kind) |
| Regions Tested | East US 2 (recommended), Sweden Central (recommended) |

2. A/B Test Matrix

We tested 4 combinations to isolate variables:

| Test | Region | Target Type | Target Endpoint | SDK Console Output | API target Field |
| --- | --- | --- | --- | --- | --- |
| A | East US 2 | External | LM Studio (localhost:1234) | Track in AI Foundry: None | null |
| A2 | East US 2 | External | LM Studio (localhost:1234) | Track in AI Foundry: None | null |
| B | East US 2 | Foundry-hosted | Azure OpenAI gpt-4o-mini | Track in AI Foundry: None | null |
| B2 | Sweden Central | Foundry-hosted | Azure OpenAI gpt-4o-mini | Track in AI Foundry: None | null |

Test B/B2 Setup (Foundry-Hosted Target)

To isolate whether the issue was specific to external callbacks, we deployed gpt-4o-mini in the same Foundry project and created a callback targeting it:

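import os

import httpx
from azure.ai.evaluation.red_team import RedTeam, RiskCategory
from azure.identity import DefaultAzureCredential

# Deployed gpt-4o-mini in same Foundry project
AZURE_OPENAI_ENDPOINT = "https://airtagent-ai-eastus2.openai.azure.com"
DEPLOYMENT_NAME = "gpt-4o-mini"
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]  # assumed to be supplied via the environment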

def create_azure_openai_callback():
    def callback(query: str) -> str:
        url = f"{AZURE_OPENAI_ENDPOINT}/openai/deployments/{DEPLOYMENT_NAME}/chat/completions?api-version=2024-08-01-preview"
        headers = {"Content-Type": "application/json", "api-key": AZURE_OPENAI_API_KEY}
        payload = {"messages": [{"role": "user", "content": query}], "temperature": 0.7, "max_tokens": 2048}
        with httpx.Client(timeout=120.0) as client:
            response = client.post(url, json=payload, headers=headers)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
    return callback

# SDK scan using Foundry-hosted model as target
red_team = RedTeam(
    azure_ai_project="https://airtagent-ai-eastus2.services.ai.azure.com/api/projects/redteam",
    credential=DefaultAzureCredential(),
    risk_categories=[RiskCategory.Violence],
)
await red_team.scan(target=create_azure_openai_callback(), scan_name="test-foundry-target", num_objectives=3)

Result

Both external AND Foundry-hosted targets produce target=null. The issue is not related to where the target is hosted.


3. Variables Eliminated

| Hypothesis | Tested | Result |
| --- | --- | --- |
| External targets cause target=null | ✅ Tested Foundry-hosted Azure OpenAI | Same behavior |
| Non-recommended regions cause issue | ✅ Tested East US 2 + Sweden Central | Same behavior |
| Legacy Hub architecture causes issue | ✅ Tested new CognitiveServices/accounts | Same behavior |
| Regional API differences | ✅ Both regions accept SDK uploads | Same behavior |

4. Version History Analysis

We analyzed the SDK across multiple versions:

| Version | red_team Module | _mlflow_integration.py | RedTeamUpload.target Field | Code Populates target? |
| --- | --- | --- | --- | --- |
| 1.0.0 | ❌ Not present | N/A | N/A | N/A |
| 1.8.0 | ✅ Present | ❌ Not present | N/A | N/A |
| 1.12.0 | ✅ Present | ✅ Present | ❌ Field NOT in model | N/A |
| 1.13.0 | ✅ Present | ✅ Present | ✅ Field ADDED to model | ❌ NO |
| 1.13.7 | ✅ Present | ✅ Present | ✅ Present | ❌ NO |

Finding: The target field was added to RedTeamUpload model in v1.13.0, but the SDK code that creates RedTeamUpload instances was never updated to populate it.


5. SDK Code Trace

Location: _mlflow_integration.py lines 111-116

if self._one_dp_project:
    response = self.generated_rai_client._evaluation_onedp_client.start_red_team_run(
        red_team=RedTeamUpload(
            display_name=run_name or f"redteam-agent-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
        )  # <-- ONLY display_name is passed
    )

The RedTeamUpload Model (_models.py lines 5074-5148)

class RedTeamUpload(_Model):
    display_name: Optional[str] = rest_field(name="displayName", ...)
    target: Optional["_models.TargetConfig"] = rest_field(...)  # <-- Field exists but not populated

Additional Context

MLflowIntegration.__init__() (lines 44-62) does not receive target information from RedTeam.scan(). To populate the field, the target would need to be passed through the call chain from scan() down to the RedTeamUpload constructor.
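The shape of such a change can be illustrated with stand-in classes. This is a sketch only: `FakeRedTeamUpload` and `FakeIntegration` are hypothetical stand-ins rather than the SDK's real classes, and the `target` constructor argument is an assumption about how the value might be threaded through, not a description of any existing or planned signature.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FakeRedTeamUpload:
    """Stand-in for the upload model: a display name plus an optional target."""

    display_name: str
    target: Optional[dict] = None


class FakeIntegration:
    """Stand-in for the integration layer that starts the run."""

    def __init__(self, target: Optional[dict] = None):
        # Assumed change: the integration layer is told about the scan target.
        self._target = target

    def start_run(self, run_name: Optional[str] = None) -> FakeRedTeamUpload:
        name = run_name or f"redteam-agent-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
        # The target is forwarded instead of being left at its default of None.
        return FakeRedTeamUpload(display_name=name, target=self._target)
```

Constructing `FakeIntegration()` without a target reproduces the reported behavior (`target` stays `None`), while passing a descriptor carries it into the upload object.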


6. API Schema Analysis

We reviewed the API specification to understand what target values are valid.

Source: TypeSpec Definition

@discriminator("type")
model TargetConfig {
  type: string;
}

model AzureOpenAIModelConfiguration extends TargetConfig {
  type: "AzureOpenAIModel";
  modelDeploymentName: string;
}

Source: REST API - Red Teams Create

The API documentation confirms TargetConfig uses a discriminator pattern with AzureOpenAIModelConfiguration as the only documented subtype.

Implication

The API schema currently has one TargetConfig subtype: AzureOpenAIModelConfiguration (discriminator: "AzureOpenAIModel"). There is no subtype for callback-based or external endpoint targets.
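The consequence of a single-subtype discriminator can be shown with a small Python model of the TypeSpec above. This is an illustrative sketch, not the generated SDK code: the dataclasses mirror the TypeSpec field names, and `parse_target` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetConfig:
    type: str  # discriminator value


@dataclass
class AzureOpenAIModelConfiguration(TargetConfig):
    model_deployment_name: str


def parse_target(raw: Optional[dict]) -> Optional[TargetConfig]:
    """Resolve a raw target payload to a subtype based on its `type` field."""
    if raw is None:
        return None
    kind = raw.get("type")
    if kind == "AzureOpenAIModel":
        return AzureOpenAIModelConfiguration(
            type=kind, model_deployment_name=raw["modelDeploymentName"]
        )
    # Any other discriminator (e.g. a callback target) has no known subtype.
    raise ValueError(f"no TargetConfig subtype for discriminator {kind!r}")
```

Under this model, a payload with `"type": "AzureOpenAIModel"` resolves cleanly, whereas any callback-style discriminator fails, which is the gap described in the implication above.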


7. Root Cause Analysis

The issue has two layers:

| Layer | Issue | Evidence |
| --- | --- | --- |
| SDK | _mlflow_integration.py doesn't populate target field | Code trace in Section 5 |
| API | No TargetConfig subtype exists for callback targets | TypeSpec shows only AzureOpenAIModelConfiguration |

For callback-based scans (the SDK's primary use case for external LLMs), there is currently no valid TargetConfig type to populate.


8. API Evidence

Runs are accessible via REST API but invisible in portal:

{
  "value": [
    {
      "id": "6fd37129-c620-4f61-8578-058378bcff4f",
      "displayName": "test-166-foundry-target-20251213-174501",
      "target": null,
      "status": "Completed",
      "properties": {
        "AiStudioEvaluationUri": "https://ai.azure.com/resource/build/redteaming/..."
      }
    }
  ]
}

Portal shows: "No red teams found"
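A list response like the one above can be scanned for affected runs with a few lines of Python. This is a minimal sketch; `runs_without_target` is a hypothetical helper name, and it relies only on the `value`, `displayName`, `target`, and `status` keys visible in the payload.

```python
def runs_without_target(list_response: dict) -> list:
    """Return display names of completed runs whose target is null."""
    return [
        run["displayName"]
        for run in list_response.get("value", [])
        if run.get("status") == "Completed" and run.get("target") is None
    ]


# Sample shaped like the response above (properties omitted for brevity).
sample = {
    "value": [
        {
            "id": "6fd37129-c620-4f61-8578-058378bcff4f",
            "displayName": "test-166-foundry-target-20251213-174501",
            "target": None,
            "status": "Completed",
        }
    ]
}
affected = runs_without_target(sample)
```

Here `affected` contains the single display name from the sample, since that run is completed and has no target.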


9. Questions

  1. Portal visibility: You mentioned "The UI does not filter by target." What field(s) determine whether a run appears in the portal? Our runs have status: "Completed" and valid AiStudioEvaluationUri but remain invisible.

  2. Callback target support: Is there a plan to add a TargetConfig subtype for callbacks? The SDK documentation shows callbacks as a primary target type, but the API has no way to represent them.

  3. Acceptable behavior for target=null: Should SDK-initiated callback scans display in the portal with a placeholder (e.g., "Custom Target") rather than being hidden?


Summary

| Finding | Evidence |
| --- | --- |
| Issue is NOT regional | Tested East US 2 + Sweden Central |
| Issue is NOT external vs internal targets | Tested LM Studio + Azure OpenAI gpt-4o-mini |
| Issue is NOT Legacy vs New Foundry | Tested new CognitiveServices/accounts |
| SDK doesn't populate target | _mlflow_integration.py:112-116 |
| API has no callback target type | TypeSpec shows only AzureOpenAIModelConfiguration |

pr0b3r7 • Dec 13 '25 23:12

Hi @ctava-msft, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

github-actions[bot] • Dec 22 '25 21:12