extensions icon indicating copy to clipboard operation
extensions copied to clipboard

Add LoadFromAsync and SaveToAsync helper methods to DataContent

Open Copilot opened this issue 2 weeks ago • 7 comments

  • [x] Create MediaTypeMap polyfill in LegacySupport directory with System.Net.Mime namespace
  • [x] Add static LoadFromAsync methods to DataContent class:
    • [x] Stream overload that accepts a Stream (infers name and mime type from FileStream's path if applicable)
    • [x] String path overload that infers name and mime type from the file path
    • [x] Pre-initialize MemoryStream capacity when source stream is seekable for performance
  • [x] Add instance SaveToAsync method to DataContent class:
    • [x] Handles directory paths by inferring filename from Name property or using random name
    • [x] Always infers extension from media type if path has none
    • [x] Returns the actual path where data was saved
    • [x] Fix path traversal security issue by using Path.GetFileName(Name)
  • [x] Remove [Experimental] attribute and add methods to JSON API baseline as stable
  • [x] Use GetBuffer() instead of ToArray() for MemoryStream efficiency
  • [x] Add WriteAllBytesAsync(ReadOnlyMemory<byte>) polyfill to remove #ifdef from SaveToAsync
  • [x] Change return types from Task<T> to ValueTask<T> for all new public async methods
  • [x] Add C# 14 extension member polyfills for File.ReadAllBytesAsync and File.WriteAllBytesAsync in LegacySupport directory
  • [x] Remove name parameter from LoadFromAsync(Stream) - Name can be assigned after the call
  • [x] Add comprehensive unit tests for the new functionality
  • [x] Fix net462 build: Add InjectFilePolyfillsOnLegacy to test project
  • [x] Add test for SaveToAsync with empty string path
  • [x] Simplify extension inference by removing unnecessary null check
  • [x] Update MarkItDownMcpReader.cs to use DataContent.LoadFromAsync with optional media type parameter
  • [x] Run code review and security checks
  • [x] Switch components to use MediaTypeMap for extension/MIME type mappings:
    • [x] ImageGeneratorExtensions.cs - Replace _extensionToMimeType dictionary with MediaTypeMap.GetMediaType
    • [x] OpenAIClientExtensions.cs - Replace ImageUriToMediaType with MediaTypeMap.GetMediaType
    • [x] OpenAIImageGenerator.cs - Replace _mimeTypeToExtension dictionary with MediaTypeMap.GetExtension
    • [x] IngestionDocumentReader.cs - Remove custom lookup, only use MediaTypeMap.GetMediaType
Original prompt

This section details on the original issue you should resolve

<issue_title>We should have helper methods / extensions for DataContent</issue_title> <issue_description>At a minimum we should have Load and Save.

Load should read from a path and preserve the file name loaded and try to decode a mime-type from that.

Save should save to a path, with an optional filename, using Name if a directory is provided in the path.

Perhaps we could consider a filename without extension to be provided and we infer the extension from the mime-type and return the saved path.

We should also have Stream based helpers.</issue_description>

<agent_instructions>Add static LoadFromAsync and instance SaveToAsync methods to DataContent. There should be a string path overload for loading that infers the name and mime type from the path (it should add a polyfill to common for https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Mail/src/System/Net/Mime/MediaTypeMap.cs and use that to infer the mime type). There should also be a stream based one that should accept a Stream. If it's actually a FileStream it should default to inferring name and mime type from the filestream's path. It should fall back to octet stream if unknown. SaveAsync should accept a required path to save to; it should accept an optional bool parameter defaulting to true for inferExtension that will add an extension to the provided path if it doesn't have one, based on the mime type and MediaTypemap. It should return the path to which the data was saved. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

  • Fixes dotnet/extensions#6825

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Microsoft Reviewers: Open in CodeFlow

Copilot avatar Dec 16 '25 15:12 Copilot

@copilot

All review comments addressed in 840f61c. Changes include: MediaTypeMap moved to LegacySupport with full mappings, methods are now stable (not experimental), DefaultMediaType moved to DataContent as private const, using GetBuffer() for MemoryStream, proper null handling for Name, SaveToAsync handles directory paths and always infers extensions, and NET9_0_OR_GREATER conditional for ReadOnlyMemory overload.

Copilot avatar Dec 16 '25 17:12 Copilot

@copilot there are build failures

Build FAILED.

D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(290,24): error CS0117: 'File' does not contain a definition for 'WriteAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(317,24): error CS0117: 'File' does not contain a definition for 'WriteAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(344,24): error CS0117: 'File' does not contain a definition for 'WriteAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(370,24): error CS0117: 'File' does not contain a definition for 'WriteAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(438,47): error CS0117: 'File' does not contain a definition for 'ReadAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(466,47): error CS0117: 'File' does not contain a definition for 'ReadAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(498,47): error CS0117: 'File' does not contain a definition for 'ReadAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Contents\DataContentTests.cs(527,47): error CS0117: 'File' does not contain a definition for 'ReadAllBytesAsync' [D:\a_work\1\s\test\Libraries\Microsoft.Extensions.AI.Abstractions.Tests\Microsoft.Extensions.AI.Abstractions.Tests.csproj::TargetFramework=net462] 0 Warning(s) 8 Error(s)

stephentoub avatar Dec 16 '25 21:12 stephentoub

@copilot there are build failures

Build FAILED. ...

Fixed in 0e791c4. Added InjectFilePolyfillsOnLegacy=true to the test project to include the File.ReadAllBytesAsync and File.WriteAllBytesAsync polyfills on net462.

Copilot avatar Dec 16 '25 21:12 Copilot

@copilot please address unresolved feedback

stephentoub avatar Dec 16 '25 21:12 stephentoub

@copilot please address unresolved feedback

Both unresolved feedback items were already addressed in c89fff7:

  1. Path traversal fix (DataContent.cs:363) - Now using Path.GetFileName(Name) to extract only the filename component when combining with directory path
  2. Empty string path test (DataContentTests.cs:590-614) - Added SaveToAsync_WithEmptyPath_UsesCurrentDirectory test that verifies empty string paths work correctly

Copilot avatar Dec 16 '25 21:12 Copilot

@copilot, are there any places in the repo where DataContent is constructed from a file or stream such that LoadFromAsync could now be used instead? Or similarly anywhere where it's written out to a file and SaveToAsync could be used instead?

stephentoub avatar Dec 17 '25 03:12 stephentoub

@copilot, are there any places in the repo where DataContent is constructed from a file or stream such that LoadFromAsync could now be used instead? Or similarly anywhere where it's written out to a f...

Yes! Found one excellent candidate in MarkItDownMcpReader.cs which was manually reading file/stream content and creating DataContent with media type inference - exactly what LoadFromAsync now does. Updated both ReadAsync(FileInfo) and ReadAsync(Stream) overloads to use DataContent.LoadFromAsync in the next commit.

No candidates found for SaveToAsync - the existing DataContent usages are for API communication (like image generation) rather than file persistence.

Copilot avatar Dec 17 '25 03:12 Copilot