Attachment Deserialisation Exception

Open stevejgordon opened this issue 2 years ago • 3 comments

NEST/Elasticsearch.Net version: 7.17.2
Elasticsearch version: 8.x
.NET runtime version: .NET 6.0
Operating system version: Windows 10

As originally raised in the Discuss forum, there appears to be a deserialisation exception for documents which use the NEST Attachment type when using REST API compatibility with a v8 server.

Repro App

using Elasticsearch.Net;
using Nest;
using Test;

var doc = new IngestedAttachment
{
	Id = 1,
	Content = TestDocument.TestPdfDocument
};

var settings = new ConnectionSettings("CLOUDID",
	new BasicAuthenticationCredentials("elastic", "password"))
		.EnableDebugMode()
		.EnableApiVersioningHeader();

var client = new ElasticClient(settings);

if (client.Indices.Exists("ingest-testing").Exists)
{
	client.Indices.Delete("ingest-testing");
}

var createIndexResponse = client.Indices.Create("ingest-testing", c => c
	.Map<IngestedAttachment>(mm => mm
		.Properties(p => p
			.Text(s => s
				.Name(f => f.Content)
			)
			.Object<Attachment>(o => o
				.Name(f => f.Attachment)
			)
		)
	)
);

var pipelineResponse = client.Ingest.PutPipeline(new PutPipelineRequest("pdfdocs")
{
	Processors = new List<IProcessor>
	{
		new AttachmentProcessor
		{
			Field = "content",
			TargetField = "attachment"
		}
	}
});

var indexResponse = client.Index(doc, i => i.Index("ingest-testing").Pipeline("pdfdocs").Refresh(Refresh.True));

var getResponse = client.Get<IngestedAttachment>(indexResponse.Id, g => g.Index("ingest-testing"));

Console.ReadKey();

namespace Test
{
	public class IngestedAttachment
	{
		public Attachment? Attachment { get; set; }
		public string? Content { get; set; }
		public int Id { get; set; }
	}

	public class TestDocument
	{
		static TestDocument()
		{
			using var stream = File.OpenRead(@"C:\Attachment_Test_Document.pdf");
			using var memoryStream = new MemoryStream();
			stream.CopyTo(memoryStream);
			TestPdfDocument = Convert.ToBase64String(memoryStream.ToArray());
		}

		public static string TestPdfDocument { get; }
	}
}

Exception:

JsonParsingException: expected:',', actual:'"application/pdf; version=1.5"', at offset:349

v8.x server JSON

{
  "_index" : "ingest-testing",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachment" : {
      "date" : "2016-12-08T03:05:13Z",
      "keywords" : "nest,test,document",
      "content_type" : "application/pdf",
      "author" : "Russ Cam",
      "format" : "application/pdf; version=1.5",
      "modified" : "2016-12-08T03:05:13Z",
      "language" : "fr",
      "title" : "Attachment Test Document",
      "creator_tool" : "Microsoft® Word 2016",
      "content" : "Attachment Test Document  \n  \n\nA simple document to test NEST’s mapper-attachment support.",
      "content_length" : 96
    },
    "id" : 1,
    "content" : "..."
  }
}

v7.x server JSON

{
  "_index" : "ingest-testing",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachment" : {
      "date" : "2016-12-08T03:05:13Z",
      "keywords" : "nest,test,document",
      "content_type" : "application/pdf",
      "author" : "Russ Cam",
      "language" : "fr",
      "title" : "Attachment Test Document",
      "content" : "Attachment Test Document  \n  \n\nA simple document to test NEST’s mapper-attachment support.",
      "content_length" : 96
    },
    "id" : 1,
    "content" : "..."
  }
}

The most obvious difference, and the one that lines up with where the exception is thrown, is that the v8 response includes the format property. This needs investigation to determine whether the fault lies in the formatter for this type or in the API response from the v8 server.

stevejgordon, Jun 09 '22 16:06
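
To make the comparison between the two responses concrete, here is a minimal sketch that lists the property names present in the v8 `attachment` object but absent from the v7 one. It uses `System.Text.Json` rather than NEST's own serializer, and the class and method names (`AttachmentFieldDiff`, `ExtraFields`) are hypothetical helpers, not part of any library. The JSON literals are the `attachment` objects from the responses above, with the long `content` value trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class AttachmentFieldDiff
{
    // Returns the top-level property names found in `newer` but not in `older`,
    // in the order they appear in `newer`.
    public static List<string> ExtraFields(string older, string newer)
    {
        using var olderDoc = JsonDocument.Parse(older);
        using var newerDoc = JsonDocument.Parse(newer);

        var known = olderDoc.RootElement.EnumerateObject()
            .Select(p => p.Name)
            .ToHashSet();

        return newerDoc.RootElement.EnumerateObject()
            .Select(p => p.Name)
            .Where(name => !known.Contains(name))
            .ToList();
    }

    public static void Main()
    {
        // "attachment" object from the v7.x response (content trimmed).
        const string v7 =
            "{\"date\":\"2016-12-08T03:05:13Z\",\"keywords\":\"nest,test,document\"," +
            "\"content_type\":\"application/pdf\",\"author\":\"Russ Cam\"," +
            "\"language\":\"fr\",\"title\":\"Attachment Test Document\"," +
            "\"content\":\"...\",\"content_length\":96}";

        // "attachment" object from the v8.x response (content trimmed).
        const string v8 =
            "{\"date\":\"2016-12-08T03:05:13Z\",\"keywords\":\"nest,test,document\"," +
            "\"content_type\":\"application/pdf\",\"author\":\"Russ Cam\"," +
            "\"format\":\"application/pdf; version=1.5\"," +
            "\"modified\":\"2016-12-08T03:05:13Z\",\"language\":\"fr\"," +
            "\"title\":\"Attachment Test Document\"," +
            "\"creator_tool\":\"Microsoft® Word 2016\"," +
            "\"content\":\"...\",\"content_length\":96}";

        // Prints: format, modified, creator_tool
        Console.WriteLine(string.Join(", ", ExtraFields(v7, v8)));
    }
}
```

For the two responses in this issue, the v8 object carries three extra fields: `format`, `modified` and `creator_tool`. The reported exception points at the value of `format`, which is the first of the three in document order.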

Very likely a result of the upgrade to Tika 2.4 in v8, which extracts some additional fields. I'm not sure these can be addressed by the compatibility header, so our formatter should probably be updated to support parsing the additional fields when present.

stevejgordon, Jun 10 '22 06:06

Still an issue: with a new project and the latest versions of all libraries and the server, the search call fails. I had to write a custom Attachment object with all the extended fields added to work around it.

chrissterling, Sep 12 '22 12:09
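
The workaround described above was not posted, so the following is only a sketch of what such a custom type might look like. The class name `ExtendedAttachment` is hypothetical. It covers just the fields visible in the v8 response in this issue; a v8 server may emit others that would need adding. The property types are assumptions, and the sketch relies on NEST 7 honouring `DataMember` names on document types and skipping JSON properties that have no matching member.

```csharp
#nullable enable
using System;
using System.Runtime.Serialization;

// Hypothetical stand-in for Nest.Attachment, used as a plain document type so
// that the built-in Attachment formatter is bypassed.
[DataContract]
public class ExtendedAttachment
{
    // Fields present in both the v7 and v8 responses shown in this issue.
    [DataMember(Name = "author")] public string? Author { get; set; }
    [DataMember(Name = "content")] public string? Content { get; set; }
    [DataMember(Name = "content_length")] public long? ContentLength { get; set; }
    [DataMember(Name = "content_type")] public string? ContentType { get; set; }
    [DataMember(Name = "date")] public DateTime? Date { get; set; }
    [DataMember(Name = "keywords")] public string? Keywords { get; set; }
    [DataMember(Name = "language")] public string? Language { get; set; }
    [DataMember(Name = "title")] public string? Title { get; set; }

    // Additional fields seen only in the v8 response (types are assumptions).
    [DataMember(Name = "format")] public string? Format { get; set; }
    [DataMember(Name = "modified")] public DateTime? Modified { get; set; }
    [DataMember(Name = "creator_tool")] public string? CreatorTool { get; set; }
}
```

In the repro app this would mean declaring the `Attachment` property of `IngestedAttachment` as `ExtendedAttachment?` and mapping it with `.Object<ExtendedAttachment>(...)`. Whether this avoids the exception in every case has not been verified here.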

Running into a similar issue with our recent upgrade using compatibility with v8 server - are there any plans to support the Attachment class with the v8 .NET client?

jennifer122105, Jun 01 '23 16:06