elasticsearch-net
Attachment Deserialisation Exception
NEST/Elasticsearch.Net version: 7.17.2
Elasticsearch version: 8.x
.NET runtime version: .NET 6.0
Operating system version: Windows 10
As originally raised in the Discuss forum, there appears to be a deserialisation exception for documents which use the NEST Attachment
type when using REST API compatibility with a v8 server.
Repro App
using Elasticsearch.Net;
using Nest;
using Test;

// Document whose Content holds the base64-encoded PDF to be ingested.
var doc = new IngestedAttachment
{
    Id = 1,
    Content = TestDocument.TestPdfDocument
};

// EnableApiVersioningHeader() turns on REST API compatibility with the v8 server.
var settings = new ConnectionSettings("CLOUDID",
        new BasicAuthenticationCredentials("elastic", "password"))
    .EnableDebugMode()
    .EnableApiVersioningHeader();

var client = new ElasticClient(settings);

// Start from a clean index on every run.
if (client.Indices.Exists("ingest-testing").Exists)
{
    client.Indices.Delete("ingest-testing");
}

var createIndexResponse = client.Indices.Create("ingest-testing", c => c
    .Map<IngestedAttachment>(mm => mm
        .Properties(p => p
            .Text(s => s
                .Name(f => f.Content)
            )
            .Object<Attachment>(o => o
                .Name(f => f.Attachment)
            )
        )
    )
);

// Pipeline that extracts attachment metadata from "content" into "attachment".
var pipelineResponse = client.Ingest.PutPipeline(new PutPipelineRequest("pdfdocs")
{
    Processors = new List<IProcessor>
    {
        new AttachmentProcessor
        {
            Field = "content",
            TargetField = "attachment"
        }
    }
});

var indexResponse = client.Index(doc, i => i.Index("ingest-testing").Pipeline("pdfdocs").Refresh(Refresh.True));

// The deserialisation exception is thrown when reading this response from a v8 server.
var getResponse = client.Get<IngestedAttachment>(indexResponse.Id, g => g.Index("ingest-testing"));

Console.ReadKey();

namespace Test
{
    public class IngestedAttachment
    {
        public Attachment? Attachment { get; set; }
        public string? Content { get; set; }
        public int Id { get; set; }
    }

    public class TestDocument
    {
        static TestDocument()
        {
            using var stream = File.OpenRead(@"C:\Attachment_Test_Document.pdf");
            using var memoryStream = new MemoryStream();
            stream.CopyTo(memoryStream);
            TestPdfDocument = Convert.ToBase64String(memoryStream.ToArray());
        }

        public static string TestPdfDocument { get; }
    }
}
Exception:
JsonParsingException: expected:',', actual:'"application/pdf; version=1.5"', at offset:349
v8.x server JSON
{
  "_index" : "ingest-testing",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachment" : {
      "date" : "2016-12-08T03:05:13Z",
      "keywords" : "nest,test,document",
      "content_type" : "application/pdf",
      "author" : "Russ Cam",
      "format" : "application/pdf; version=1.5",
      "modified" : "2016-12-08T03:05:13Z",
      "language" : "fr",
      "title" : "Attachment Test Document",
      "creator_tool" : "Microsoft® Word 2016",
      "content" : "Attachment Test Document \n \n\nA simple document to test NEST’s mapper-attachment support.",
      "content_length" : 96
    },
    "id" : 1,
    "content" : "..."
  }
}
v7.x server JSON
{
  "_index" : "ingest-testing",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachment" : {
      "date" : "2016-12-08T03:05:13Z",
      "keywords" : "nest,test,document",
      "content_type" : "application/pdf",
      "author" : "Russ Cam",
      "language" : "fr",
      "title" : "Attachment Test Document",
      "content" : "Attachment Test Document \n \n\nA simple document to test NEST’s mapper-attachment support.",
      "content_length" : 96
    },
    "id" : 1,
    "content" : "..."
  }
}
The most obvious difference, and the one that lines up with where the exception stems from, is that the v8 response includes the format property (it also adds modified and creator_tool, which are absent from the v7 response). This needs investigation to determine whether the fault lies in the formatter for this type or in the API response from the v8 server.
This is very likely a result of the upgrade to Tika 2.4 in v8, which extracts some additional fields. I'm not sure these can be addressed by the compatibility header, so our formatter should probably be updated to support parsing the additional fields when present.
Still an issue - new project, latest versions of all libraries and server, fails to return search. Had to write a custom Attachment object with all the extended fields added to solve it.
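For anyone wanting to try the same workaround, here is a minimal sketch of what such a custom type might look like. The class name ExtendedAttachment is hypothetical, and the property list covers only the fields visible in the v8 response quoted above; a v8 server may emit others depending on the document. It assumes the default NEST 7.x source serializer, which honours the PropertyName attribute when mapping snake_case JSON fields.

```csharp
using System;
using Nest;

namespace Test
{
    // Hypothetical stand-in for Nest.Attachment. It is a plain POCO, so it is
    // read by the regular source serializer rather than the built-in
    // Attachment formatter that throws on the extra v8 fields.
    public class ExtendedAttachment
    {
        [PropertyName("author")]
        public string? Author { get; set; }

        [PropertyName("content")]
        public string? Content { get; set; }

        [PropertyName("content_length")]
        public long? ContentLength { get; set; }

        [PropertyName("content_type")]
        public string? ContentType { get; set; }

        [PropertyName("date")]
        public DateTime? Date { get; set; }

        [PropertyName("keywords")]
        public string? Keywords { get; set; }

        [PropertyName("language")]
        public string? Language { get; set; }

        [PropertyName("title")]
        public string? Title { get; set; }

        // Fields present in the v8 response but not in the v7 response.
        [PropertyName("format")]
        public string? Format { get; set; }

        [PropertyName("modified")]
        public DateTime? Modified { get; set; }

        [PropertyName("creator_tool")]
        public string? CreatorTool { get; set; }
    }

    // Same document type as in the repro, but using the custom attachment type.
    public class IngestedAttachment
    {
        public ExtendedAttachment? Attachment { get; set; }
        public string? Content { get; set; }
        public int Id { get; set; }
    }
}
```

With this in place, the mapping in the repro would presumably use .Object<ExtendedAttachment>(...) instead of .Object<Attachment>(...). Because every property is nullable, the same type should also read v7 responses, where the extra fields are simply missing.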
Running into a similar issue with our recent upgrade using compatibility with v8 server - are there any plans to support the Attachment class with the v8 .NET client?