Etl.Net
Etl.Net copied to clipboard
Getting in-memory list from the database stream
Hello,
is there a way to get an in memory list as an instantiated IList<T>
from SQL Server stream and have the database cursor consumed/closed?
First I thought extension method ToList()
is for this purpose, but it appears it returns an ISingleStream<List<T>>
var inMemoryCarrierList = contextStream
.CrossApplySqlServerQuery("carrier stream", builder => builder
.FromQuery("select carr_code, carr_name from dbo.carr_old order by carr_code")
.WithMapping(m => new
{
carr_code = m.ToColumn("carr_code"),
carr_name = m.ToColumn("carr_name")
}), "source1")
.ToList("in memory carrier list");
If I get a list this way and I don't consume this stream by, say, outputting to a text file and then I try a query like this on the same ODBC connection:
var arch1 = contextStream
.CrossApplySqlServerQuery("first query from source 1", o => o
.FromQuery("select * from dbo.carr")
.WithMapping(i => new
{
carr_code = i.ToColumn("carr_code"),
carr_name = i.ToColumn("carr_name")
})
, "source1")
.Select("create row to save source1 from first query", i => new { i.carr_name, i.carr_code });
I get error:
Unhandled exception. Paillave.Etl.Core.JobExecutionException: Job execution failed
---> System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt
@Radamonas found the way as a workaround for the issue related to ODBC drivers. It works with a client cursor, witch is the recommended way to avoid memory overload: https://github.com/paillave/Etl.Net/issues/423#issuecomment-1477397756
It does work in provided example, the thing is that it drains both stream cursors to files. What makes me wonder is whether it is possible to have definitions of several streams in the same process without actually outputting them to a file, so that they could be used for combining streams, e.g. look-ups, etc. later on?
It does work in provided example, the thing is that it drains both stream cursors to files. What makes me wonder is whether it is possible to have definitions of several streams in the same process without actually outputting them to a file, so that they could be used for combining streams, e.g. look-ups, etc. later on?
yes, you can do what ever you want from what you issue from any operators, and that counts as well for CrossApply
operators.
As maybe there is something I didn't get in your question, give me a sample of the kind of code you would like to achieve, and I will adapt it to work with existing operators.
What I was referring to with this question is actually this statement and example in the documentation:
To make a lookup, extensions for Sql Server don't provide any operator out of the box. The work around
is to use the in memory lookup of ETL.NET core.
var authorStream = contextStream
.CrossApplySqlServerQuery("get authors", o => o
.FromQuery("select a.* from dbo.Author as a")
.WithMapping(i => new
{
Id = i.ToNumberColumn<int>("Id"),
Name = i.ToColumn("Name"),
Reputation = i.ToNumberColumn<int>("Reputation")
}));
postStream
.Lookup("get related author", authorStream,
l => l.AuthorId,
r => r.Id,
(l, r) => new { Post = l, Author = r })
.Do("show value on console", i => Console.WriteLine($"{i.Post.Title} ({i.Author.Name})"));
The question is: how to you define the postStream
here? We're calling a method on postStream
, but how do we define it?
you just have to define your postStream exactly in the same way you defined authorStream, and combine them the way you want.
Maybe this near real life example will help you to understand: https://github.com/paillave/Etl.Net/blob/master/src/Tutorials/Paillave.Etl.Samples/TestImport2.cs FYI, in this example, the target data is this one: https://github.com/paillave/Etl.Net/blob/master/src/Tutorials/Paillave.Etl.Samples/DataAccess/DbStructure.md and the input files are here: https://github.com/paillave/Etl.Net/tree/master/src/Tutorials/Paillave.Etl.Samples/InputFiles There are 2 kind of files: files that define portfolios, and files that define positions
Sorry to bother you again, but the examples deal with files, and I'm referring to the SQL Server data source and this note in documentation: "To make a lookup, extensions for Sql Server don't provide any operator out of the box. The work around is to use the in memory lookup of ETL.NET core."
How do you actually accomplish this workaround and perform in memory look up if you have two database streams? Thank you
yes but in the same way you produce data from a file, you produce data from an in memory stream.
Does the following example represent what you expected?
var categoriesStream = contextStream.CrossApply("create in memory categories", ctx => new[] {
new { Code = "a", Label = "label A", Tax = 0.1 },
new { Code = "b", Label = "label B", Tax = 0.2 },
new { Code = "c", Label = "label C", Tax = 0.3 },
});
contextStream
.CrossApplyFolderFiles("list all required files", "*.zip", true).CrossApplyZipFiles("extract files from zip", "*.csv") // could be retrieved from database... anything...
.CrossApplyTextFile("parse file", FlatFileDefinition.Create(i => new Article
{
Name = i.ToColumn("name"),
Category = i.ToColumn("first name"),
Price = i.ToNumberColumn<double?>("price", ".")
}).IsColumnSeparated(','))
.Lookup("get category label", categoriesStream, l => l.Category, r => r.Code, (l, r) => new
{
Name = l.Name,
CategoryLabel = r.Label,
PriceWithTax = l.Price * r.Tax,
})
[...];
It is clearer now, thank you. The one thing I still can't understand is how to declare an in-memory stream from the SQL Database data, because this is, as I understand, currently the only way to combine two database streams.
It is clearer now, thank you. The one thing I still can't understand is how to declare an in-memory stream from the SQL Database data, because this is, as I understand, currently the only way to combine two database streams.
I still don't understand your point. In the previous example, categoriesStream
is fully in-memory, and it is combined with something that comes with file (could be database or anything)
FYI, here is the documentation that shows how to combine streams: https://paillave.github.io/Etl.Net/docs/recipes/linkStreams
Yes, categoriesStream is in-memory, but it is initialized by using array initializer at compile time. My question is how do you populate an in-memory stream from a database table? Documentation says you cannot combine two database streams unless one of the streams is an in memory stream. Is there actually no way to combine to database streams?
If you want the join to be done by the database engine, your only choice is to make the proper SQL query with the right join statement. If you want to the join/lookup to be made at ETL side, but you want to combine dataset that come from two different databases (for example) then you do the following:
But I still have the feeling this is not what you expect, so I'm a bit lost about what puzzles you.
var categoriesStream = contextStream.CrossApplySqlServerQuery("get categories", o => o
.FromQuery("select * from dbo.ProductCategory")
.WithMapping(i => new
{
Code = i.ToColumn("Code"),
Label = i.ToColumn("Label"),
Tax = i.ToNumberColumn<decimal>("Tax")
}));
contextStream
.CrossApplyFolderFiles("list all required files", "*.zip", true).CrossApplyZipFiles("extract files from zip", "*.csv") // could be retrieved from database... anything...
.CrossApplyTextFile("parse file", FlatFileDefinition.Create(i => new Article
{
Name = i.ToColumn("name"),
Category = i.ToColumn("first name"),
Price = i.ToNumberColumn<double?>("price", ".")
}).IsColumnSeparated(','))
.Lookup("get category label", categoriesStream, l => l.Category, r => r.Code, (l, r) => new
{
Name = l.Name,
CategoryLabel = r.Label,
PriceWithTax = l.Price * r.Tax,
})
[...];
This sentence made everything clear,
If you want the join to be done by the database engine, your only choice is to make the proper SQL query with the right join statement.
Thank you, @paillave, and sorry for bombarding you with questions ;)
leave it opened for documentation purpose till the answers are on the documentation website