Roboquant-avro should allow the use of non-price items
Hello, and thank you for all the great work done on this project.
I'm currently using it to build a strategy that is based partly on news sentiment. I gathered very large CSV datasets of news, and I computed a FinBERT sentiment score for each news line in those CSV files.
Since this is not price-based CSV data, I couldn't use the existing CSVFeed, which (despite its generic name) is designed mainly for prices. So I created my own Feed. It worked like a charm, but it felt too clunky at every backtest startup.
So I decided to convert it to an Avro file, following the related documentation.
But it didn't work, even though the AvroFeed constructor accepts any feed implementing the Feed interface.
Browsing the code, I saw that both the record() and play() methods assume that the feed contains PriceItem instances. So even the other Item classes defined in roboquant.core cannot be converted out of the box.
```kotlin
// Excerpt from the AvroFeed recording loop (the enclosing loop is not shown)
val event = channel.receive()
// Only PriceItem instances are written; any other Item type is silently skipped
for (item in event.items.filterIsInstance<PriceItem>()) {
    if (!assetFilter.filter(item.asset, event.time)) continue
    record.put(0, event.time.toString())
    record.put(1, item.asset.serialize())
    val serialization = serializer.serialize(item)
    val t = GenericData.EnumSymbol(enumSchema, serialization.type)
    record.put(2, t)
    val arr = GenericData.Array<Double>(serialization.values.size, arraySchema)
    arr.addAll(serialization.values)
    record.put(3, arr)
    record.put(4, serialization.meta)
    dataFileWriter.append(record)
    count++
}
```
For example, in this part of the code, AvroFeed should also handle NewsItem, which is defined in roboquant.core.
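To make the effect of `filterIsInstance<PriceItem>()` concrete, here is a minimal, self-contained sketch. The types `Item`, `PriceItem`, `Bar` and `News` below are hypothetical stand-ins, not roboquant's real classes, and `recordType` is a hypothetical helper showing one possible way to route each item type to its own record type instead of dropping it.

```kotlin
// Hypothetical stand-ins for roboquant's types; the real Item, PriceItem and NewsItem
// classes have different fields and live in the roboquant core module.
interface Item
interface PriceItem : Item { val price: Double }
data class Bar(override val price: Double) : PriceItem
data class News(val content: String) : Item

// Mirrors the filtering in the recording loop: only PriceItem instances survive.
fun recordable(items: List<Item>): List<Item> = items.filterIsInstance<PriceItem>()

// One possible extension: map each supported item type to a record type tag
// instead of discarding everything that is not a PriceItem.
fun recordType(item: Item): String? = when (item) {
    is PriceItem -> "PRICE"
    is News -> "NEWS"
    else -> null // still unsupported: no schema exists for this type
}
```

With a list containing one `Bar` and one `News`, `recordable` keeps only the `Bar`, which is exactly why a news-only feed ends up as an empty Avro file.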
In my case, I couldn't use any of these classes because:
- CSVFeed can't handle non-price data
- AvroFeed can't handle items other than PriceItem
- NewsItem doesn't define enough fields to be used effectively, because the Map<String, Any> used for metadata is not an efficient way to store the sentiment score, the summary, etc.
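As an illustration of the third point, a typed item with explicit fields could be flattened into generic columns much like the type/values/meta columns in the recording code quoted above. This is a minimal sketch under assumptions: `SentimentNews`, `Serialized`, `serialize` and `deserialize` are hypothetical names, and storing text fields joined by a unit separator (`\u001F`) in a single meta string is just one possible encoding.

```kotlin
import java.time.Instant

// Hypothetical typed news item: explicit fields instead of a Map<String, Any>
data class SentimentNews(
    val time: Instant,
    val symbol: String,
    val headline: String,
    val summary: String,
    val sentimentScore: Double, // e.g. a FinBERT score
)

// Hypothetical serialized form, loosely modelled on the type/values/meta columns
data class Serialized(val type: String, val values: List<Double>, val meta: String)

fun SentimentNews.serialize(): Serialized =
    Serialized(
        type = "NEWS",
        values = listOf(sentimentScore), // numeric fields go into the double array
        // Text fields joined with a unit separator so they can be split back apart
        meta = listOf(headline, summary).joinToString("\u001F"),
    )

// Rebuilds the item; time and symbol come from their own columns in the record
fun deserialize(time: Instant, symbol: String, s: Serialized): SentimentNews {
    require(s.type == "NEWS") { "unexpected record type ${s.type}" }
    val (headline, summary) = s.meta.split("\u001F", limit = 2)
    return SentimentNews(time, symbol, headline, summary, s.values[0])
}
```

The sentiment score stays a plain double, so it can be read back without any casting, which is the main advantage over a `Map<String, Any>`.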
Perhaps a useful feature would be to let users easily register their own schemas and Item types?
I can work on any part of this request as a PR if you think it would be worthwhile. Thanks,
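One way such a registration mechanism could look is a registry keyed on the item class, where each user-supplied serializer encodes an item into generic columns and decodes it back. This is a hypothetical sketch and not an existing roboquant API: `ItemSerializer`, `SerializerRegistry`, `Headline`, `Quote` and `HeadlineSerializer` are all illustrative names, and the `Item` interface is a stand-in.

```kotlin
import kotlin.reflect.KClass

// Stand-in types for illustration only
interface Item
data class Quote(val symbol: String, val price: Double) : Item
data class Headline(val symbol: String, val text: String, val score: Double) : Item

// Hypothetical user-facing contract: encode an item into generic columns and back
interface ItemSerializer<T : Item> {
    val typeTag: String
    fun encode(item: T): Pair<List<Double>, String>
    fun decode(values: List<Double>, meta: String): T
}

class SerializerRegistry {
    private val byClass = mutableMapOf<KClass<out Item>, ItemSerializer<out Item>>()
    private val byTag = mutableMapOf<String, ItemSerializer<out Item>>()

    fun <T : Item> register(type: KClass<T>, serializer: ItemSerializer<T>) {
        byClass[type] = serializer
        byTag[serializer.typeTag] = serializer
    }

    // Returns null for unregistered types, so a recorder could skip them as it does today
    fun encode(item: Item): Triple<String, List<Double>, String>? {
        @Suppress("UNCHECKED_CAST")
        val s = byClass[item::class] as ItemSerializer<Item>? ?: return null
        val (values, meta) = s.encode(item)
        return Triple(s.typeTag, values, meta)
    }

    fun decode(tag: String, values: List<Double>, meta: String): Item? =
        byTag[tag]?.decode(values, meta)
}

// Example user-defined serializer for the stand-in Headline type
object HeadlineSerializer : ItemSerializer<Headline> {
    override val typeTag = "HEADLINE"
    override fun encode(item: Headline) = listOf(item.score) to "${item.symbol}|${item.text}"
    override fun decode(values: List<Double>, meta: String): Headline {
        val (symbol, text) = meta.split("|", limit = 2)
        return Headline(symbol, text, values[0])
    }
}
```

Usage would be `registry.register(Headline::class, HeadlineSerializer)` before recording. Looking up by exact class keeps the dispatch simple, at the cost of not matching subclasses of a registered type.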
Alexandre
Indeed, the feed only works with price items, since converters are included only for those, and they all fit the common Avro schema.
Storing other types would likely also require a different schema and different parsing logic. It would be nice to have, but I guess it would be very specific to a single item type?
You're right, but since news-based strategies are quite common, maybe it would be convenient to support at least a more generic NewsItem with standard fields such as headline, summary, content, author, time, and score, along with a parser for this specific item?
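A parser for such a generic news item could be as small as the sketch below. It assumes a hypothetical `GenericNewsItem` class and a fixed column order `time,headline,summary,content,author,score`. The CSV splitter handles double-quoted fields (so headlines may contain commas) and `""` escapes, but not fields spanning multiple lines.

```kotlin
import java.time.Instant

// Hypothetical generic news item with the fields proposed above
data class GenericNewsItem(
    val time: Instant,
    val headline: String,
    val summary: String,
    val content: String,
    val author: String,
    val score: Double,
)

// Splits one CSV line, honouring double-quoted fields and "" escapes (no multi-line fields)
fun splitCsvLine(line: String, sep: Char = ','): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // An escaped quote ("") inside a quoted field becomes a single quote
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++
            }
            c == '"' -> { inQuotes = !inQuotes }
            c == sep && !inQuotes -> {
                fields.add(current.toString())
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields.add(current.toString())
    return fields
}

// Assumed column order: time,headline,summary,content,author,score
fun parseNewsLine(line: String): GenericNewsItem {
    val f = splitCsvLine(line)
    require(f.size == 6) { "expected 6 columns but got ${f.size}" }
    return GenericNewsItem(
        time = Instant.parse(f[0]),
        headline = f[1],
        summary = f[2],
        content = f[3],
        author = f[4],
        score = f[5].toDouble(),
    )
}
```

For real data with multi-line article bodies, a proper CSV library would be the safer choice; this sketch only shows where the typed fields would come from.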
I added NewsItem to illustrate that items can be things other than prices. I didn't put much thought into it beyond, for example, using the content for sentiment analysis.
So I would welcome any improvements.