Using flatbuffers to generate examples in Python is too slow
@jackgerrits mentioned the idea of generating the flatbuffers binary format from DFtoVW.
The first tests show that it is very slow: about 24 seconds to build 1 million features.
I am using the Python interface. Also note that I don't build the complete example (with Namespace, Example, ExampleRoot), so this is really a lower bound on the time it would take.
Reproducible example
Get the file https://raw.githubusercontent.com/VowpalWabbit/vowpal_wabbit/master/vowpalwabbit/parser/flatbuffer/schema/example.fbs and run the following command to generate the associated Python package (VW):
    flatc --python ./example.fbs
Then:
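import flatbuffers
import numpy as np
import VW.parsers.flatbuffer.Feature as ft


def build_feature(builder, name, value):
    # The string must be created before the table that references it is started.
    s = builder.CreateString(name)
    ft.Start(builder)
    ft.AddName(builder, s)
    ft.AddValue(builder, value)
    ft.AddHash(builder, 123)  # arbitrary hash value, only for this test
    return ft.End(builder)


builder = flatbuffers.Builder(1024)
n = int(1e6)
# Build one Feature table per sampled value; this loop is the slow part.
for x in np.random.normal(size=n):
    build_feature(builder, "x", x)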
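
The snippet above does not include the timing itself. A minimal sketch of how the measurement could be taken is shown here, using only the standard library. The helpers `time_calls` and `per_call_microseconds` are hypothetical names introduced for illustration, and `str` is only a stand-in workload so the block runs without the generated VW package.

```python
import time


def time_calls(fn, args_iter):
    """Call fn(arg) for each arg and return (n_calls, elapsed_seconds)."""
    n_calls = 0
    start = time.perf_counter()
    for arg in args_iter:
        fn(arg)
        n_calls += 1
    elapsed = time.perf_counter() - start
    return n_calls, elapsed


def per_call_microseconds(n_calls, elapsed_seconds):
    """Convert a total wall-clock time into an average cost per call."""
    return elapsed_seconds / n_calls * 1e6


# Stand-in workload (assumption): replace `str` with something like
# `lambda x: build_feature(builder, "x", x)` to time the real loop.
n_calls, elapsed = time_calls(str, range(100_000))
print(f"{n_calls} calls in {elapsed:.3f} s, "
      f"{per_call_microseconds(n_calls, elapsed):.2f} us per call")

# The figure reported above: 24 s for 1 million features,
# which works out to roughly 24 microseconds per feature.
print(f"{per_call_microseconds(1_000_000, 24.0):.1f} us per feature")
```

Reporting the cost per feature makes it easier to compare runs with different values of n.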