Using flatbuffers to generate examples in Python is too slow

Open etiennekintzler opened this issue 3 years ago • 0 comments

@jackgerrits mentioned the idea of generating the flatbuffers binary format from DFtoVW.

The first tests show that it is very slow: about 24 seconds to build 1 million features, i.e. roughly 24 µs per feature.

I am using the Python interface. Note that I don't build the complete example (with Namespace, Example, ExampleRoot), so this is really a lower bound on the time it would take.

Reproducible example

Download https://raw.githubusercontent.com/VowpalWabbit/vowpal_wabbit/master/vowpalwabbit/parser/flatbuffer/schema/example.fbs and run flatc --python ./example.fbs to generate the associated Python package (VW).

Then:

import flatbuffers
import numpy as np
import VW.parsers.flatbuffer.Feature as ft


def build_feature(builder, name, value):
    # Strings must be created before the table is started.
    s = builder.CreateString(name)

    ft.Start(builder)
    ft.AddName(builder, s)
    ft.AddValue(builder, value)
    ft.AddHash(builder, 123)  # arbitrary hash value

    return ft.End(builder)


builder = flatbuffers.Builder(1024)

n = int(1e6)

# Build one Feature table per sampled value.
for x in np.random.normal(size=n):
    build_feature(builder, "x", x)
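
To put a number on the loop above, a small timing helper along these lines could be used. This is only a sketch: `time_per_item` is a hypothetical helper name, and the `struct.pack` workload is a stand-in so the snippet runs without the generated VW package. It is not the flatbuffers code being measured in this issue.

```python
import struct
import time


def time_per_item(fn, items):
    """Call fn on each item; return (total_seconds, seconds_per_item)."""
    start = time.perf_counter()
    count = 0
    for item in items:
        fn(item)
        count += 1
    total = time.perf_counter() - start
    # Avoid dividing by zero when items is empty.
    return total, (total / count if count else 0.0)


# Stand-in workload (an assumption for illustration only).
# To time the real loop, pass: lambda x: build_feature(builder, "x", x)
total, per_item = time_per_item(lambda x: struct.pack("<f", x), [0.5] * 100_000)
print(f"{total:.3f} s total, {per_item * 1e6:.2f} us per item")
```

Reporting the per-item time makes runs with different values of n directly comparable.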

etiennekintzler avatar Aug 21 '21 11:08 etiennekintzler