featurebase
How to generate bitmap files outside Pilosa
I have a 200T CSV file, and it is not feasible to generate the bitmap files within Pilosa. Could I generate the bitmap files outside Pilosa with another engine such as Spark or Tez, and then load them into Pilosa?
This is very important for using Pilosa in production. The original data is very large, so the bitmap files cannot be generated by Pilosa itself and must be produced by a big-data engine such as Spark or MapReduce.
@sydt2014 sorry for the delay here — the short answer is that while this is possible, you'd probably have to do quite a bit of custom work to make it happen.
If you have a really huge amount of data, you might look into Molecula — we're building a lot of tooling and new capabilities around Pilosa to deal with large datasets and enterprise needs. If that doesn't look to be an option, I can give you some pointers on file format and such.
Here's a snippet from api.go which discusses the import-roaring endpoint and our file format:
// ImportRoaring is a low level interface for importing data to Pilosa when
// extremely high throughput is desired. The data must be encoded in a
// particular way which may be unintuitive (discussed below). The data is merged
// with existing data.
//
// It takes as input a roaring bitmap which it uses as the data for the
// indicated index, field, and shard. The bitmap may be encoded according to the
// official roaring spec (https://github.com/RoaringBitmap/RoaringFormatSpec),
// or to the pilosa roaring spec which supports 64 bit integers
// (https://www.pilosa.com/docs/latest/architecture/#roaring-bitmap-storage-format).
//
// The data should be encoded the same way that Pilosa stores fragments
// internally. A bit "i" being set in the input bitmap indicates that the bit is
// set in Pilosa row "i/ShardWidth", and in column
// (shard*ShardWidth)+(i%ShardWidth). That is to say that "data" represents all
// of the rows in this shard of this field concatenated together in one long
// bitmap.
func (api *API) ImportRoaring(ctx context.Context, indexName, fieldName string, shard uint64, remote bool, req *ImportRoaringRequest)
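To make the encoding in that comment concrete, here is a minimal sketch of the arithmetic an external job (Spark, MapReduce, etc.) would need to perform for every (row, column) pair before building the per-shard roaring bitmap. The helper names `fragmentPosition` and `decodePosition` are hypothetical and not part of Pilosa's API, and `ShardWidth` is assumed to be 2^20, which I believe is the default; verify it against your build before relying on it.

```go
package main

import "fmt"

// ShardWidth is ASSUMED to be 2^20 (believed to be Pilosa's default).
// Check the value your Pilosa build was compiled with.
const ShardWidth uint64 = 1 << 20

// fragmentPosition (hypothetical helper) maps a (row, column) pair to the
// shard that owns the column and the bit position "i" to set in that
// shard's bitmap, so that row == i/ShardWidth and
// column == shard*ShardWidth + i%ShardWidth.
func fragmentPosition(row, col uint64) (shard, pos uint64) {
	shard = col / ShardWidth
	pos = row*ShardWidth + col%ShardWidth
	return shard, pos
}

// decodePosition (hypothetical helper) inverts fragmentPosition, following
// the formula quoted in the ImportRoaring comment.
func decodePosition(shard, pos uint64) (row, col uint64) {
	row = pos / ShardWidth
	col = shard*ShardWidth + pos%ShardWidth
	return row, col
}

func main() {
	// Row 3, column 5 of the third shard (shard index 2).
	shard, pos := fragmentPosition(3, 2*ShardWidth+5)
	fmt.Println("shard:", shard, "position:", pos)

	row, col := decodePosition(shard, pos)
	fmt.Println("row:", row, "column:", col)
}
```

An external engine would group the computed positions by (field, shard), serialize each group as a roaring bitmap in one of the two formats mentioned above, and send one bitmap per shard to the import-roaring endpoint.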