fst icon indicating copy to clipboard operation
fst copied to clipboard

[Help] Possible to build a FST at compile time?

Open bee-san opened this issue 2 years ago • 4 comments

I have a problem which I think FST might solve:

  1. 40 million keys, no values
  2. All strings
  3. We know every string at compile time and can guarantee no new strings will be added.
  4. Around 40-50k reads per programs run.
  5. We want to do this as fast as possible on runtime, and can sacrifice memory for it.

Is it possible to build an FST at compile time to achieve (5)?

bee-san avatar Aug 02 '22 19:08 bee-san

Sure. Serialize it to a file and use something like once_cell to load it via include_bytes.

It won't help do anything faster at runtime though.

You might want a perfect hashmap instead. Not sure if 40 million is too big for that though.

BurntSushi avatar Aug 02 '22 21:08 BurntSushi

Sure. Serialize it to a file and use something like once_cell to load it via include_bytes.

It won't help do anything faster at runtime though.

You might want a perfect hashmap instead. Not sure if 40 million is too big for that though.

Thank you @BurntSushi for your response to this question and your work with FST.

I've been thinking about this as well. Why would you say it wouldn't help? Baking the bytes into the binary would at least help during runtime in avoiding loading the map at runtime?

jymchng avatar Jul 02 '23 06:07 jymchng

I'm just saying that building an fstv at compile time won't make queries faster. It's the same bytes. This is unlike a perfect hash map for example.

Obviously building an fst at compile time means you won't have to do it at runtime, and that may or may not be significant.

BurntSushi avatar Jul 02 '23 11:07 BurntSushi

@BurntSushi Thank you for your clear response! 🚀 Learnt alot from it.

jymchng avatar Jul 02 '23 11:07 jymchng