oneAPI-samples
DPC++FPGA DB example: dataset for FPGA target is not included
Summary
In the Reference Designs of DPC++FPGA, the DB design README.md file states that the FPGA binary should be started with: ./db.fpga --dbroot=../data/sf1 --test. However, the repository doesn't provide ../data/sf1.
Version
Tip of the repository and earlier releases.
Steps to reproduce
Build and execute the example from the repository using:
./db.fpga --dbroot=../data/sf1 --test
Observed behavior
The repository does provide ../data/sf0.01. According to the documentation, this dataset is meant to be used with the FPGA_emu binary.
I tried to use that dataset with the FPGA binary, but it fails with this error:
LineItem table size has 60175 rows when it should have 6001215
Orders table size has 15000 rows when it should have 1500000
Parts table size has 2000 rows when it should have 200000
Supplier table size has 100 rows when it should have 10000
PartSupplier table size has 8000 rows when it should have 800000
ERROR: could not validate the scale factor of the parsed database files
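The same row-count check can be run outside the binary to see which scale factor a directory of .tbl files actually matches. The following is a minimal sketch, not the design's own validation code. The row counts are copied from the error message above. The file names (lineitem.tbl, orders.tbl, part.tbl, supplier.tbl, partsupp.tbl) are the names dbgen normally writes and are assumed here to be what the design reads. The helper names are hypothetical.

```python
from pathlib import Path

# Row counts copied from the error message: the "has" values are the bundled
# sf0.01 files, the "should have" values are what scale factor 1 requires.
# The table file names are an assumption (standard dbgen output names).
EXPECTED_ROWS = {
    "0.01": {"lineitem": 60175, "orders": 15000, "part": 2000,
             "supplier": 100, "partsupp": 8000},
    "1": {"lineitem": 6001215, "orders": 1500000, "part": 200000,
          "supplier": 10000, "partsupp": 800000},
}


def count_rows(path):
    """Count the lines (one record per line) in a .tbl file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_scale_factor(dbroot, scale_factor):
    """Return a list of mismatch messages; an empty list means all counts match."""
    problems = []
    for table, expected in EXPECTED_ROWS[scale_factor].items():
        path = Path(dbroot) / f"{table}.tbl"
        if not path.is_file():
            problems.append(f"{path} is missing")
            continue
        actual = count_rows(path)
        if actual != expected:
            problems.append(f"{table}: {actual} rows, expected {expected}")
    return problems
```

Calling check_scale_factor("../data/sf0.01", "1") on the bundled files would be expected to report the same five mismatches as the error above, while passing "0.01" should return an empty list.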
Expected behavior
Either provide dataset sf1 or allow sf0.01 to work with the FPGA target binary.
The files are intentionally not included since they are over 1GB in size and would really bloat the repo. From the DB README:
In the data/ directory, you will find database files for a scale factor of 0.01. These files were generated manually and can be used to verify the queries in emulation. However, these files are too small to showcase the true performance of the FPGA hardware.
To generate larger database files to run on the hardware, you can use TPC's dbgen tool. Instructions for downloading, building and running the dbgen tool can be found on the TPC-H website. Note that this reference design currently only supports databases with scale factors of 0.01 or 1.
An alternative approach would be to write a table generator in C++ and include it in the design itself -- which I may already have ready 👀
@mdbtucker what do you think about my alternative approach? I don't think adding these files to the repo is acceptable. Writing a data generator in C++ is reasonable, but will require some effort.
I think the data generator is the ideal solution, assuming it is minimal effort.
If it's a lot of effort, I think the current solution of pointing the user at an easily available data generator is fine. If that is the plan, then I think the README instructions for running the fpga code should have a note added to emphasize the fact that sf1 is NOT included in the repo and must be generated by the user as outlined here <internal link to README section quoted in your comment above>
@yuguen-intel FYI. I removed the bug label in favour of enhancement.
To create the data set, you need this tool: https://github.com/electrum/tpch-dbgen. Run "make" to build the "dbgen" executable, then create a new scale factor 1 data set with "./dbgen -s 1". This generates a set of .tbl files. Copy those into the subdirectory db/data/sf1/ and launch the ./db.fpga executable with --dbroot pointing at that directory.
These instructions need to be added to the db sample README
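The generate-and-copy steps described above can also be scripted. This is a rough sketch under stated assumptions: dbgen has already been built with "make", the script runs on a POSIX system, and the directory arguments are whatever paths you used locally. The function names run_dbgen and stage_tables are hypothetical helpers, not part of the sample or of dbgen.

```python
import shutil
import subprocess
from pathlib import Path


def run_dbgen(dbgen_dir, scale_factor="1"):
    """Run ./dbgen -s <scale_factor> inside dbgen_dir (assumes it is already built)."""
    subprocess.run(["./dbgen", "-s", str(scale_factor)], cwd=dbgen_dir, check=True)


def stage_tables(dbgen_dir, dest_dir):
    """Copy every .tbl file from dbgen_dir into dest_dir; return the copied file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(dbgen_dir).glob("*.tbl")):
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied
```

For example, run_dbgen("tpch-dbgen") followed by stage_tables("tpch-dbgen", "db/data/sf1") would mirror the manual steps, where "tpch-dbgen" is assumed to be the directory the tool was cloned into.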
Implemented in https://github.com/oneapi-src/oneAPI-samples/pull/1080