
DPC++FPGA DB example: dataset for FPGA target is not included

Open robert-mijakovic opened this issue 2 years ago • 4 comments

Summary

In the DPC++FPGA Reference Designs, the README.md of the DB design states that the FPGA binary should be run with: ./db.fpga --dbroot=../data/sf1 --test. However, the repository does not provide ../data/sf1.

Version

Tip of the repository and earlier releases.

Steps to reproduce

Build the example from the repository and run it with: ./db.fpga --dbroot=../data/sf1 --test

Observed behavior

The repository does provide ../data/sf0.01, but according to the documentation this dataset is meant for the FPGA_emu binary. When I tried to use it with the FPGA binary, it fails with this error:

LineItem table size has 60175 rows when it should have 6001215
Orders table size has 15000 rows when it should have 1500000
Parts table size has 2000 rows when it should have 200000
Supplier table size has 100 rows when it should have 10000
PartSupplier table size has 8000 rows when it should have 800000
ERROR: could not validate the scale factor of the parsed database files

Expected behavior

Either provide dataset sf1 or allow sf0.01 to work with the FPGA target binary.

robert-mijakovic, Mar 16 '22

The files are intentionally not included since they are over 1GB in size and would really bloat the repo. From the DB README:

In the data/ directory, you will find database files for a scale factor of 0.01. These files were generated manually and can be used to verify the queries in emulation. However, these files are too small to showcase the true performance of the FPGA hardware.

To generate larger database files to run on the hardware, you can use TPC's dbgen tool. Instructions for downloading, building and running the dbgen tool can be found on the TPC-H website. Note that this reference design currently only supports databases with scale factors of 0.01 or 1.

An alternative approach would be to write a table generator in C++ and include it in the design itself, which I may already have ready 👀

tyoungsc, Mar 17 '22

@mdbtucker what do you think about my alternative approach? I don't think adding these files to the repo is acceptable, but writing a data generator in C++ is reasonable, although it will require some effort.

tyoungsc, Mar 18 '22

I think the data generator is the ideal solution, assuming it is minimal effort.

If it's a lot of effort, I think the current solution of pointing the user to an easily available data generator is fine. In that case, the README instructions for running the FPGA code should include a note emphasizing that sf1 is NOT included in the repo and must be generated by the user as outlined here: <internal link to README section quoted in your comment above>

mdbtucker, Mar 22 '22

@yuguen-intel FYI. I removed the bug label in favour of enhancement.

tyoungsc, Apr 29 '22

To create the data set, get this tool: https://github.com/electrum/tpch-dbgen and run “make” to build the “dbgen” executable. Running “./dbgen -s 1” then creates a scale factor 1 data set as a set of .tbl files. Copy those files into the db/data/sf1/ subdirectory and launch the ./db.fpga executable with that data set.

These instructions need to be added to the db sample README.

yuguen-intel, Sep 06 '22

Implemented in https://github.com/oneapi-src/oneAPI-samples/pull/1080

yuguen-intel, Sep 26 '22