IndexBuilderMain crashes due to incorrect --stxxl-memory parameter format

Open arcangelo7 opened this issue 9 months ago • 6 comments

Description

When running the qlever index command from qlever-control, the indexing process crashes without a clear error message. After adding detailed logging, I found that the cause is the format of the value passed to --stxxl-memory: if the value carries a "G" suffix (e.g., 5G), IndexBuilderMain crashes. The value must be a plain number without the "G" (e.g., 5).

Steps to Reproduce

  1. Set up the QLever environment and prepare the QLeverfile with the following configuration:

    [data]
    NAME              = oc_meta
    BASE_URL          = https://w3id.org/oc/meta
    DESCRIPTION       = OpenCitations Meta stores and delivers bibliographic metadata for all publications involved in the OpenCitations Index.
    TEXT_DESCRIPTION  = All literals, search with FILTER CONTAINS(?var, "...")
    
    [index]
    INPUT_FILES = qlever_input_openalex/*
    CAT_INPUT_FILES = find qlever_input_openalex/ -type f | xargs cat
    SETTINGS_JSON   = { "ascii-prefixes-only": false, "num-triples-per-batch": 100000 }
    TEXT_INDEX = from_literals
    
    [server]
    PORT               = 7006
    MEMORY_FOR_QUERIES = 5G
    CACHE_MAX_SIZE     = 2G
    TIMEOUT            = 30s
    
    [runtime]
    SYSTEM = docker
    IMAGE  = docker.io/adfreiburg/qlever:latest
    
    [ui]
    UI_CONFIG = oc_meta
    
  2. Run the qlever index command.

Expected Behavior

The indexing process should complete successfully without crashing.

Actual Behavior

The process crashes, and the following error message is logged:

Error in command-line argument: bad lexical cast: source type value could not be interpreted as target
Options for IndexBuilderMain:
...
  -m [ --stxxl-memory-gb ] arg   The amount of memory in GB to use for
                                 sorting during the index build.
                                 Decrease if the index builder runs out
                                 of memory.
...

Investigation and Findings

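Detailed logging revealed that the value of --stxxl-memory must be given without the "G" suffix. The help text above describes the option as an amount of memory in GB, and the "bad lexical cast" error suggests that 5G could not be parsed as a number. The following command works as expected: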

IndexBuilderMain -F ttl -f - -i oc_meta -s oc_meta.settings.json --text-words-from-literals --stxxl-memory 5 | tee oc_meta.index-log.txt

Suggested Fix

Modify the QLeverfile configuration or the script so that the value passed to --stxxl-memory is a plain number without the "G" suffix. For example, the command string could strip the suffix when it is built:

index_cmd = f"{args.cat_input_files} | {args.index_binary} -F ttl -f - -i {args.name} -s {args.name}.settings.json --text-words-from-literals --stxxl-memory {args.stxxl_memory.replace('G', '')} | tee {args.name}.index-log.txt"

Alternatively, update the QLeverfile provided with qlever-control so that STXXL_MEMORY is given without the "G" suffix, or handle the suffix in the code so that other users do not hit the same crash.
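
Handling the suffix in the code could look roughly like the sketch below. It is only an illustration: stxxl_memory_gb is a hypothetical helper, not an existing qlever-control function, and it assumes the option takes a whole number of gigabytes. Compared with a bare .replace('G', ''), it also rejects values such as 5M instead of passing them on to IndexBuilderMain.

```python
import re


def stxxl_memory_gb(value: str) -> str:
    """Return the plain number of gigabytes from a value like '5G' or '5'.

    Hypothetical helper (not part of qlever-control). Assumes the option
    expects a whole number of gigabytes.
    """
    # Accept digits with an optional "G" or "GB" suffix, case-insensitive.
    match = re.fullmatch(r"\s*(\d+)\s*(?:G|GB)?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(
            f"Expected a whole number of gigabytes such as '5' or '5G', got {value!r}"
        )
    return match.group(1)


if __name__ == "__main__":
    # Both spellings yield the plain number that the working command uses.
    print(stxxl_memory_gb("5G"))
    print(stxxl_memory_gb("5"))
```

With such a helper, the command string would use {stxxl_memory_gb(args.stxxl_memory)} in place of the .replace call, and a malformed value would fail early with a readable message.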

Environment

  • QLever Control Version: Latest
  • Operating System: Debian 12

arcangelo7, May 21 '24 13:05