
Lambda does not exit on errors, keeps running until timeout

Open OperationalFallacy opened this issue 1 year ago • 4 comments

Describe the bug

I haven't tested it super thoroughly, but I saw the Lambda run until the max timeout even though it couldn't process anything. One example was sending gz files, which I realized are not supported.

Steps to reproduce

  1. deploy indexer lambda
  2. send compressed file, test.json.gz

Expected behavior

It should exit immediately instead of running until the timeout.

Configuration:

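// Imports assume AWS CDK v2 (aws-cdk-lib).
import * as path from "path";
import { Construct } from "constructs";
import { Architecture, Code, Function, FunctionProps, Runtime, Tracing } from "aws-cdk-lib/aws-lambda";
import { RetentionDays } from "aws-cdk-lib/aws-logs";

export class RustFunction extends Function {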
  constructor(scope: Construct, id: string, props?: Partial<FunctionProps>) {
    const lambdaAssetPath = path.join(
      __dirname,
      "../../quickwit/quickwit",
      "quickwit-lambda/deploy",
      id
    );
    console.log("lambdaAssetPath", lambdaAssetPath);
    super(scope, id+'_f', {
      ...props,
      code: Code.fromAsset(lambdaAssetPath),
      handler: id, // use id to specify either "indexer" or "searcher"
      runtime: Runtime.PROVIDED_AL2,
      architecture: Architecture.ARM_64,
      logRetention: RetentionDays.ONE_DAY,
      tracing: Tracing.DISABLED,
    });
  }
}

Binaries were compiled with this command (I didn't use cross; it was too slow on Mac):

LIBZ_SYS_STATIC=1 TARGET_CC=aarch64-linux-musl-gcc RUSTFLAGS="-C linker=aarch64-linux-musl-gcc -C link-arg=-static -C opt-level=z -C lto" cargo build --release --target aarch64-unknown-linux-musl

  1. Output of quickwit --version

checked out v0.8.1

  2. The index_config.yaml

version: 0.7

index_id: test-index

doc_mapping:
  field_mappings:
    - name: website
      type: text
    - name: name
      type: text
    - name: founded
      type: i64
      indexed: true
      fast: true
    - name: size
      type: text
    - name: locality
      type: text
    - name: region
      type: text
      fast: true
    - name: country
      type: text
      fast: true
    - name: industry
      type: text
      fast: true
    - name: linkedin_url
      type: text

search_settings:
  default_search_fields: [name]

indexing_settings:
  split_num_docs_target: 2000000

OperationalFallacy, May 21 '24 05:05

Hi @OperationalFallacy, thanks for reporting this.

It would be a useful improvement indeed, even though it is far from trivial to identify all un-recoverable errors and bubble them up to stop the lambda.
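
To illustrate the idea, here is a minimal sketch of separating unrecoverable errors from transient ones and bubbling the former up. It is not Quickwit's actual error handling: `UnrecoverableError`, `isUnrecoverable`, and `runOnce` are hypothetical names, and which failures count as unrecoverable is exactly the hard part mentioned above.

```typescript
// Hypothetical error type: marks failures that retrying cannot fix.
class UnrecoverableError extends Error {
  // Tag property, checked instead of `instanceof` so the check also
  // works when the class is compiled down to ES5.
  readonly unrecoverable = true;

  constructor(message: string) {
    super(message);
    this.name = "UnrecoverableError";
  }
}

// Returns true only for errors explicitly tagged as unrecoverable.
function isUnrecoverable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { unrecoverable?: boolean }).unrecoverable === true
  );
}

type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Runs one unit of work. Unrecoverable errors are rethrown so the caller
// (for example a Lambda handler) can stop right away; any other error is
// returned so the caller can decide whether to retry.
function runOnce<T>(work: () => T): Outcome<T> {
  try {
    return { ok: true, value: work() };
  } catch (err) {
    if (isUnrecoverable(err)) throw err;
    return { ok: false, error: err };
  }
}
```

In this sketch, anything not tagged is treated as potentially transient, so a missed classification results in a retry rather than a lost invocation.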

> One example was sending gz files which I realized not supported

gz files are supported. If you run Quickwit Lambda with these packages, it should work just fine.

In general, we don't guarantee that main or the Quickwit release tags (e.g. 0.8.1) build into a functioning AWS Lambda. The release cycle of Quickwit Lambda is still independent at this stage. You should use the Lambda release tags instead (aws-lambda-beta-xx).

rdettai, May 21 '24 13:05

Agree, won't be easy. I saw it runs server-side software. Pretty remarkable you made it work in Lambda, and so well.

Is there an option to make all errors unrecoverable? If users prefer, Lambda can use its own retry mechanism. I recognize, though, that this is probably not how Quickwit servers are designed :)

Thanks for pointing to the tags, I'll try it!

OperationalFallacy, May 21 '24 13:05

> Is there an option to make all errors unrecoverable?

There is currently a retry mechanism at the indexing pipeline level that could be disabled. That wouldn't be a silver bullet solution but might already avoid hanging in many failure cases.
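
As a rough illustration of what disabling such a retry mechanism means, the sketch below shows a generic retry loop whose attempt count is configurable. It is not the Quickwit indexing pipeline code: `runWithRetry` and `RetryOptions` are hypothetical names, and no backoff is modeled. With `maxAttempts: 1` the first failure is rethrown immediately, so the caller fails fast instead of hanging.

```typescript
// Hypothetical options: `maxAttempts: 1` means "run once, never retry".
interface RetryOptions {
  maxAttempts: number;
}

// Calls `task` until it succeeds or the attempts are used up.
// The task receives the 1-based attempt number.
function runWithRetry<T>(task: (attempt: number) => T, options: RetryOptions): T {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return task(attempt);
    } catch (err) {
      lastError = err; // remember the failure; loop again if attempts remain
    }
  }
  // Attempts exhausted: bubble the last error up to the caller.
  throw lastError;
}
```

A caller that prefers Lambda's own retry behavior would pass `{ maxAttempts: 1 }` and let the thrown error fail the invocation.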

rdettai, May 28 '24 07:05

That should help. Is there a configuration option to disable it?

OperationalFallacy, May 28 '24 10:05