hrequests

Lambda Execution Issues

Open puzzlepeaches opened this issue 1 year ago • 0 comments

Hey there! Awesome library! I am running into some issues. I hope the community here can help me troubleshoot them. I am attempting to run hrequests in Lambda to interact with specific web pages when a function URL is called.

I am using the AWS SDK to push a Docker image similar to the following to ECR and run it as a Lambda function:

FROM mcr.microsoft.com/playwright/python:v1.34.0-jammy

# Include global arg in this stage of the build
ARG FUNCTION_DIR

RUN mkdir -p ${FUNCTION_DIR}

COPY app.py ${FUNCTION_DIR}

WORKDIR /app

COPY ./mytool/pyproject.toml ./mytool/poetry.lock /app/

COPY ./mytool/. /app

# Install dependencies using poetry
RUN pip install --no-cache-dir poetry awslambdaric aws-xray-sdk sh \
    && poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi

RUN python -m playwright install-deps
RUN python -m playwright install

WORKDIR ${FUNCTION_DIR}

ENTRYPOINT [ "/usr/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler" ]

An app.py file similar to the following is then called using said function URL via awslambdaric:

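import json
import logging
import os
import subprocess

from aws_xray_sdk.core import xray_recorder

# Assumed logger setup; validate_headers() is not shown in this excerpt.
logger = logging.getLogger(__name__)


def handler(event, context):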
    logger.debug(msg=f"Initial event: {event}")

    headers = event["headers"]
    header_validation = validate_headers(headers)

    input = headers["x-input"]
    try:
        command = headers["x-command"].split()
        command.extend(input.split())
    except Exception as e:
        logger.error(msg=f"Error parsing command: {e}")
        return {
            "statusCode": 500,
            "body": f"Error parsing command: {e}",
        }

    parsed = []
    try:
        logger.debug(msg=f"Running command: {command}")

        # Set HOME=/tmp to avoid writing to the container filesystem
        # Set LD_LIBRARY_PATH to include /usr/lib64 to avoid issues with the AWS X-Ray daemon
        os.environ["HOME"] = "/tmp"
        os.environ["LD_LIBRARY_PATH"] = "/usr/lib64"

        results = subprocess.run(command, capture_output=True, text=True, env=os.environ.copy())
        logger.debug(msg=f"Results stdout: {results.stdout}")
        logger.debug(msg=f"Results stderr: {results.stderr}")
        logger.debug(msg=f"Command exited with code: {results.returncode}")

    except subprocess.TimeoutExpired as e:
        logger.error(msg=f"Command timed out: {e}")
        return {
            "statusCode": 408,  # HTTP status code for Request Timeout
            "body": json.dumps({
                "stdout": str(e.stdout),
                "stderr": str(e.stderr),
                "e": str(e),
                "error": "Command timed out"
            }),
        }
    except Exception as e:
        logger.error(msg=f"Error executing command: {e}")
        return {
            "statusCode": 500,
            "body": f"Error executing command: {e}",
        }

    try:
        for line in results.stdout.splitlines():
            parsed_json = json.loads(line)
            logger.debug(msg=f"Output: {parsed_json}")
            parsed.append(parsed_json)
    except Exception as e:
        logger.error(msg=f"Error parsing output: {e}")
        return {
            "statusCode": 500,
            "body": f"Error parsing output: {e}",
        }
    
    xray_recorder.end_segment()

    return {"statusCode": 200, "body": json.dumps(parsed)}

This app.py code calls a separate tool I created that uses hrequests to navigate and interact with web pages. When I invoke app.py through the function URL, however, hrequests raises the following error:

Exception in thread Thread-1 (spawn_main):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hrequests/browser.py", line 128, in spawn_main
    asyncio.new_event_loop().run_until_complete(self.main())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/hrequests/browser.py", line 135, in main
    self.context = await self.client.new_context(
  File "/usr/local/lib/python3.10/dist-packages/hrequests/playwright_mock/playwright_mock.py", line 38, in new_context
    _browser = await context.new_context(
  File "/usr/local/lib/python3.10/dist-packages/hrequests/playwright_mock/context.py", line 6, in new_context
    context = await inst.main_browser.new_context(
  File "/usr/local/lib/python3.10/dist-packages/playwright/async_api/_generated.py", line 14154, in new_context
    await self._impl_obj.new_context(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_browser.py", line 127, in new_context
    channel = await self._channel.send("newContext", params)
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 482, in wrap_api_call
    return await cb()
  File "/usr/local/lib/python3.10/dist-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed

Some notes on what has already been attempted:

  • The container image runs fine on my local system with similar resource allocations.
  • I can call my tool remotely, and it appears to run partially before hitting this exception.
  • I have increased the memory allocation of the Lambda function several times without success.
  • My tool always hits the configured Lambda timeout, no matter how high I set it, so I suspect this error occurs and then hangs the application entirely.
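
On the timeout note: `subprocess.run` in the handler above is called without a `timeout=` argument, so the `subprocess.TimeoutExpired` branch cannot fire and Lambda ends the invocation before any stdout or stderr is logged. Below is a minimal sketch of one way to make the child process give up first, assuming the standard Lambda context method `get_remaining_time_in_millis()`. The helper names and the 5-second safety margin are hypothetical and not part of the tool or of hrequests.

```python
import subprocess


def _as_text(value):
    """Normalize captured output, which may be None, bytes, or str after a timeout."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def remaining_budget_seconds(context, safety_margin_s=5.0):
    """Seconds the child may run while leaving a margin before the Lambda deadline."""
    remaining_s = context.get_remaining_time_in_millis() / 1000.0
    # Keep the timeout positive even when the margin exceeds the remaining time.
    return max(remaining_s - safety_margin_s, 0.1)


def run_with_deadline(command, context, safety_margin_s=5.0):
    """Run command and return (timed_out, returncode, stdout, stderr)."""
    timeout_s = remaining_budget_seconds(context, safety_margin_s)
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # The direct child has been killed; return whatever output was captured.
        return True, None, _as_text(exc.stdout), _as_text(exc.stderr)
    return False, result.returncode, result.stdout, result.stderr
```

With this in place the handler could log the partial stderr and return its 408 response instead of being cut off. Only the direct child is killed on timeout, so browser processes it spawned may outlive it.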

I am not experienced with Playwright or headless browsers, so any help would be greatly appreciated. I understand this is not directly related to hrequests, but I hope the community here is familiar enough with these frameworks to assist. Thanks!

puzzlepeaches, Feb 04 '24 13:02