ScrapyRT Port Unreachable in Kubernetes Docker Container Pod
I'm having trouble reaching a ScrapyRT service running on specific ports inside a Kubernetes pod. My setup is a Kubernetes cluster with a pod running a Scrapy application; ScrapyRT listens for incoming requests on several designated ports, and a request to a given port is meant to trigger the spider associated with that port.
Despite setting up a Kubernetes Service whose selector references the Scrapy pod, no incoming requests reach the pod. My understanding of Kubernetes networking is that the Service should be created first and the pod afterwards, enabling inter-pod communication and external access through the Service. Is this correct?
Below are the relevant configurations.
scrapy-pod Dockerfile:
# Use Ubuntu as the base image
FROM ubuntu:latest
# Avoid prompts from apt
ENV DEBIAN_FRONTEND=noninteractive
# Update package repository and install Python, pip, and other utilities
RUN apt-get update && \
    apt-get install -y curl software-properties-common iputils-ping net-tools dnsutils vim build-essential python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Install nvm (Node Version Manager) - EXPRESS
ENV NVM_DIR /usr/local/nvm
ENV NODE_VERSION 16.20.1
RUN mkdir -p $NVM_DIR
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
# Install Node.js and npm - EXPRESS
RUN . "$NVM_DIR/nvm.sh" && nvm install $NODE_VERSION && nvm alias default $NODE_VERSION && nvm use default
# Add Node and npm to path so the commands are available - EXPRESS
ENV NODE_PATH $NVM_DIR/versions/node/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/v$NODE_VERSION/bin:$PATH
# Install Yarn - EXPRESS
RUN npm install --global yarn
# Set the working directory in the container to /usr/src/app
WORKDIR /usr/src/app
# Copy the current directory contents into the container at /usr/src/app
COPY . .
# Install any needed packages specified in requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the start_services.sh script into the container
COPY start_services.sh /start_services.sh
# Make the script executable
RUN chmod +x /start_services.sh
# Install any needed packages specified in package.json using Yarn - EXPRESS
RUN yarn install
# Expose all the necessary ports
EXPOSE 14805 14807 12085 14806 13905 12080 14808 8000
# Define environment variable - EXPRESS
ENV NODE_ENV production
# Run the script when the container starts
CMD ["/start_services.sh"]
start_services.sh:
#!/bin/bash
# Start ScrapyRT instances on different ports
scrapyrt -p 14805 &
scrapyrt -p 14807 &
scrapyrt -p 12085 &
scrapyrt -p 14806 &
scrapyrt -p 13905 &
scrapyrt -p 12080 &
scrapyrt -p 14808 &
# Keep the container running since the ScrapyRT processes are in the background
tail -f /dev/null
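One thing I considered checking in this script: ScrapyRT's CLI takes a `-i`/`--ip` option for the bind address, and if the instances bind to the loopback interface only, they would be reachable from inside the container but not from other pods. A minimal variant of the script that binds explicitly to all interfaces (a sketch, assuming the `-i` flag behaves as described) would be:

```shell
#!/bin/bash
# Sketch: start each ScrapyRT instance explicitly bound to all interfaces.
# The -i/--ip flag sets the listen address; 0.0.0.0 makes the ports
# reachable beyond the container's loopback interface.
for port in 14805 14807 12085 14806 13905 12080 14808; do
    scrapyrt -i 0.0.0.0 -p "$port" &
done

# Wait on the background jobs instead of `tail -f /dev/null`, so the
# container exits (and Kubernetes restarts it) if the ScrapyRT
# processes die, rather than sitting in a half-dead state.
wait
```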
service yaml file:
apiVersion: v1
kind: Service
metadata:
  name: scrapy-service
spec:
  selector:
    app: scrapy-pod
  ports:
    - name: port-14805
      protocol: TCP
      port: 14805
      targetPort: 14805
    - name: port-14807
      protocol: TCP
      port: 14807
      targetPort: 14807
    - name: port-12085
      protocol: TCP
      port: 12085
      targetPort: 12085
    - name: port-14806
      protocol: TCP
      port: 14806
      targetPort: 14806
    - name: port-13905
      protocol: TCP
      port: 13905
      targetPort: 13905
    - name: port-12080
      protocol: TCP
      port: 12080
      targetPort: 12080
    - name: port-14808
      protocol: TCP
      port: 14808
      targetPort: 14808
    - name: port-8000
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
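To verify that this Service actually selects the pod (label selectors match regardless of which object was created first), I can check its endpoints; a diagnostic sketch, assuming `kubectl` access to the cluster:

```shell
# If the selector matches the pod's labels, ENDPOINTS lists the pod IP
# and ports; if it shows <none>, the Service is not routing anywhere.
kubectl get endpoints scrapy-service

# Describe the Service to confirm the selector and port mappings.
kubectl describe service scrapy-service
```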
deployment yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrapy-deployment
  labels:
    app: scrapy-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scrapy-pod
  template:
    metadata:
      labels:
        app: scrapy-pod
    spec:
      containers:
        - name: scrapy-pod
          image: mydockerhub/privaterepository-scrapy:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 14805
            - containerPort: 14806
            - containerPort: 14807
            - containerPort: 12085
            - containerPort: 13905
            - containerPort: 12080
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: scrapy-env-secret
            - secretRef:
                name: express-env-secret
      imagePullSecrets:
        - name: my-docker-credentials
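With the Deployment running, it is also worth confirming from inside the container which address the ScrapyRT sockets are bound to; a diagnostic sketch (the pod name is the one from the logs below, and `netstat` is available because the image installs `net-tools`):

```shell
# Show listening TCP sockets inside the container. A local address of
# 127.0.0.1:14805 means loopback-only (unreachable from other pods);
# 0.0.0.0:14805 means reachable on the pod IP.
kubectl exec scrapy-deployment-56b9d66858-p59gs -- netstat -tlnp

# Compare a request over loopback with one from outside the pod: if the
# in-container request answers but the cross-pod one is refused, the
# bind address is the culprit rather than the Service.
kubectl exec scrapy-deployment-56b9d66858-p59gs -- curl -s http://127.0.0.1:14805/
```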
scrapy-pod's logs in the PowerShell terminal:
> k logs scrapy-deployment-56b9d66858-p59gs -f
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Site starting on 12080
2024-01-09 21:53:27+0000 [-] Site starting on 14808
2024-01-09 21:53:27+0000 [-] Site starting on 14805
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f4cbdf44d60>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fef9b620a00>
2024-01-09 21:53:27+0000 [-] Site starting on 13905
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 14807
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f0892ff4df0>
2024-01-09 21:53:27+0000 [-] Site starting on 14806
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f00d3b99000>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fba9e321180>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f1782514f10>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 12085
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fb2054cd060>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
Issue: Despite these configurations, no requests seem to reach the Scrapy pod. The kubectl logs output shows that the ScrapyRT instances start successfully on the specified ports, yet requests sent from a separate debug pod running a Python Jupyter Notebook succeed against other pods but fail against the Scrapy pod.
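For reference, the requests from the debug pod target ScrapyRT's `/crawl.json` endpoint through the Service's cluster DNS name. A sketch of the shape of that call (the spider name and target URL are placeholders, and `default` is assumed as the namespace):

```shell
# Build the ScrapyRT request URL; /crawl.json with spider_name and url
# query parameters is ScrapyRT's standard GET API.
SERVICE="scrapy-service.default.svc.cluster.local"
PORT=14805
SPIDER="my_spider"          # placeholder spider name
TARGET="https://example.com" # placeholder URL for the spider to crawl
URL="http://${SERVICE}:${PORT}/crawl.json?spider_name=${SPIDER}&url=${TARGET}"
echo "$URL"

# From the debug pod, the actual call would be: curl "$URL"
```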
Question: How can I successfully connect to the Scrapy pod? What might be preventing the requests from reaching it?
Any insights or suggestions would be greatly appreciated.