
Training hangs and almost never reaches 100% (reached it once or twice)

Open · boyoma opened this issue 2 years ago · 3 comments

Describe the bug: My installation looks fine (web embedding on another domain works too), but every time I press 'Train chatbot', the percentage rises slowly and eventually stops before reaching 100%, so I have to press the button again. Training has completed maybe once out of 100 attempts. More often than not it gets stuck at 0%.

The exact same bot was first built and trained on localhost, and training worked fine there: slow, but it completed.

I'm using a machine with 4 GB of memory and an 80 GB disk, running Ubuntu 20.04 (LTS) x64.

To Reproduce: steps to reproduce the behavior:

  1. Go to 'a bot'
  2. Click on 'train chatbot'
  3. See the error: training almost never reaches 100%

Expected behavior: Being slow is OK, but training should at least complete.

Environment (please complete the following information):

  • OS: Linux
  • Browser: Chrome
  • Browser version: 111.0.5563.64
  • Botpress version: 12.30.7

boyoma, Mar 16 '23 15:03

It seems I have the same problem. To test, I installed Botpress on my local PC using the following docker-compose.yml:

```yaml
version: '3'

services:
  botpress:
    image: botpress/server
    expose:
      - 3000
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres:secretpw@postgres:5435/botpress_db
    depends_on:
      - postgres
    volumes:
      - ./build/botpress/data:/botpress/data

  postgres:
    image: postgres:11.2-alpine
    expose:
      - 5435
    environment:
      PGPORT: 5435
      POSTGRES_DB: botpress_db
      POSTGRES_PASSWORD: secretpw
      POSTGRES_USER: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```
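One way to rule out a simple typo in a setup like this is to check that the Botpress `DATABASE_URL` agrees with the postgres service's own settings. The sketch below hard-codes the values from the compose file above and compares host, port, database, user and password using Python's `urllib.parse.urlparse`. The function name `check_database_url` is hypothetical, and an empty result only shows that the file is internally consistent, not that the database is reachable or related to the stalled training.

```python
from urllib.parse import urlparse

# Values copied from the docker-compose.yml above.
DATABASE_URL = "postgres://postgres:secretpw@postgres:5435/botpress_db"
POSTGRES_ENV = {
    "PGPORT": "5435",
    "POSTGRES_DB": "botpress_db",
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "secretpw",
}
POSTGRES_SERVICE_NAME = "postgres"


def check_database_url(url: str, env: dict, service: str) -> list:
    """Return mismatches between DATABASE_URL and the postgres service
    settings; an empty list means the two agree."""
    parsed = urlparse(url)
    database = parsed.path.lstrip("/")
    problems = []
    if parsed.hostname != service:
        problems.append(f"host {parsed.hostname!r} != service {service!r}")
    if parsed.port != int(env["PGPORT"]):
        problems.append(f"port {parsed.port} != PGPORT {env['PGPORT']}")
    if database != env["POSTGRES_DB"]:
        problems.append(f"database {database!r} != POSTGRES_DB")
    if parsed.username != env["POSTGRES_USER"]:
        problems.append("user mismatch")
    if parsed.password != env["POSTGRES_PASSWORD"]:
        problems.append("password mismatch")
    return problems


print(check_database_url(DATABASE_URL, POSTGRES_ENV, POSTGRES_SERVICE_NAME))  # []
```

Parsing the URL rather than comparing raw strings means a mismatch is reported per field, which makes a wrong port (a common slip when `PGPORT` is changed from the default 5432) easy to spot.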

This is all I can see in the logs:

```text
botpress_1 | 04/17/2023 00:17:21.386 [NLU] training-queue [test01/87443a324c26b933.b7f4a95061d75566.3265.en] Training Queued.
botpress_1 | 04/17/2023 00:17:21.692 [NLU] Engine:training Training worker successfully started on process with pid 180.
```
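These two lines show the training being queued and the worker starting, with nothing logged afterwards. The following is a minimal sketch for pulling the timestamp, scope and worker pid out of lines like these, so that a stall can be lined up against other events. The line pattern is inferred from these two samples only and is an assumption, not a documented Botpress log format; `parse_line` and `worker_pid` are hypothetical helpers. Note that pid 180 is the pid as seen inside the container, which generally differs from the pid the host kernel reports for the same process.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the two sample lines above:
# "<service> | MM/DD/YYYY HH:MM:SS.mmm [<scope>] <message>"
LINE_RE = re.compile(
    r"^(?P<service>\S+)\s*\|\s*"
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<scope>[^\]]+)\]\s+(?P<message>.*)$"
)
# Text logged when the NLU training worker process starts.
PID_RE = re.compile(r"started on process with pid (\d+)")


def parse_line(line):
    """Return (timestamp, scope, message) for a matching line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S.%f")
    return ts, m.group("scope"), m.group("message")


def worker_pid(lines):
    """Return the first training-worker pid found in the lines, else None."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        m = PID_RE.search(parsed[2])
        if m:
            return int(m.group(1))
    return None


LOG = [
    "botpress_1 | 04/17/2023 00:17:21.386 [NLU] training-queue "
    "[test01/87443a324c26b933.b7f4a95061d75566.3265.en] Training Queued.",
    "botpress_1 | 04/17/2023 00:17:21.692 [NLU] Engine:training "
    "Training worker successfully started on process with pid 180.",
]

print(worker_pid(LOG))  # 180
```

The parsed timestamps also give the delay between queueing and worker start (about 0.3 s here), which can be compared with the time at which the progress bar stops moving.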

For testing, I'm using a new bot called test01, created from the Small Talk template without any changes.

I have not been able to complete a single training; the furthest it has gotten is 80%.

cccaballero, Apr 17 '23 00:04

How much memory is available to your Botpress containers? Usually, when training stops between 80% and 99%, it's because the OS killed the training process for using too much memory.

Make sure your Botpress node has access to at least 3 GB of RAM.
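If the OS is killing the training worker, the Linux kernel's OOM killer normally records it in the kernel log. The following is a minimal sketch, assuming a Linux Docker host where you can run `dmesg` (it may require root), that lists the processes the OOM killer terminated; the helper names `oom_kills` and `read_kernel_log` are hypothetical. The pids in the kernel log are host pids, so match entries by time and process name rather than against the pid shown in the Botpress log.

```python
import re
import subprocess

# Matches kernel OOM-killer lines such as
# "Out of memory: Killed process 4242 (node) total-vm:..., anon-rss:..."
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_kills(kernel_log: str):
    """Return (pid, process_name) pairs for every OOM kill in the log text."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(kernel_log)]


def read_kernel_log() -> str:
    """Read the kernel ring buffer; may need root on the Docker host."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the host, `oom_kills(read_kernel_log())` returns an empty list if nothing has been OOM-killed since the ring buffer was last cleared. On systemd hosts, `journalctl -k` shows the same kernel messages and can be passed to `oom_kills` as text instead.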

Thanks,

sebburon, Apr 20 '23 18:04

@sebburon I don't think it's a memory problem: I haven't defined any memory limits for the Docker container, and I have plenty of RAM. This is what `docker stats` reports:

```text
MEM USAGE / LIMIT
573.6MiB / 38.88GiB
```
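`docker stats` reports memory use at the moment it is sampled, so a single reading taken outside a training run may not reflect the peak the worker reaches. The following is a minimal sketch, with hypothetical helper names, that converts the `MEM USAGE / LIMIT` field into bytes and can sample it through `docker stats --no-stream --format "{{.MemUsage}}"`. It assumes the binary units (`KiB`, `MiB`, `GiB`) shown in the output above.

```python
import subprocess

# Binary (IEC) units as printed by `docker stats`, e.g. "573.6MiB / 38.88GiB".
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(value: str) -> float:
    """Convert a size like '573.6MiB' to bytes."""
    value = value.strip()
    # Check longer suffixes first so "MiB" is not mistaken for plain "B".
    for unit in sorted(UNITS, key=len, reverse=True):
        if value.endswith(unit):
            return float(value[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised size: {value!r}")


def parse_mem_usage(field: str):
    """Split 'USAGE / LIMIT' into (usage_bytes, limit_bytes)."""
    usage, limit = field.split("/")
    return to_bytes(usage), to_bytes(limit)


def sample_mem_usage(container: str) -> str:
    """Take one snapshot of a container's memory use via docker stats."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


usage, limit = parse_mem_usage("573.6MiB / 38.88GiB")
print(f"{usage / limit:.2%}")  # 1.44%
```

Calling `sample_mem_usage("<container>")` in a loop while training runs (the container name is a placeholder) and keeping the largest parsed value would show whether memory climbs toward the limit before the progress stalls.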

cccaballero, Apr 24 '23 03:04