
Long import of 300k+ records

Open adrianjean opened this issue 1 year ago • 3 comments

Description

I have a JSON file of 300,000+ records each containing:

  • First name (plaintext)
  • Last Name (plaintext)
  • Address (plaintext)
  • Visitor ID (num)
  • Region (Category via Title string)
  • Preference ID (num)

Things I have done:

  • devMode is off.
  • /config/feed-me.php
<?php
return [
    '*' => [
        'compareContent' => true,
    ]
];
  • Type is Entry
  • The feed is set to both Add Entry and Update Entry, using the Visitor ID as the unique identifier, since there are already a few thousand records in the system.
  • Hardware-wise, it's a DigitalOcean droplet with ample CPU and memory, running 20 other CraftCMS sites.
  • htop shows the CPU (4 cores) is barely being used (cores bouncing between 1-10% each) and memory is just under half used.

Status

  • The import is running right now; it has made about 2% progress over the last 2 hours.
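
For scale, here's a quick back-of-the-envelope sketch of what that rate implies (assuming the 300,000-record feed and the roughly 2%-in-2-hours figure reported above):

```python
# Rough throughput estimate from the figures reported in this issue.
total_records = 300_000
done_fraction = 0.02      # ~2% progress
elapsed_hours = 2.0

records_done = total_records * done_fraction         # 6,000 records
rate_per_hour = records_done / elapsed_hours         # 3,000 records/hour
remaining_hours = (total_records - records_done) / rate_per_hour

print(f"{rate_per_hour:.0f} records/hour, ~{remaining_hours:.0f} hours remaining")
# → 3000 records/hour, ~98 hours remaining
```

At that rate the full import would take roughly four days of continuous processing, which matches the timeouts described below.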

Questions:

  • Is this "normal" for this amount/type of data? I tried CSV before and it was about the same.
  • If it's not normal, what do you suggest I try to speed this import up?

Additional info

  • Craft version: Craft Pro 5.4.3
  • PHP version: 8.3.11
  • Database driver & version: MySQL 8.0.30
  • Plugins & versions:

  • AsyncQueue 4.0.0
  • Button Box 5.0.0
  • Campaign dev-develop
  • CKEditor 4.2.0
  • Code Field 5.0.0
  • Control Panel CSS 3.0.0
  • Cookies 5.0.0
  • Dashboard Begone 3.0.0
  • Expanded Singles 3.0.0
  • Feed Me 6.3.0
  • Field Manager 4.0.2
  • Formie 3.0.6
  • Icon Picker 3.0.1
  • Knock Knock 3.0.1
  • Neo 5.2.5
  • SendGrid 3.0.0
  • SEO 5.1.3
  • Single Cat 4.0.0
  • Sprig 3.5.0
  • Timber 2.0.2

adrianjean avatar Sep 14 '24 16:09 adrianjean

Just to update this — I split my list into 20k chunks, and it's taking at least a day to import each chunk. Imports often time out, so I have to extend the timeout to 20 hours to get the chunks to process successfully.

If anyone has any suggestions on how to go about this better or faster, it would be super appreciated.
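
For anyone else chunking a large feed like this, a minimal splitting script might look like the following (a sketch: the record fields mirror those listed in this issue, but the sample data generation and the `records-NNN.json` naming are hypothetical — in practice you would load the real feed with `json.load(open("records.json"))`, assuming it is one top-level JSON array):

```python
import json

# Hypothetical stand-in for the real feed file; replace with
# records = json.load(open("records.json")) in practice.
records = [{"visitorId": n, "firstName": f"Name{n}"} for n in range(45_000)]

CHUNK_SIZE = 20_000

# Write 20k-record chunk files that Feed Me can consume one at a time.
chunk_files = []
for i in range(0, len(records), CHUNK_SIZE):
    chunk = records[i : i + CHUNK_SIZE]
    path = f"records-{i // CHUNK_SIZE:03d}.json"
    with open(path, "w") as out:
        json.dump(chunk, out)
    chunk_files.append(path)

print(chunk_files)
# → ['records-000.json', 'records-001.json', 'records-002.json']
```

Each chunk file can then be pointed at by the feed URL in turn, so a single timeout doesn't lose a whole run.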

adrianjean avatar Sep 24 '24 17:09 adrianjean

  • I have a DigitalOcean droplet with 32 GB of RAM and 4 vCPUs. The CPU usage never really gets past 6% in total.
  • I'm running php-fpm with pool settings that are more than enough to satisfy the traffic on the server.
  • I'm using a DigitalOcean-hosted database, which I also upgraded to 16 GB RAM / 6 vCPUs.

None of these actions seem to be making any kind of noticeable improvement to the import speed. I need to get this data in — and ultimately more data into the system — but doing it over days and weeks is not sustainable.

Is there anything else I can try?
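
One thing that might be worth trying before the next run: shrink the feed itself by dropping records whose Visitor ID already exists in Craft, so `compareContent` has less to process. This only helps if most of the file is brand-new records (updates to existing entries would need a separate, smaller feed). A sketch, where the inline sample data stands in for real inputs — in practice `existing_ids` would come from an export of Visitor IDs already in the system, and `records` from the full feed file:

```python
import json

# Hypothetical inputs: replace with real data, e.g.
#   existing_ids = set(map(int, open("existing-ids.txt")))
#   records = json.load(open("records.json"))
existing_ids = {101, 102, 103}
records = [
    {"visitorId": 101, "firstName": "A"},
    {"visitorId": 200, "firstName": "B"},
    {"visitorId": 103, "firstName": "C"},
]

# Keep only records Craft has not seen yet.
new_records = [r for r in records if r["visitorId"] not in existing_ids]

with open("records-new.json", "w") as out:
    json.dump(new_records, out)

print(len(new_records))
# → 1
```

With the existing entries filtered out up front, the feed only has to insert, never diff, which is the cheapest path through an import.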

adrianjean avatar Sep 28 '24 04:09 adrianjean

Did you find a way to solve this problem? It's really difficult to synchronize a large amount of data.

plcdnl avatar Mar 30 '25 08:03 plcdnl