Long import of 300k+ records
Description
I have a JSON file of 300,000+ records, each containing:
- First name (plaintext)
- Last name (plaintext)
- Address (plaintext)
- Visitor ID (number)
- Region (category, via title string)
- Preference ID (number)
Things I have done:
- devMode is off.
- /config/feed-me.php:

```php
<?php

return [
    '*' => [
        'compareContent' => true,
    ],
];
```
- Type is Entry. I have both "Add entry" and "Update entry" enabled, using the Visitor ID as the unique identifier, since there are already a few thousand records in the system.
- Hardware-wise, it's a DigitalOcean droplet with ample CPU and memory, running 20 other Craft CMS sites. htop shows the CPU (4 cores) is barely being used (each core bouncing between 1-10%) and memory is just under half used.
Status
- The import is running right now; I've made about 2% progress over the last 2 hours.
Questions:
- Is this "normal" for this amount and type of data? I tried a CSV before and it was about the same speed.
- If it's not normal, what do you suggest I try to speed this import up?
Additional info
- Craft version: Craft Pro 5.4.3
- PHP version: 8.3.11
- Database driver & version: MySQL 8.0.30
- Plugins & versions:
AsyncQueue 4.0.0
Button Box 5.0.0
Campaign dev-develop
CKEditor 4.2.0
Code Field 5.0.0
Control Panel CSS 3.0.0
Cookies 5.0.0
Dashboard Begone 3.0.0
Expanded Singles 3.0.0
Feed Me 6.3.0
Field Manager 4.0.2
Formie 3.0.6
Icon Picker 3.0.1
Knock Knock 3.0.1
Neo 5.2.5
SendGrid 3.0.0
SEO 5.1.3
Single Cat 4.0.0
Sprig 3.5.0
Timber 2.0.2
Just to update this: I split my list into 20k chunks, and it's taking at least a day to import each chunk. Jobs often time out, so I've had to extend the timeout to 20 hours to get each chunk to process successfully.
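For reference, the timeout I extended is the queue's "time to reserve" (ttr). A minimal sketch of how that can be raised, assuming Craft's standard queue component override in config/app.php (the 20-hour value matches what I described above):

```php
<?php
// config/app.php — raise the queue job TTR so long-running Feed Me jobs
// aren't treated as stalled and re-queued mid-import.
// Sketch only; assumes Craft's standard queue component configuration.
return [
    'components' => [
        'queue' => [
            'ttr' => 20 * 60 * 60, // 20 hours, in seconds
        ],
    ],
];
```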
If anyone has any suggestions on how to go about this better or quicker, it would be super appreciated.
- I have a DigitalOcean droplet with 32 GB of RAM and 4 vCPUs. Total CPU usage never really gets past 6%.
- I'm running php-fpm with pool settings that are more than enough to satisfy the traffic on the server.
- I'm using a DigitalOcean-hosted database, which I've also upgraded to 16 GB RAM / 6 vCPUs.
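Related to the php-fpm point: Feed Me imports run through Craft's queue, which by default is kicked off by web requests, so PHP-FPM request limits apply. One thing worth trying is moving queue processing to the CLI. A sketch, assuming Craft's runQueueAutomatically general config setting:

```php
<?php
// config/general.php — stop running queue jobs over HTTP so they can be
// processed by a long-running CLI worker instead.
// Sketch only; assumes Craft's fluent general config (Craft 4+).
use craft\config\GeneralConfig;

return GeneralConfig::create()
    ->runQueueAutomatically(false);
```

The worker would then be kept running with something like `php craft queue/listen --verbose` under supervisor or systemd, so jobs aren't bound by web request timeouts.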
None of these actions seems to be making any noticeable improvement in the import speed. I need to get this data in (and ultimately more data) into the system, but doing it over days and weeks is not sustainable.
Is there anything else I can try?
Did you find a way to solve this problem? It's really difficult to synchronize a large amount of data.