seed Error importing large file, 177,000 records

instance: staging V 2.2.0 release

This is a very large file (177,000 records) but not many fields. PNNL is trying to import it to SEED as part of their UBID work. It would be nice if we could figure out whether SEED can import it or not. Right now on staging the program gets to about 10% and then gives an error.

See this folder for the file. https://drive.google.com/open?id=1e5-c5EpB4YkUlX442DD64IJpOCW-1No0

@nllong -- could I try importing this file onto dev1? Or do you want to?

Nov 09 '17 02:11 RDmitchell

Here is the error message

Nov 09 '17 02:11 RDmitchell

I can test locally first.

I wonder if this is a @mmclark question since it says upload failed. If the upload failed then it is most likely a configuration issue with S3/local file storage.

Nov 10 '17 15:11 nllong

#1565 #852

May 31 '18 15:05 nllong

@nllong -- I can try importing this file again. Is it ok to test this on dev1?

May 31 '18 17:05 RDmitchell

@nllong -- I just added it to the Project Tracker under Test and assigned it to myself.

May 31 '18 17:05 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data.xlsx This file has the "geom" data.

This time it got a bit further on the import, but failed at 25%.

I will try again with the same number of records but with a reduced set of fields.

May 31 '18 20:05 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv

Doesn’t have “geom” data (last field in the previous file)
Half the size of the file with the “geom” data
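For anyone repeating this test, here is a minimal sketch of how a reduced file like this could be produced from a CSV export. It is not how the file above was actually made. The function name `drop_column` and the sample column names are hypothetical; only the "geom" field name comes from this thread.

```python
import csv
import io


def drop_column(src, dst, column="geom"):
    """Copy CSV rows from src to dst, omitting one named column.

    Rows are streamed one at a time, so memory use stays flat
    regardless of how many records the file holds.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    idx = header.index(column)  # raises ValueError if the column is absent
    writer.writerow(header[:idx] + header[idx + 1:])
    rows = 0
    for row in reader:
        writer.writerow(row[:idx] + row[idx + 1:])
        rows += 1
    return rows


# Tiny in-memory example; the column names are made up for illustration.
sample = io.StringIO(
    "parcel_id,address,geom\n"
    "1,10 Main St,POLYGON(...)\n"
    "2,12 Main St,POLYGON(...)\n"
)
out = io.StringIO()
count = drop_column(sample, out)
```

With real files, open both with `newline=""` as the `csv` module documentation recommends. Very long polygon strings can exceed the module's default field size limit, which `csv.field_size_limit` can raise.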

See this doc (in the issue folder) which shows the details of this process (in the section dated 5/31/2018) https://drive.google.com/open?id=1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc

This file got through import and mapping, but finally (after many hours) got this error on matching

Jun 01 '18 17:06 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv

hmmm, looking at the org via the superuser admin screen, it looks like the program may have actually imported all the records into the property table.

Clicking on the inventory list to see whether the program will display the records gives a 502 error (Bad Gateway).

Jun 01 '18 18:06 RDmitchell

Looks like this is a cloudflare timeout. We should make that a separate issue and decide what we need to do in that issue.

The data will still load in the background even if cloudflare times out.

Aug 31 '18 15:08 nllong

Is Cloudflare something you have working with dev1, but not on the standard installation of SEED, such as the setup we have on our staging server?

Aug 31 '18 20:08 RDmitchell

Your staging server does not use Cloudflare, so it would behave differently.

Aug 31 '18 20:08 nllong

@nllong / @axelstudios -- do you think I should have @mmclark put the develop branch on our staging server to test it out there, since we don't have Cloudflare? If you think this is a good idea, let us know when (what state of the develop branch) would be good to do this. Seems like it would be nice to do it sooner rather than later relative to the 2.4.0 release date.

Sep 05 '18 16:09 RDmitchell

I think it will be good to do this, but let's wait until Wed COB. Alex just fixed a bug that we need to merge down into develop.

Nick

Sep 10 '18 20:09 nllong

Instance: seeddemostaging (LBNL) SHA: 49e40e0

See this doc for latest info https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing

Got these errors

Oct 04 '18 20:10 RDmitchell

@nllong -- I don't think this is critical for the Sept release. I suggest we move it to the Dec release.

Oct 04 '18 20:10 RDmitchell

Ok. Thanks for testing and letting me know.

Oct 04 '18 21:10 nllong

FYI, I was able to successfully import this file locally. If the only remaining issue is the Cloudflare timeout then Nick and I have a possible workaround

Oct 11 '18 20:10 axelstudios

I will test on our staging server, now that we have the release on it

Oct 11 '18 21:10 RDmitchell

instance: seeddemostaging (LBNL) SHA: 74239e5

I still get this error when I try to import the file -- this time I was importing the original file from 11/8/2017, which has all the fields. But I got this same error previously when trying to import the file with "geom_only" dated 5/31/2018

Oct 12 '18 18:10 RDmitchell

Instance: dev1 (NREL) SHA: edd89fa9 Org: LBNL 21

See this doc at the top for the latest details https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing

Just tested this and still get the same error

Dec 05 '18 22:12 RDmitchell

This actually succeeded for me locally, but I can see why it failed. I had 8GB of memory available to my VM, and the memory spiked close to 95% just before celery began processing the individual rows. This looks to be very similar to the delete-column memory spike, and may be as easy as improving the chunked processing:
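As a generic illustration of what chunked processing means here, the sketch below is not SEED's actual code. It assumes rows can be read lazily and handed to some per-chunk handler. `chunked`, `process_in_chunks`, and `handle_chunk` are hypothetical names.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` items from any iterable, lazily."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(rows, handle_chunk, size=100):
    """Feed rows to handle_chunk one chunk at a time; return rows handled."""
    total = 0
    for chunk in chunked(rows, size):
        handle_chunk(chunk)  # hypothetical: e.g. queue one task per chunk
        total += len(chunk)
    return total


# Example with a generator standing in for 177,000 records.
sizes = []
handled = process_in_chunks(
    (i for i in range(177_000)),
    lambda chunk: sizes.append(len(chunk)),
    size=1000,
)
```

Because only one chunk is materialized at a time, peak memory depends on the chunk size rather than on the total record count. That is the property that would matter if the spike comes from building the full row list before processing begins.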

Mar 19 '21 16:03 axelstudios

Merged in #2079 to be included in this fix.

Jul 07 '22 21:07 nllong

Keeping this ticket for new FY23 board. We need to stress test new release with at least 1000 properties
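One possible way to produce a stress-test file of a chosen size is sketched below. The column names (`property_id`, `address`, `gross_floor_area`) and the function name are illustrative assumptions, not a statement of which fields SEED requires on import.

```python
import csv
import io
import random


def write_synthetic_properties(dst, n_records, seed=0):
    """Write a header plus n_records rows of made-up property data."""
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["property_id", "address", "gross_floor_area"])
    for i in range(1, n_records + 1):
        writer.writerow([
            f"P{i:06d}",
            f"{rng.randint(1, 9999)} Test St",
            rng.randint(500, 500_000),
        ])
    return n_records


# In-memory example; pass an open file handle to write to disk instead.
buf = io.StringIO()
written = write_synthetic_properties(buf, 1000)
```

Raising `n_records` to 177,000 would approximate the record count of the file discussed in this issue, though not its field contents.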

Oct 07 '22 21:10 isalanglois
