Error importing large file, 177,000 records

Open RDmitchell opened this issue 7 years ago • 23 comments

instance: staging V 2.2.0 release

This is a very large file (177,000 records) but not many fields. PNNL is trying to import it to SEED as part of their UBID work. It would be nice if we could figure out whether SEED can import it or not. Right now on staging the program gets to about 10% and then gives an error.

See this folder for the file. https://drive.google.com/open?id=1e5-c5EpB4YkUlX442DD64IJpOCW-1No0

@nllong -- could I try importing this file onto dev1? Or do you want to?

RDmitchell avatar Nov 09 '17 02:11 RDmitchell

Here is the error message image

RDmitchell avatar Nov 09 '17 02:11 RDmitchell

I can test locally first.

I wonder if this is a @mmclark question since it says upload failed. If the upload failed then it is most likely a configuration issue with S3/local file storage.

nllong avatar Nov 10 '17 15:11 nllong

#1565 #852

nllong avatar May 31 '18 15:05 nllong

@nllong -- I can try importing this file again. Is it ok to test this on dev1?

RDmitchell avatar May 31 '18 17:05 RDmitchell

@nllong -- I just added it to the Project Tracker under Test and assigned it to myself.

RDmitchell avatar May 31 '18 17:05 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data.xlsx

This file has the "geom" data.

This time it got a bit further on the import, but failed at 25%. image

I will try again with the same number of records but with a reduced set of fields.

RDmitchell avatar May 31 '18 20:05 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv

  • Doesn’t have “geom” data (last field in the previous file)
  • Half the size of the file with the “geom” data

See this doc (in the issue folder) which shows the details of this process (in the section dated 5/31/2018) https://drive.google.com/open?id=1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc

This file got through import and mapping, but finally (after many hours) got this error on matching: image

RDmitchell avatar Jun 01 '18 17:06 RDmitchell

instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv

Hmmm, looking at the org via the superuser admin screen, it looks like the program may have actually imported all the records into the property table.

image

When I click on the inventory list to see if the program will display the records, I get a 502 error (Bad Gateway). image

RDmitchell avatar Jun 01 '18 18:06 RDmitchell

This looks like a Cloudflare timeout. We should track that as a separate issue and decide there what we need to do.

The data will still load in the background even if Cloudflare times out.
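
Because the load keeps running after the proxy gives up, one way a client could cope is to poll for completion with short requests rather than hold a single long request open. The following is only a minimal sketch of that pattern: `get_progress` is a hypothetical callable that returns percent complete, not a SEED API, and the intervals are arbitrary.

```python
import time


def wait_for_import(get_progress, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_progress() until it reports 100 or the timeout elapses.

    get_progress is a hypothetical callable returning percent complete (0-100).
    Returns True if the import finished, False if we gave up waiting.
    """
    waited = 0.0
    while True:
        # Each check is a short call, so no single request stays open for long.
        if get_progress() >= 100:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.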

nllong avatar Aug 31 '18 15:08 nllong

Is Cloudflare something you have working with dev1, but not on the standard installation of SEED, such as the setup we have on our staging server?

RDmitchell avatar Aug 31 '18 20:08 RDmitchell

Your staging server does not use Cloudflare, so it would behave differently.

nllong avatar Aug 31 '18 20:08 nllong

@nllong / @axelstudios -- do you think I should have @mmclark put the develop branch on our staging server to test it out there, since we don't have Cloudflare? If you think this is a good idea, let us know when (what state of the develop branch) would be good to do this. Seems like it would be nice to do it sooner rather than later relative to the 2.4.0 release date.

RDmitchell avatar Sep 05 '18 16:09 RDmitchell

I think it will be good to do this, but let's wait until Wed COB. Alex just fixed a bug that we need to merge down into develop.

Nick

nllong avatar Sep 10 '18 20:09 nllong

Instance: seeddemostaging (LBNL) SHA: 49e40e0

See this doc for latest info https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing

Got these errors image

RDmitchell avatar Oct 04 '18 20:10 RDmitchell

@nllong -- I don't think this is critical for the Sept release. I suggest we move it to the Dec release.

RDmitchell avatar Oct 04 '18 20:10 RDmitchell

Ok. Thanks for testing and letting me know.

nllong avatar Oct 04 '18 21:10 nllong

FYI, I was able to successfully import this file locally. If the only remaining issue is the Cloudflare timeout, then Nick and I have a possible workaround.

axelstudios avatar Oct 11 '18 20:10 axelstudios

I will test on our staging server, now that we have the release on it

RDmitchell avatar Oct 11 '18 21:10 RDmitchell

instance: seeddemostaging (LBNL) SHA: 74239e5

I still get this error when I try to import the file -- this time I was importing the original file from 11/8/2017, which has all the fields. But I got this same error previously when trying to import the file with "geom_only" dated 5/31/2018

image

RDmitchell avatar Oct 12 '18 18:10 RDmitchell

Instance: dev1 (NREL) SHA: edd89fa9 Org: LBNL 21

See this doc at the top for the latest details https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing

Just tested this and still get the same error image

RDmitchell avatar Dec 05 '18 22:12 RDmitchell

This actually succeeded for me locally, but I can see why it failed. I had 8GB of memory available to my VM, and the memory spiked close to 95% just before celery began processing the individual rows. This looks to be very similar to the delete-column memory spike, and may be as easy as improving the chunked processing:

image
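
As a rough illustration of what chunked processing means here (a minimal sketch, not SEED's actual import code), the rows can be streamed from the file and handed off in fixed-size batches so that the full 177,000-record file is never held in memory at once. The names `iter_chunks`, `import_in_chunks`, and `handle_chunk` are hypothetical, and the chunk size is arbitrary.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows without materializing the whole input."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def import_in_chunks(file_obj, handle_chunk, chunk_size=1000):
    """Stream a CSV file and pass it to handle_chunk one batch at a time.

    handle_chunk is a hypothetical callback (for example, one that queues a
    background task per batch); it is not part of SEED.
    Returns the total number of rows processed.
    """
    reader = csv.DictReader(file_obj)  # reads lazily, one row at a time
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        handle_chunk(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    batch_sizes = []
    count = import_in_chunks(sample, lambda batch: batch_sizes.append(len(batch)), chunk_size=2)
    print(count, batch_sizes)
```

With this shape, peak memory is bounded by the chunk size rather than the file size, provided the callback does not keep references to earlier batches.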

axelstudios avatar Mar 19 '21 16:03 axelstudios

Merged in #2079 to be included in this fix.

nllong avatar Jul 07 '22 21:07 nllong

Keeping this ticket for the new FY23 board. We need to stress test the new release with at least 1000 properties.

isalanglois avatar Oct 07 '22 21:10 isalanglois