Error importing large file, 177,000 records
instance: staging V 2.2.0 release
This is a very large file (177,000 records), but it does not have many fields. PNNL is trying to import it into SEED as part of their UBID work. It would be nice if we could figure out whether SEED can import it or not. Right now on staging the program gets to about 10% and then gives an error.
See this folder for the file. https://drive.google.com/open?id=1e5-c5EpB4YkUlX442DD64IJpOCW-1No0
@nllong -- could I try importing this file onto dev1? Or do you want to?
Here is the error message
I can test locally first.
I wonder if this is a @mmclark question since it says upload failed. If the upload failed then it is most likely a configuration issue with S3/local file storage.
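For reference, here is a minimal sketch of the two Django file-storage configurations that could be in play, assuming the usual django-storages pattern; these are illustrative settings, not SEED's verified production config.

```python
# settings.py sketch -- assumed django-storages setup, not SEED's actual config.

# Local file storage (Django's default): uploads land on the app server's disk.
DEFAULT_FILE_STORAGE = "django.core.files.storage.FileSystemStorage"
MEDIA_ROOT = "/srv/seed/media"  # hypothetical path

# S3 file storage via django-storages: uploads go to the bucket instead.
# DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
# AWS_STORAGE_BUCKET_NAME = "seed-uploads"  # hypothetical bucket name
# AWS_S3_REGION_NAME = "us-west-2"          # hypothetical region
```

If the storage backend or bucket credentials are misconfigured, the upload itself can fail before the import task even starts.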
#1565 #852
@nllong -- I can try importing this file again. Is it ok to test this on dev1?
@nllong -- I just added it to the Project Tracker under Test and assigned it to myself.
instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data.xlsx This file has the "geom" data.
This time it got a bit further on the import, but failed at 25%.
I will try again with the same number of records but with a reduced set of fields.
instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv
- Doesn’t have “geom” data (last field in the previous file)
- Half the size of the file with the “geom” data
See this doc (in the issue folder) which shows the details of this process (in the section dated 5/31/2018) https://drive.google.com/open?id=1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc
This file made it through import and mapping, but finally (after many hours) hit this error during matching.
instance: dev1 SHA: 90e4a7d Org: LBNL 33 File: sf13m_bldg_footprint_attribZ_pgz_data_no_polygons-MBLRParcelID.csv
Hmmm, looking at the org via the superuser admin screen, it looks like the program may have actually imported all the records into the property table.
Clicking on the inventory list to see if the program will display the records, I get a 502 error (Bad Gateway).
Looks like this is a Cloudflare timeout. We should make that a separate issue and decide there what we need to do about it.
The data will still load in the background even if Cloudflare times out.
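For background, Cloudflare closes proxied requests that run past its timeout (roughly 100 seconds on standard plans) even though the Celery workers keep going, so the browser sees an error while the import continues. Here is a hedged sketch of what a polling-style workaround could look like; the endpoint and response fields below are assumptions for illustration, not SEED's actual API.

```python
# Hypothetical polling loop: start the import, then check progress with short
# requests instead of holding one long request open through Cloudflare.
import time
import requests

BASE_URL = "https://seed.example.org"   # hypothetical host
progress_key = "abc123"                 # assumed to come from the import start call

while True:
    resp = requests.get(f"{BASE_URL}/api/progress/{progress_key}/", timeout=30)
    state = resp.json()
    print(f"{state.get('progress', 0)}% complete")
    if state.get("status") in ("success", "error"):
        break
    time.sleep(5)  # each request finishes well under the proxy timeout
```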
Is Cloudflare something you have working with dev1, but not on the standard installation of SEED, such as the setup we have on our staging server?
Your staging server does not use Cloudflare, so it would behave differently.
@nllong / @axelstudios -- do you think I should have @mmclark put the develop branch on our staging server to test it out there, since we don't have Cloudflare? If you think this is a good idea, let us know when (i.e., what state of the develop branch) would be good to do this. It seems like it would be nice to do this sooner rather than later relative to the 2.4.0 release date.
I think it will be good to do this; let's wait until Wednesday COB. Alex just fixed a bug that we need to merge down into develop.
Instance: seeddemostaging (LBNL) SHA: 49e40e0
See this doc for latest info https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing
Got these errors
@nllong -- I don't think this is critical for the Sept release. I suggest we move it to the Dec release.
Ok. Thanks for testing and letting me know.
FYI, I was able to successfully import this file locally. If the only remaining issue is the Cloudflare timeout, then Nick and I have a possible workaround.
I will test on our staging server, now that we have the release on it
instance: seeddemostaging (LBNL) SHA: 74239e5
I still get this error when I try to import the file -- this time I was importing the original file from 11/8/2017, which has all the fields. But I got this same error previously when trying to import the "geom_only" file dated 5/31/2018.
Instance: dev1 (NREL) SHA: edd89fa9 Org: LBNL 21
See this doc at the top for the latest details https://docs.google.com/document/d/1uAnRJ_cyxP5arwhr1psfFTUWqFbWDvorl1w40aHH6yc/edit?usp=sharing
Just tested this and still get the same error
This actually succeeded for me locally, but I can see why it failed. I had 8 GB of memory available to my VM, and memory spiked close to 95% just before celery began processing the individual rows. This looks very similar to the delete-column memory spike, and may be as easy to fix as improving the chunked processing.
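For illustration, here is a minimal sketch of the kind of chunked approach I have in mind, assuming pandas and celery; the task, broker URL, and chunk size are placeholders rather than the actual SEED mapping code.

```python
# Sketch: stream the file in fixed-size chunks so no single step holds all
# 177k rows in memory before celery starts processing. Names are placeholders.
import pandas as pd
from celery import Celery, group

app = Celery("import_sketch", broker="redis://localhost:6379/0")  # assumed broker
CHUNK_SIZE = 2_500  # placeholder; tune against available worker memory

@app.task
def map_rows(rows):
    """Stand-in for SEED's per-chunk mapping work."""
    return len(rows)

def queue_import(path):
    """Read the CSV in chunks and fan each chunk out to its own task."""
    tasks = []
    for chunk in pd.read_csv(path, chunksize=CHUNK_SIZE):
        tasks.append(map_rows.s(chunk.to_dict(orient="records")))
    return group(tasks).apply_async()
```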
Merged in #2079 to be included in this fix.
Keeping this ticket for the new FY23 board. We need to stress test the new release with at least 1,000 properties.
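As a starting point for that stress test, here is a small sketch that generates a synthetic import file; the column headers are placeholders meant to look like a typical SEED import, not the exact fields from the PNNL file.

```python
# Generate a synthetic CSV for stress testing the import pipeline.
# Column names and values are placeholders, not the actual PNNL/SEED fields.
import csv
import random

NUM_RECORDS = 1_000  # raise toward 177,000 to reproduce the original scale

with open("stress_test_properties.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Property Name", "Address Line 1", "City", "Gross Floor Area"])
    for i in range(NUM_RECORDS):
        writer.writerow([
            f"Test Property {i}",
            f"{random.randint(1, 9999)} Main St",
            "San Francisco",
            random.randint(1_000, 500_000),
        ])
```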