Packaging data for export and enable import

Open RDmitchell opened this issue 10 years ago • 16 comments

Data packaging feature that allows the user to export all their data (including extra data, projects, user accounts and permissions) in a packaged file format. This will allow users to move data between different SEED Platform instances. It may not be a trivial exercise, as the database structures may not be the same between the two instances. (LBNL Ticket # 104)

RDmitchell avatar Oct 28 '14 22:10 RDmitchell

@RDmitchell and @nllong I want to confirm the scope of this.

Considering the age of this issue and the potentially large scope involved, is the goal to completely migrate an organization from one deployment to another where there is virtually no difference between the two?

Along those lines, although the title only says "export", at the very least, importing should be heavily considered, if not included. Specifically, a successful export should contain enough information to support a later import.

There's definitely a lot to consider when building this out, but in some format, this export will probably be a zip of files that include all the records, notes, labels, meters, etc. as well as the associations between them. I'm still thinking about what this would look like, so I'm open to suggestions.
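To make the "zip of files" idea concrete, here is a minimal sketch of one possible layout: one JSON file per record type plus a single file for the associations between them. The file names, the `build_export_archive` / `read_export_archive` helpers, and the record shapes are all hypothetical, not anything SEED currently provides.

```python
import io
import json
import zipfile

# Hypothetical name for the file that holds cross-record associations.
ASSOCIATIONS_FILE = "associations.json"


def build_export_archive(records_by_type, associations):
    """Pack one JSON file per record type, plus an associations file, into a zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for record_type, records in records_by_type.items():
            archive.writestr(f"{record_type}.json", json.dumps(records, indent=2))
        archive.writestr(ASSOCIATIONS_FILE, json.dumps(associations, indent=2))
    return buffer.getvalue()


def read_export_archive(data):
    """Reverse of build_export_archive: return (records_by_type, associations)."""
    records_by_type = {}
    associations = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            payload = json.loads(archive.read(name))
            if name == ASSOCIATIONS_FILE:
                associations = payload
            else:
                # Strip the ".json" suffix to recover the record type.
                records_by_type[name[: -len(".json")]] = payload
    return records_by_type, associations
```

The point of the round trip is the one made above: the export is only useful if it carries enough information (records and their associations) to be read back in later.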

adrian-lara avatar Jan 12 '20 23:01 adrian-lara

@adrian-lara / @nllong

This issue is here because in the future, it may be necessary to move organizations between deployments.

Export may not be the right word, but basically the functionality needs to be that an organization in one SEED instance can be moved to another deployment, by a web admin (this is not something that the user would do).

And it needs to be everything relevant to the organization, so that when the user logs in to the new instance, the "world" is the same as in the previous instance.

I suspect this is a big deal to implement, but it does seem like we need this functionality. At the very least, if any of the users on the LBNL production instance want to set up their own instance, they would want to get their data into the new instance (again it's a web admin / IT department process, not something the users would do themselves).

RDmitchell avatar Jan 13 '20 23:01 RDmitchell

Thanks for providing those details Robin!

@nllong Given the above, and assuming this might only happen a handful of times, my initial reaction is that it would be significantly less work to do this on request:

  • Dump the prod DB
  • Restore it within the new deployment
  • Delete all other Organizations

This might require us to update the current delete_organization method in ./seed/tasks.py to delete other, more recent, model records (meters, meter_readings, etc.) if cascading deletions don't occur. But I'm imagining this would be significantly easier than having to export and reimport all of an organization's different types of records and the relationships between them.
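As a rough sketch of the three steps above, the commands could be assembled like this. The database names, dump path, and helper functions are hypothetical; the `pg_dump` and `pg_restore` flags shown are standard ones, but the sketch only builds the command lines and does not run them, and it ignores any extra handling that extensions might need. The actual deletion would go through delete_organization, whose signature is not assumed here.

```python
import shlex


def dump_command(source_db, dump_path):
    """Build a pg_dump command that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={dump_path}", source_db]


def restore_command(target_db, dump_path):
    """Build a pg_restore command that loads the archive into the new deployment."""
    return ["pg_restore", "--no-owner", f"--dbname={target_db}", dump_path]


def orgs_to_delete(all_org_ids, keep_org_id):
    """Every organization except the one being migrated gets deleted afterwards."""
    return [org_id for org_id in all_org_ids if org_id != keep_org_id]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(dump_command("seed_prod", "/tmp/seed_prod.dump")))
    print(shlex.join(restore_command("seed_new", "/tmp/seed_prod.dump")))
    print(orgs_to_delete([1, 2, 3], keep_org_id=2))
```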

Of course, I won't take any action until we discuss further.

adrian-lara avatar Jan 14 '20 01:01 adrian-lara

@adrian-lara -- this works for isolating an individual organization.

Then the question becomes: how would a web admin (not a general user) import that org into an existing instance?

RDmitchell avatar Jan 16 '20 22:01 RDmitchell

@RDmitchell After talking with Nick, the approach we're hoping to take here is to break this up into smaller, more manageable efforts. We're still thinking about how exactly this will be broken up, but my plan is to create separate issues for each, then close this while referencing the new issues in a comment.

I'd imagine that this effort wouldn't require much user input as long as they get the data they want. Which brings me to the following caveat...
We're thinking it would really make this a more manageable effort if we could hold off on importing historical information. Specifically, we'll only be taking the first two columns for each inventory detail page.

Let us know your thoughts/reactions to that.

adrian-lara avatar Feb 12 '20 06:02 adrian-lara

@adrian-lara / @nllong -- I'm not sure that would really be an acceptable solution.

If a user wanted to be hosted on another instance besides ours, they would lose the history of their imports, etc.

That doesn't sound great to me.

??

RDmitchell avatar Feb 13 '20 00:02 RDmitchell

I think that reaction makes sense. The history definitely adds a layer of complication, but I'll aim to incorporate it and likely make it its own issue/ticket. No definitive plan of attack yet, but as I think more about how to accomplish the export/import goal for an org and the items belonging to it, I'll definitely update you and Nick on any notable plans, issues, or caveats.

adrian-lara avatar Feb 13 '20 02:02 adrian-lara

And again, this will probably be a web-admin-level task -- we don't have to make it something that a normal SEED user could accomplish.

RDmitchell avatar Feb 14 '20 19:02 RDmitchell

@nllong - To clarify my comments from our call, I couldn't think of a way to break this problem down into different models or even groups of models. The relationships among all the models make it really difficult to think about the idea of exporting/importing sets of models and relationships at different times.

For example, if we first tried exporting and importing Properties, PropertyStates, and PropertyViews, and then later tried to export and import Notes, it would be really hard to reestablish relationships between Notes and PropertyViews, assuming the PropertyViews have new IDs.

My thinking is that we'd have to somehow know the new IDs ahead of an import, or capture them during an import, in order to then update ForeignKeys on related models or join tables. To me, this means all the models and relationships would need to be imported within the same import. The main topic I'm researching is pg_dump output that uses INSERTs (vs COPY).
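To illustrate the "capture the new IDs during an import" idea, here is a minimal in-memory sketch. It is not how SEED imports anything today; the table names, the `import_with_id_remap` helper, and the row shapes are hypothetical. It assumes tables are processed parents-first, so every foreign key can be rewritten from an old-ID-to-new-ID map that already exists by the time the child row is handled.

```python
def import_with_id_remap(tables, import_order, foreign_keys, first_free_id):
    """
    tables:        {table_name: [row dicts, each with an 'id']}
    import_order:  table names, parents before children
    foreign_keys:  {table_name: {fk_column: referenced_table}}
    first_free_id: {table_name: first unused id in the target database}
    Returns (imported_rows_by_table, id_map).
    """
    next_id = dict(first_free_id)  # copy so the caller's dict is untouched
    id_map = {name: {} for name in import_order}
    imported = {}
    for name in import_order:
        new_rows = []
        for row in tables[name]:
            new_row = dict(row)
            # Assign a fresh primary key and remember the old -> new mapping.
            new_row["id"] = next_id[name]
            id_map[name][row["id"]] = new_row["id"]
            next_id[name] += 1
            # Rewrite foreign keys using mappings captured from parent tables.
            for column, parent in foreign_keys.get(name, {}).items():
                if new_row[column] is not None:
                    new_row[column] = id_map[parent][new_row[column]]
            new_rows.append(new_row)
        imported[name] = new_rows
    return imported, id_map
```

This is why a single import matters: if Notes arrived in a later, separate import, the `id_map` for PropertyViews would have to be persisted somewhere in between.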

Generally, more research is needed, but that's where my head is at.

adrian-lara avatar Feb 15 '20 19:02 adrian-lara

Thanks for the update. I know that @axelstudios has suggested the use of postgres schemas for organizations. Is this something that we should look into?

nllong avatar Feb 17 '20 15:02 nllong

Not knowing all the details, I think that ALL the data for an organization would need to be moved to a new instance.

RDmitchell avatar Feb 18 '20 22:02 RDmitchell

It's been some time since we last chatted about this, but at the very least, I wanted to officially punt on this until there's an explicit, higher-priority need. Let me know if the situation has changed, and we should not de-prioritize this. Otherwise, I just wanted to leave some documentation for future efforts.


Case 1

For the case of exporting and importing an org into a completely new deployment/instance, a simple dump, restore, and cleaning (of other orgs) can be reasonably executed on a case-by-case basis.

Case 2

Generally, we're open to any suggestions on how this second case can be accomplished.

The goal of being able to export an existing org and import it into another deployment/instance that already has data doesn't have any clear paths forward right now. As mentioned previously, "breaking up this effort" by different models doesn't seem viable given the interconnectedness of the SEED DB.

At the "Django-level", I've looked at django-import-export, and I'm not sure if it'd be capable of easily handling those heavily interconnected relationships. I'm mentioning it here as it might be worth another set of eyes to skim and assess the viability of using that package.

At the "postgres-level", we'd have to somehow build an INSERT pg_dump that has modified IDs across all models - these would be incremented enough to avoid collisions with existing data. I'm not sure if that can be easily done. Separately, one potential pitfall involving any type of postgres-level dump and restore is that MeterReading's underlying TimescaleDB hypertables would likely be built differently from the source database. I haven't made it far enough to even know if this would be a problem, but I wanted to document that thought here.
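As a small sketch of the "incremented enough to avoid collisions" idea: pick one offset that is at least as large as the highest existing ID in any target table, then add it to every primary key and foreign key in the exported rows. The helpers and row shapes below are hypothetical, and the sketch works on plain dicts rather than on an actual pg_dump file. It also assumes source IDs are positive, and says nothing about the TimescaleDB concern above.

```python
def safe_offset(target_max_ids):
    """Largest existing id across all target tables; shifting by it avoids collisions."""
    return max(target_max_ids.values(), default=0)


def shift_ids(rows, fk_columns, offset):
    """Return copies of rows with 'id' and every listed FK column shifted by offset."""
    shifted = []
    for row in rows:
        new_row = dict(row)
        new_row["id"] = row["id"] + offset
        for column in fk_columns:
            # Leave NULL foreign keys alone.
            if new_row.get(column) is not None:
                new_row[column] = new_row[column] + offset
        shifted.append(new_row)
    return shifted
```

Because the same offset is applied everywhere, relationships between shifted rows are preserved without an explicit ID map. After such an import, the target's ID sequences would presumably also need to be advanced past the new maximum.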

adrian-lara avatar May 20 '20 23:05 adrian-lara

@RDmitchell I just created #2946. If we implement that in such a way that it's importable by SEED through the normal meter import flow, would it cover the use-case of this issue?

I'm thinking users can already select and export properties, and if they also export the meter data, then they should be able to move data from one SEED instance to another.

macintoshpie avatar Oct 13 '21 21:10 macintoshpie

@RDmitchell thoughts on the comment above?

macintoshpie avatar Dec 13 '21 18:12 macintoshpie

This issue has been automatically marked as stale because it has not had recent activity within 60 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Oct 15 '22 02:10 github-actions[bot]

This is still important.

RDmitchell avatar Oct 18 '22 19:10 RDmitchell

This issue has been closed automatically. If this still affects you please re-open this issue with a comment or contact us so we can look into resolving it.

github-actions[bot] avatar Oct 31 '23 01:10 github-actions[bot]

This issue has been closed automatically. If this still affects you please re-open this issue with a comment or contact us so we can look into resolving it.

github-actions[bot] avatar Mar 09 '24 01:03 github-actions[bot]

@kflemin -- I am reopening this issue, only because it might be something that we need to address if people want to move their data from SEED to another platform such as BEAM.

But if that has already been solved, then we can close this issue.

RDmitchell avatar Mar 09 '24 01:03 RDmitchell