briefcase icon indicating copy to clipboard operation
briefcase copied to clipboard

Export submissions from ODK dir is failing.

Open kkrawczyk123 opened this issue 6 years ago • 6 comments

Software versions

Briefcase v1.11.x, Java v1.x.x, operating system, Aggregate v1.x.x, Collect v1.x.x...

Problem description

There is an unexpected error displayed on the newest master when exporting submissions pulled from odk directory. In the log file there is Error while exporting forms org.opendatakit.briefcase.reused.BriefcaseException: No instance ID found.

On the 1.11.1 beta, there is no error displayed but submissions are not exported, 0 submissions exported is displayed in the status, in the csv file there is only the header visible.

On the briefcase 1.10 submissions are exported, no error occurs, but they don't have SubmissionDate.

Steps to reproduce the problem

  1. Pull submissions from odk folder
  2. Export pulled submissions

kkrawczyk123 avatar Sep 04 '18 11:09 kkrawczyk123

So, after analyzing this issue, we have identified a particular scenario that we should solve separately: submissions without an instance ID.

These submissions have always been excluded from the export output. We could think of alternatives:

  • Briefcase could assign each submission with a random ID, used only when exporting (wouldn't change the source submission.xml files in the storage directory)

ggalmazor avatar Sep 04 '18 11:09 ggalmazor

The problem with the random ID is that it doesn't make it easy for folks to link that submission with the output. Is it crazy to use some null ID?

yanokwa avatar Sep 04 '18 14:09 yanokwa

The hardest problem to solve is when forms have repeat groups. Null IDs in this scenario wouldn't work because it would be impossible to link rows from a repeat group CSV file to the main CSV file.

If Briefcase assigns random IDs, they can track a repeat group CSV file line to it's parent file in the main CSV file.

Also, I think random IDs don't harm the scenario where there are no repeat groups.

We could also use random IDs only if we know there are repeat groups, and use null IDs otherwise, but that feels like it's going to be more confusing for users if they don't realize when are null IDs expected and when not.

ggalmazor avatar Sep 04 '18 15:09 ggalmazor

After discussing this with @yanokwa, and @lognaturel, we decided to redefine the validity rules of a submission:

This means that for a submission with repeat groups to be valid, it needs to have an instance id. This makes sense because, otherwise, it wouldn't be possible to link the lines from the output repeat CSV file to their corresponding lines on the on the main CSV output files.

When a form doesn't define repeat groups, though, it's perfectly reasonable to not have instance ids, and the output CSV file will have blank values on that column.

(copied from #626)

ggalmazor avatar Sep 04 '18 17:09 ggalmazor

A side note -- when @ggalmazor and I were talking informally, he mentioned that there's also a check for an instanceID attribute on the root node and I wasn't sure what that was about. Now that I've thought about it, I believe it's one of the things along with submissionDate that Aggregate injects into submissions. Does that sound right, @ggalmazor? If so, I think it's an important check because it does catch the case where where a form without an instanceID is uploaded to Aggregate (I think this explains why we've never seen a problem with e.g. Birds). It doesn't help in the case where a submission is pulled directly from Collect, though.

lognaturel avatar Sep 04 '18 19:09 lognaturel

I think I would have to check that with Aggregate's source code in front of me... I think it's an important thing we're doing here by documenting all these edge-case scenarios.

ggalmazor avatar Sep 04 '18 19:09 ggalmazor