
Need to prevent GC for lifetime of M/R job

Open keith-turner opened this issue 10 years ago • 6 comments

Work done on #8 made the Fluo GC iterator collect based on which transactions are currently active. When using the Fluo input format, all mappers use the same timestamp for reading data, and therefore all read from the same snapshot. Currently nothing is done to ensure data is not GCed for this snapshot.

keith-turner avatar Sep 22 '14 20:09 keith-turner

#33 is a possible solution to this

keith-turner avatar Sep 22 '14 20:09 keith-turner

AFAIK M/R has no mechanism for the input format to clean up after the entire job is completed. The solution for this issue should avoid an issue like ACCUMULO-829, where killing the process that started the M/R job borks the entire job.

That's why I think #33 may be a good solution. A user could do something like the following.

  1. create named snapshot
  2. run job against named snapshot
  3. delete named snapshot

It should be easy for a user to do the above steps in a single process, but we should not require that it be done in a single process. I think #33 will be fairly easy to implement now that #8 is done.
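
As a rough illustration of the three steps above done in a single process, here is a minimal sketch. Everything in it is hypothetical: `NamedSnapshotSketch`, its methods, and the in-memory map are stand-ins for whatever #33 ends up providing, and the "job" is just a `Runnable`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a named-snapshot registry; not part of Fluo.
class NamedSnapshotSketch {
  // snapshot name -> timestamp that GC must not collect past
  private final Map<String, Long> snapshots = new ConcurrentHashMap<>();

  void create(String name, long timestamp) {
    if (snapshots.putIfAbsent(name, timestamp) != null) {
      throw new IllegalStateException("snapshot already exists: " + name);
    }
  }

  void delete(String name) {
    snapshots.remove(name);
  }

  // Oldest timestamp still pinned, or Long.MAX_VALUE when nothing is pinned.
  long oldestPinnedTimestamp() {
    return snapshots.values().stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
  }

  // Steps 1-3 in one process: create, run the job, delete.
  static void runJob(NamedSnapshotSketch registry, String name, long timestamp, Runnable job) {
    registry.create(name, timestamp);   // 1. create named snapshot
    try {
      job.run();                        // 2. run job against named snapshot
    } finally {
      registry.delete(name);            // 3. delete named snapshot
    }
  }
}
```

The `finally` block only helps when the launching process exits normally or with an exception. If that process is killed outright, the delete never runs and the snapshot stays pinned.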

keith-turner avatar Sep 22 '14 21:09 keith-turner

It sounds like the easiest to use approach might be something in the middle.

  1. Automatically create a snapshot (named or otherwise) in InputFormat.getSplits and make whatever advertisement is necessary to ensure that snapshot isn't deleted. You don't try to solve any automatic cleanup.
  2. Provide some sort of API that users can call to instrument their Tool: create a snapshot, pass that snapshot into the Configuration for the InputFormat to use (instead of making a new snapshot), and then let the Tool clean up after the job finishes (assuming that the process that's running the Tool is still alive).
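
One way to picture how the two options could coexist is sketched below. It assumes a made-up configuration key and uses `java.util.Properties` as a stand-in for the Hadoop `Configuration`; neither the key nor `resolveSnapshot` exists in Fluo.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical sketch of how an input format might pick its snapshot.
class SnapshotSelectionSketch {
  // Made-up key; not a real Fluo or Hadoop property.
  static final String SNAPSHOT_KEY = "fluo.sketch.input.snapshot";

  // Reuse the snapshot the Tool put in the configuration (option 2);
  // otherwise generate a name and record it for later cleanup (option 1).
  static String resolveSnapshot(Properties conf) {
    String name = conf.getProperty(SNAPSHOT_KEY);
    if (name == null) {
      name = "auto-" + UUID.randomUUID();
      conf.setProperty(SNAPSHOT_KEY, name);
    }
    return name;
  }
}
```

In this sketch the generated name is written back into the configuration so that something else could find and delete it later; who does that deletion is left open.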

Long term, you can then try to think about something more fancy that can ensure snapshots aren't leaked.

joshelser avatar Sep 23 '14 02:09 joshelser

@joshelser I was thinking of only providing option 2 that you described. The user would be required to pass a named snapshot to FluoInputFormat. For Option 1 it seems like it would leave a snapshot around forever that would prevent GC?

keith-turner avatar Sep 23 '14 15:09 keith-turner

Sorry, I thought you meant option 1 as the default where they could come back "later" and clean it up -- e.g. some CLI tool. I guess if the Tool dies, it would be nice to provide some way that doesn't force the user to write some 5 line java class to clean up the snapshot and let GC happen again.

joshelser avatar Sep 23 '14 16:09 joshelser

> I guess if the Tool dies, it would be nice to provide some way that doesn't force the user to write some 5 line java class to clean up the snapshot and let GC happen again

Right, that would be nice. If we don't do something, then the burden will be placed on every user to find a solution. Maybe named snapshots could optionally be created w/ a TTL? The user's tool could do the following.

    // Not sure about the API or where it would live; just picked fluoClient.
    // Maybe it should be admin?
    NamedSnapshotOptions nso = new NamedSnapshotOptions();
    nso.generateUniqueName(); // or could give it a name
    nso.setTTL(1, TimeUnit.DAYS);
    String namedSnap = fluoClient.createNamedSnap(nso);

    // configure FluoInputFormat to read from namedSnap

    // submit job & wait

    // if the process that started the M/R job dies, then something will delete the named snap after 1 day
    fluoClient.deleteNamedSnapshot(namedSnap);

If named snapshots can expire, then StaleScanException becomes something that can happen, and that exception should be moved to the public API.
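
To make the expiry case concrete, here is a small sketch of TTL bookkeeping with an explicit clock. All names are hypothetical, the map is in-memory, and `StaleSnapshotException` is an illustrative exception rather than Fluo's StaleScanException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of named snapshots that expire; not part of Fluo.
class TtlSnapshotSketch {
  // Illustrative only; stands in for what a reader would see after expiry.
  static class StaleSnapshotException extends RuntimeException {
    StaleSnapshotException(String message) {
      super(message);
    }
  }

  // snapshot name -> absolute expiry time in milliseconds
  private final Map<String, Long> expiryMillis = new HashMap<>();

  void create(String name, long nowMillis, long ttl, TimeUnit unit) {
    expiryMillis.put(name, nowMillis + unit.toMillis(ttl));
  }

  // Normal cleanup path when the launching process is still alive.
  void delete(String name) {
    expiryMillis.remove(name);
  }

  // What a periodic cleaner might do: drop every snapshot past its TTL.
  void expire(long nowMillis) {
    expiryMillis.values().removeIf(expiry -> expiry <= nowMillis);
  }

  // A reader would check this before scanning against the snapshot.
  void checkReadable(String name) {
    if (!expiryMillis.containsKey(name)) {
      throw new StaleSnapshotException("snapshot no longer exists: " + name);
    }
  }
}
```

A job that outlives its TTL would hit the exception partway through, which is why the exception would need to be visible to users.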

keith-turner avatar Sep 23 '14 16:09 keith-turner