Need -bc, -ec, -bq, and -eq options on shell scan command

Open ivakegg opened this issue 4 months ago • 9 comments

Given the nature of the data we handle, we commonly need to scan a row in the Accumulo shell starting at a given column family (CF) value or column qualifier (CQ) value so that we can find data efficiently. I would like the following options added to the shell scan command:

-bc,--begin-column [:]
-ec,--end-column [:]
-bq,--begin-column-qualifier
-eq,--end-column-qualifier

It is reasonable to prevent the use of -bq and -eq if -bc or -ec is already being used, and it is reasonable to prevent the use of any of these if -b or -e is being used. Those options could instead be intermixed by creating multiple ranges when performing the scan, but that is optional for our use case.
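The exclusivity rules described here could be checked along the following lines. This is only a sketch: `ScanOptionCheck` and `conflict` are hypothetical names, the option strings are the ones proposed in this issue, and the real shell parses options through its own command-line handling rather than a plain set of strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ScanOptionCheck {
  // Hypothetical helper: returns null when the combination is allowed,
  // otherwise a message describing the conflict.
  static String conflict(Set<String> given) {
    boolean rowRange = given.contains("-b") || given.contains("-e");
    boolean colRange = given.contains("-bc") || given.contains("-ec");
    boolean qualRange = given.contains("-bq") || given.contains("-eq");
    // Rule 1: none of the proposed options may be mixed with -b or -e.
    if (rowRange && (colRange || qualRange)) {
      return "-b/-e cannot be combined with -bc/-ec/-bq/-eq";
    }
    // Rule 2: -bq/-eq may not be mixed with -bc/-ec.
    if (colRange && qualRange) {
      return "-bq/-eq cannot be combined with -bc/-ec";
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(conflict(new HashSet<>(Arrays.asList("-bc", "-ec"))));
    System.out.println(conflict(new HashSet<>(Arrays.asList("-b", "-bq"))));
  }
}
```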

ivakegg avatar Sep 18 '25 16:09 ivakegg

For reference, the scan command currently supports selecting columns, column families, and column qualifiers. https://github.com/apache/accumulo/blob/054da60993afced1643bbcbd1104d16871dcf4a8/shell/src/main/java/org/apache/accumulo/shell/commands/ScanCommand.java#L338-L341

The list of columns is mutually exclusive with the column family and qualifier options.

At first glance, these proposed options seem like helper values that would build the contents of the current columns opt.

@ivakegg Is this something currently achievable with the columns opt and just prone to errors? Or is this entirely new functionality?

Also, would you need exclusivity options similar to the row command for this column selector logic? https://github.com/apache/accumulo/blob/054da60993afced1643bbcbd1104d16871dcf4a8/shell/src/main/java/org/apache/accumulo/shell/commands/ScanCommand.java#L331-L336

ddanielr avatar Sep 18 '25 17:09 ddanielr

Currently the -c, -cf, and -cq options specify the exact column family and column qualifier to scan for. I need to scan for data that is between two column family values for the same row, or between two column qualifier values for the same column family. I do not believe the current options allow me to do that, but I could be misunderstanding something.

ivakegg avatar Sep 18 '25 19:09 ivakegg

WRT the exclusivity options, the existing ones would suffice, but the comment should be updated so that it is not specific to the row and instead refers to the entire key (row + cf + cq).

ivakegg avatar Sep 18 '25 19:09 ivakegg

Okay, so you're looking for the scanner to support accepting a range of column families and column qualifiers. The existing column options don't support that because they are treated as filters.

https://github.com/apache/accumulo/blob/2.1/core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java#L128-L155

ddanielr avatar Sep 18 '25 19:09 ddanielr

I think what's being asked here is the ability to construct, from the shell options, a full startKey and endKey, rather than just a startRow and endRow.

Right now, the shell basically does: new Range((Text) startRow, (Text) endRow).

What is being asked is to be able to do something like: new Range(new Key(startRow, startColFam, startColQual), new Key(endRow, endColFam, endColQual))
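To illustrate why a start key and end key give finer control than a start row and end row, here is a small stand-alone model. It does not use the Accumulo classes: keys are plain `{row, family, qualifier}` string arrays, ordering uses `String.compareTo` rather than byte-wise comparison, and both ends of the range are inclusive. The names `KeyRangeSketch`, `sample`, and `scan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyRangeSketch {
  // Compares two stand-in keys field by field: row, then family, then qualifier.
  static int compare(String[] a, String[] b) {
    for (int i = 0; i < 3; i++) {
      int c = a[i].compareTo(b[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  // A tiny sorted "table" of stand-in keys.
  static List<String[]> sample() {
    return Arrays.asList(
        new String[] {"row1", "cfA", "q1"},
        new String[] {"row1", "cfB", "q1"},
        new String[] {"row1", "cfB", "q5"},
        new String[] {"row1", "cfC", "q1"},
        new String[] {"row2", "cfA", "q1"});
  }

  // Keeps every key k with start <= k <= end (inclusive ends in this sketch).
  static List<String> scan(List<String[]> sorted, String[] start, String[] end) {
    List<String> out = new ArrayList<>();
    for (String[] k : sorted) {
      if (compare(k, start) >= 0 && compare(k, end) <= 0) {
        out.add(String.join(" ", k));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // A range inside row1, from family cfB up to the start of family cfC.
    System.out.println(scan(sample(),
        new String[] {"row1", "cfB", ""},
        new String[] {"row1", "cfC", ""}));
  }
}
```

A row-only range over `row1` would return all four `row1` entries in this model, while the key-based range above returns only the two `cfB` entries.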

ctubbsii avatar Sep 18 '25 21:09 ctubbsii

That is exactly correct @ctubbsii

ivakegg avatar Sep 19 '25 12:09 ivakegg

I wonder if begin-key and end-key options would be good for this, such as -bk row[:fam[:qual[:vis[:stamp]]]] and -ek row[:fam[:qual[:vis[:stamp]]]], where you can optionally specify more fields of the key. These options would be mutually exclusive with the row options.

keith-turner avatar Sep 23 '25 20:09 keith-turner

Specifying the full key means parsing out delimiters... which means adding escaping support to use delimiter characters in the user data. It gets complicated fast. I don't know that we'd want to do all that. It's probably easier to specify them as separate options.
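As a rough illustration of the extra work involved, here is a minimal sketch of splitting a single key argument on `:` while allowing a backslash to escape the delimiter. The class name `KeySpecParser` and the choice of backslash as the escape character are assumptions made for this example, not anything the shell does today.

```java
import java.util.ArrayList;
import java.util.List;

public class KeySpecParser {
  // Splits on unescaped ':'; a backslash makes the following character literal.
  static List<String> split(String spec) {
    List<String> fields = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < spec.length(); i++) {
      char c = spec.charAt(i);
      if (c == '\\') {
        // An escape must be followed by the character it protects.
        if (i + 1 >= spec.length()) {
          throw new IllegalArgumentException("dangling escape at end of: " + spec);
        }
        cur.append(spec.charAt(++i));
      } else if (c == ':') {
        // Unescaped delimiter: close the current field and start a new one.
        fields.add(cur.toString());
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    fields.add(cur.toString());
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(split("row1:fam:qual"));
    System.out.println(split("a\\:b:c"));
  }
}
```

Even this small version has to decide how to treat a trailing backslash, empty fields, and a literal backslash in user data, which is the kind of complexity that separate options avoid.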

ctubbsii avatar Sep 23 '25 22:09 ctubbsii

I'll look into adding this

kevinrr888 avatar Oct 10 '25 17:10 kevinrr888