tensorflow icon indicating copy to clipboard operation
tensorflow copied to clipboard

Support string sorting

Open tgsmith61591 opened this issue 5 years ago • 5 comments

Describe the feature and the current behavior/state.

Currently string sorting in the graph is unsupported:

Does not work:

tf.sort(['d', 'a', 'b'])
InvalidArgumentError: Value for attr 'T' of string is not in the list of allowed values: bfloat16, half, float, double, int8, int16, int32, int64, complex64, complex128
	; NodeDef: {{node Neg}}; Op<name=Neg; signature=x:T -> y:T; attr=T:type,allowed=[DT_BFLOAT16, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_INT8, DT_INT16, DT_INT32, DT_INT64, DT_COMPLEX64, DT_COMPLEX128]> [Op:Neg]

Works (but not in the graph):

>>> x = tf.constant(['d', 'a', 'b'])
>>> tf.convert_to_tensor(sorted(x.numpy()), tf.string)
<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'a', b'b', b'd'], dtype=object)>

Can tf.sort be amended to allow string sorting?

Will this change the current api? How?

No

Who will benefit with this feature?

Anyone who might need to string-sort in the graph

Any Other info.

tgsmith61591 avatar Nov 09 '20 14:11 tgsmith61591

I checked the code of operation, the problem is that tf.sort actually call the TOPK OP that does not support tf.string input type but only real number type.

REGISTER_OP("TopKV2")
    .Input("input: T")
    .Input("k: int32")
    .Output("values: T")
    .Output("indices: int32")
    .Attr("sorted: bool = true")
    .Attr("T: realnumbertype")  // real num type only
    .SetShapeFn(TopKShapeFn);

If change of string type is acceptable with interface design and the scope of change ? if it is acceptable, I'd like to commit the c++ code of impl.

ljwh avatar Nov 13 '20 02:11 ljwh

@jvishnuvardhan

ljwh avatar Nov 27 '20 03:11 ljwh

I checked the code of operation, the problem is that tf.sort actually call the TOPK OP that does not support tf.string input type but only real number type.

REGISTER_OP("TopKV2")
    .Input("input: T")
    .Input("k: int32")
    .Output("values: T")
    .Output("indices: int32")
    .Attr("sorted: bool = true")
    .Attr("T: realnumbertype")  // real num type only
    .SetShapeFn(TopKShapeFn);

If change of string type is acceptable with interface design and the scope of change ? if it is acceptable, I'd like to commit the c++ code of impl.

May be have it included in tf.strings like

tf.strings.sort(['d', 'b', 'a'])
>> ['a', 'b', 'c']

preetham-salehundam avatar Aug 25 '22 16:08 preetham-salehundam

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant with the current state of the code base.

The Tensorflow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, with all the debugging information which could help us investigate.

Please follow the release notes to stay up to date with the latest developments which are happening in the Tensorflow space.

Venkat6871 avatar Aug 20 '24 05:08 Venkat6871

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Aug 28 '24 01:08 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

github-actions[bot] avatar Sep 04 '24 01:09 github-actions[bot]