pydatastructs icon indicating copy to clipboard operation
pydatastructs copied to clipboard

Adding measures of sortedness/disorder in arrays

Open HarsheetKakar opened this issue 3 years ago • 5 comments

We should provide a function that should take an array as input and suggest one of the pydatastructs APIs as output. The time complexity should be O(n) because anything more won't be worth it. The constant in O(n) should also be very small (we can benchmark for that).

References

.. [1] https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F6D0F8B0B37F0A66AAD3100BD0459E7?doi=10.1.1.45.8017&rep=rep1&type=pdf

HarsheetKakar avatar Dec 04 '21 18:12 HarsheetKakar

@czgdp1807 feel free to edit the discription. As far as the O(n) complexity we are talking about, is this for checking the "sortedness" or sorting the array. As for the latter we will need to put constraints on the nature of elements passed in the array (e.g. I dont think we will be able to promise O(n) in case of floating point numbers IMHO).

And as far as checking the sortedness of the array, I was thinking of going for a probabilistic algorithm (some of which can go as low as O(log(n)/e) where e is the "distance" from sorted array).

HarsheetKakar avatar Dec 04 '21 18:12 HarsheetKakar

As far as the O(n) complexity we are talking about, is this for checking the "sortedness" or sorting the array

I am referring to the sortedness.On the basis of the sortedness, we should suggest the sorting algorithm from pydatastructs.

I was thinking of going for a probabilistic algorithm (some of which can go as low as O(log(n)/e) where e is the "distance" from sorted array).

Is the any research work that already explores this?

N.B. https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F6D0F8B0B37F0A66AAD3100BD0459E7?doi=10.1.1.45.8017&rep=rep1&type=pdf

czgdp1807 avatar Dec 04 '21 18:12 czgdp1807

Is the any research work that already explores this?

Yes Hamming Distance

HarsheetKakar avatar Dec 05 '21 05:12 HarsheetKakar

In the paper in my above comment, section I.2 lists 10 measures of disorder for arrays. I feel that all these are worth adding into pydatastructs.

czgdp1807 avatar Dec 05 '21 08:12 czgdp1807

As a GSSOC 23 member , I request you to assign issue to me.

mridul45 avatar May 20 '23 06:05 mridul45