EPIJudge icon indicating copy to clipboard operation
EPIJudge copied to clipboard

Incorrect time complexity calculation in 24.8 in Python book

Open deveshks opened this issue 4 years ago • 11 comments

The time complexity for the problem Justify Text is stated to be O(n), where n is the sum of lengths of the string in A.

But if we take a look at Line 14: https://github.com/adnanaziz/EPIJudge/blob/f2c94dc79c9b7b162c48dff6b83a7c33f654d9aa/epi_judge_python_solutions/left_right_justify_text.py#L14

We are performing string concatenation by concatenating a space to the string curr_line[i % max(len(curr_line) - 1, 1)] , which will cause a new string to be build for each character, which will cause our time complexity to go up to roughly O(maxWidth*n), since we can create maxWidth no of new strings per character by our concatenations.

Please notice that this might not be the correct time complexity computation but it will surely not be O(n)

deveshks avatar Jan 12 '20 17:01 deveshks

This is a very good find.

The fix I will add in the next release is using a counter to count the numbers of spaces for each word shall append. Then, we could append those spaces only one time for each word.

            num_spaces = [0] * len(curr_line)
            # Distribute equally between words in curr_line.
            for i in range(L - curr_line_length):
                num_spaces[i % max(len(curr_line) - 1, 1)] += 1
            result.append(''.join(
                ''.join((word, ' ' * num_space))
                for word, num_space in zip(curr_line, num_spaces)))

tsunghsienlee avatar Jan 13 '20 03:01 tsunghsienlee

@tsunghsienlee Thanks for the response, We can also add each word as a list of characters into curr_line, and modifying the join condition as follows.

def justify_text(words: List[str], L: int) -> List[str]:

    curr_line_length, result = 0, []
    curr_line: List[str] = []
    for word in words:
        if curr_line_length + len(word) + len(curr_line) > L:
            # Distribute equally between words in curr_line.
            for i in range(L - curr_line_length):
                curr_line[i % max(len(curr_line) - 1, 1)] += [' ']
            result.append(''.join(''.join(word) for word in curr_line))
            curr_line, curr_line_length = [], 0
        curr_line.append(list(word))
        curr_line_length += len(word)
    # Use ljust(L) to pad the last line with the appropriate number of blanks.
    return result + [' '.join(''.join(word) for word in curr_line).ljust(L)]

deveshks avatar Jan 13 '20 05:01 deveshks

strictly speaking, the time complexity is not determinable, since it depends on whether the python implementation keeps some buffer space for this purpose.

I had the impression most/all implementations do keep this space, but from this post, it's clear that some do not: https://stackoverflow.com/questions/44487537/why-does-naive-string-concatenation-become-quadratic-above-a-certain-length, so updated to code is needed

@thl, I confirmed java is good (uses StringBuilder), can you check C++? we can also look at programs using StringBuilder in Java, see if Py version has the flaw @devesh caught.

thanks again @devesh, it's a good catch,

cheers, adnan

On Sun, Jan 12, 2020 at 9:53 PM Devesh Kumar Singh [email protected] wrote:

@tsunghsienlee https://github.com/tsunghsienlee Thanks for the response, We can also add each word as a list of characters into curr_line, and modifying the join condition as follows.

def justify_text(words: List[str], L: int) -> List[str]:

curr_line_length, result = 0, []
curr_line: List[str] = []
for word in words:
    if curr_line_length + len(word) + len(curr_line) > L:
        # Distribute equally between words in curr_line.
        for i in range(L - curr_line_length):
            curr_line[i % max(len(curr_line) - 1, 1)] += [' ']
        result.append(''.join(''.join(word) for word in curr_line))
        curr_line, curr_line_length = [], 0
    curr_line.append(list(word))
    curr_line_length += len(word)
# Use ljust(L) to pad the last line with the appropriate number of blanks.
return result + [' '.join(''.join(word) for word in curr_line).ljust(L)]

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/adnanaziz/EPIJudge/issues/151?email_source=notifications&email_token=AAXDNFKKPAIFHGAZBGWJXMLQ5P6ULA5CNFSM4KFYY7DKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIXTCJI#issuecomment-573518117, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXDNFKHAOMCUJ5BBBCFY53Q5P6ULANCNFSM4KFYY7DA .

adnanaziz avatar Jan 13 '20 19:01 adnanaziz

Hi @adnanaziz I assume you mean to say that when we are using string concatenations, the time complexity is not determinable, which is a fair assumption.

Can I submit a PR with the solution I proposed in https://github.com/adnanaziz/EPIJudge/issues/151#issuecomment-573518117 to the dev branch?

deveshks avatar Jan 13 '20 19:01 deveshks

yes, that's what I was saying (though now that I think about it, immutability of strings may force complexity to be quadratic.

PR to dev branch sounds good (we've had some back and forth on PR methodology, @Tsung-Hsien Lee [email protected] will know best)

cheers, adnan

On Mon, Jan 13, 2020 at 11:27 AM Devesh Kumar Singh < [email protected]> wrote:

Hi @adnanaziz https://github.com/adnanaziz I assume you mean to say that when we are using string concatenations, the time complexity is not determinable, which is a fair assumption.

Can I submit a PR with the solution I proposed in #151 (comment) https://github.com/adnanaziz/EPIJudge/issues/151#issuecomment-573518117 to the dev branch?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/adnanaziz/EPIJudge/issues/151?email_source=notifications&email_token=AAXDNFIHU65GWLCKBBFT6MTQ5S6C7A5CNFSM4KFYY7DKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIZ7C4A#issuecomment-573829488, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXDNFLIDUK7QDDPAEJ252DQ5S6C7ANCNFSM4KFYY7DA .

adnanaziz avatar Jan 13 '20 20:01 adnanaziz

Thanks @adnanaziz for the response.

Hi @tsunghsienlee , please let me know about the updated methodology of submitting a PR to the dev branch, and I will follow and submit the PR accordingly

deveshks avatar Jan 14 '20 03:01 deveshks

Feel free to submit that as I was thinking which solution is better in terms of readability. I think yours is better but I need to think it more carefully as yours is depends on List[List[str]] now.

tsunghsienlee avatar Jan 14 '20 06:01 tsunghsienlee

Hi @tsunghsienlee

Could you please let me know your concerns with List[List[str]]?

curr_line will be of type List[List[str]] because of curr_line.append(list(word)) but I thought making a word a list of characters List[str] instead of a str which was before follows more intuitively to avoid the extra complexity of string concatenation. I also think that this would save some w.r.t to your original spaces having num_spaces.

But please let me know if there are any bette solutions using which I can modify my code.

deveshks avatar Jan 14 '20 12:01 deveshks

Two dimensional structure is always more complicate than one dimensional ones. Especially the use case of this one is that for each entry List[str], it is a list of one string (word), and a set of trailing white space characters (they are strings for sure as well). The structure is not intuitive for people in the first glance.

However, this is necessary as we are trying to escape from the time complexity traps of it due to the immutability of string. If you compare this part with C++ and Java codes, you will realize that they are much easier to understand than Python. However, I think my solution of counting number of white spaces is not clearly better either due to the extra step of concatenation (the logic of counting how much extra space each string needs is clear to me). I could imagine a normal Python reader who does not know the subtlety of Python string will confuse on this, and potentially blames the "unnecessary" works did here.

tsunghsienlee avatar Jan 14 '20 17:01 tsunghsienlee

Hi @tsunghsienlee

Thank you for the response. Let me submit a PR and I will add comments to the code signifying this subtletly, and you can review the PR for clarity and go from there

deveshks avatar Jan 14 '20 17:01 deveshks

Hi @tsunghsienlee, @adnanaziz

I have submitted a PR to outline my fix. Please review the same at your convenience and I will address any concerns with the same on the PR

deveshks avatar Jan 14 '20 18:01 deveshks