
Quick improvement suggestion: Reduce diff size with --ignore-all-space

Open gornostal opened this issue 2 years ago • 5 comments

First, thanks a lot for building this tool!

The current 200-line limit would be less restrictive with the --ignore-all-space flag, which produces a more concise diff, for example when you wrap some lines in an extra if or <div>, or make any other change that only shifts the indentation one way or another.

I guess the code changes will be minimal. From this:

export const getStagedDiff = async () => {
	const diffCached = ['diff', '--cached'];
	const { stdout: files } = await execa(
		'git',

To this:

export const getStagedDiff = async () => {
	const diffCached = ['diff', '--cached', '--ignore-all-space'];
	const { stdout: files } = await execa(
		'git',

I doubt it makes sense to parameterize this change. Am I wrong? Happy to create a PR if needed.

gornostal avatar Feb 19 '23 08:02 gornostal

I thought about this, but some commits are just white-space changes (e.g. stylistic/linting).

If the diff doesn't show, GPT wouldn't be able to produce an accurate description of the change.

privatenumber avatar Feb 19 '23 14:02 privatenumber

I wonder if we can be smart about this and detect how much of the diff is white-space changes.

If the diff is mainly white-space change, we can pass in the diff with white-space.

If not (or if the diff is close to the OpenAI size limit), we can pass in the white-space ignored diff.
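A minimal sketch of that heuristic, assuming both diffs have already been fetched (one from plain `git diff --cached`, one with `--ignore-all-space`). The names `countChangedLines` and `chooseDiff` and the 0.5 threshold are hypothetical, not part of aicommits:

```typescript
// Count added/removed lines in a unified diff, skipping the ---/+++ file headers.
// Approximate: a removed line whose content starts with "--" looks like a header.
const countChangedLines = (diff: string): number =>
	diff
		.split('\n')
		.filter(line => /^[+-]/.test(line) && !/^(\+\+\+|---)/.test(line))
		.length;

// Return the full diff when most changed lines are white-space only,
// otherwise the white-space-ignored diff. The threshold is an assumption.
const chooseDiff = (
	fullDiff: string,
	ignoreSpaceDiff: string,
	whitespaceShareThreshold = 0.5,
): string => {
	const total = countChangedLines(fullDiff);
	if (total === 0) {
		return fullDiff;
	}

	// Changed lines that vanish under --ignore-all-space were white-space only.
	const whitespaceOnly = total - countChangedLines(ignoreSpaceDiff);
	return whitespaceOnly / total >= whitespaceShareThreshold
		? fullDiff
		: ignoreSpaceDiff;
};
```

A check against the size limit could be layered on top by falling back to the white-space-ignored diff whenever the full diff is too long.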

privatenumber avatar Feb 22 '23 09:02 privatenumber

This gives a smaller diff text:

const diffCached = ['diff', '--cached', '--ignore-all-space', '--diff-algorithm=minimal'];

ykankaya avatar Feb 24 '23 08:02 ykankaya

It could be a config option, but in many cases detecting these changes is important, particularly for languages where indentation matters and, as noted above, for formatting and linting changes. How about counting tokens with a tokeniser and capping at the model's maximum token count instead of using a line count, and allowing people to set a limit themselves?
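Roughly what a token-based check could look like. This sketch does not use a real tokeniser: it assumes the common rule of thumb of about four characters per token, so a real implementation would swap in the model's actual tokeniser. `estimateTokens`, `fitsTokenBudget` and the reply reserve of 200 tokens are hypothetical:

```typescript
// Rough token estimate: about 4 characters per token (an approximation,
// not the model's real tokenisation).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Check whether a diff fits in the model's context window while leaving
// room for the generated commit message. Both limits are caller-supplied.
const fitsTokenBudget = (
	diff: string,
	maxTokens: number,
	reservedForReply = 200,
): boolean => estimateTokens(diff) <= maxTokens - reservedForReply;
```

With this shape, a user-set limit is just a smaller `maxTokens` passed in from config.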

salomartin avatar Mar 04 '23 11:03 salomartin

Or, even better: now that the new chat API is available and works in turns, a diff bigger than some threshold could be consumed file by file in a turn-by-turn chat, generating a comment about each file and then asking for a summary of all the comments at the end. This would make it possible to work with quite large diffs as well.
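A sketch of the splitting half of that idea, with no API calls. It assumes the diff is in git's default format, where each file section starts with a `diff --git ` line. The `ChatMessage` type, the function names and the prompt wording are all hypothetical:

```typescript
type ChatMessage = {
	role: 'system' | 'user' | 'assistant';
	content: string;
};

// Split a multi-file diff into one chunk per file, cutting before each
// "diff --git " header line.
const splitDiffByFile = (diff: string): string[] =>
	diff
		.split(/^(?=diff --git )/m)
		.filter(chunk => chunk.trim() !== '');

// One user turn per file; each would be sent in sequence and answered
// by the model with a short comment about that file.
const buildPerFileTurns = (diff: string): ChatMessage[] =>
	splitDiffByFile(diff).map(chunk => ({
		role: 'user',
		content: `Describe the changes in this file in one sentence:\n${chunk}`,
	}));

// Final turn asking for a single commit message from the per-file comments.
const buildSummaryTurn = (fileComments: string[]): ChatMessage => ({
	role: 'user',
	content: `Summarise these changes as one commit message:\n${fileComments.join('\n')}`,
});
```

Each per-file turn is a separate request, so the number of API calls grows with the number of changed files.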

https://github.com/openai/openai-python/blob/main/chatml.md https://platform.openai.com/docs/guides/chat/chat-vs-completions

salomartin avatar Mar 04 '23 11:03 salomartin

I don't think we can ignore white space for the reason provided above, and we've added --diff-algorithm=minimal so I think this is closable.

@salomartin That idea sounds interesting but I worry that it may (unexpectedly) incur large expenses for generating a commit.

privatenumber avatar May 03 '23 14:05 privatenumber