Fixed tokenizer and audio processing logic

Open 1amageek opened this issue 1 year ago • 1 comments

This PR addresses the following changes:

Updated the tokenizer logic in WhisperTokenizerWrapper.
Fixed the startIndex validation in AudioChunker.swift and AudioProcessor.swift.
Corrected the embedSize and sequenceLength calculations in AudioEncoder.swift.
Added missing implementations for applyChatTemplate and encode methods.
Updated dependencies in Package.swift.

Oct 03 '24 03:10 1amageek

Hi @1amageek, I'm happy to see your enthusiasm to contribute to WhisperKit! I'd like to make a few recommendations to help guide your work in the future, since this PR has grown quite large.

Your initial changes identified a bug in the startIndex checking code, which was great to see! I've adjusted it slightly and added you as a co-author in this commit 47035a8 (#216)
I'd highly recommend running the test locally in Xcode to make sure they pass. I've been periodically running the CI in this PR and they have been failing, which is a required check for merging.
I'd also recommend breaking up this PR into smaller more focused PRs that fix a specific issue or add a single new feature.
Try to make a best effort to conform to the existing code style. This includes keeping any existing documentation in place, using english language for new documentation, and including the common file header when adding new files.

With this in mind, I would recommend closing this PR and separating out the changes that add new features or fix existing bugs into separate PRs with details on why they are necessary and solve an open issue listed in https://github.com/argmaxinc/WhisperKit/issues. Thanks again for your contributions so far, open to any discussion on these topics.

Oct 06 '24 21:10 ZachNagengast