Descent3
Improve render perf for MSVC Debug configuration
Pull Request Type
- [ ] GitHub Workflow changes
- [ ] Documentation or Wiki changes
- [ ] Build and Dependency changes
- [x] Runtime changes
- [x] Render changes
- [ ] Audio changes
- [ ] Input changes
- [ ] Network changes
- [ ] Other changes
Description
MSVC's STL performs extensive bounds and validity checking on iterators in Debug builds. This is desirable for correctness, but it causes significant slowdowns in the current renderer's implementation due to the number of vertex copies and transformations it performs. Until we can fully convert the renderer to more of a 'static data' model, where the game does not need to ship thousands of vertices to the GPU each frame, the MSVC Debug build realistically needs to skip the per-iteration checks for its vertex iterators. This is, I think, a happy compromise between full checks and just using bare data pointers: the iterators are still used and checked once each at the start of any operation, but the memory accesses are done via pointers, which should be very fast.
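As a rough illustration of the compromise (a minimal sketch, not the actual Descent3 renderer code; the `Vertex` type, function name, and parameters below are hypothetical), the pattern is to pay MSVC's debug iterator validation once per operation and then run the hot loop over raw pointers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex type, for illustration only.
struct Vertex {
    float x, y, z;
};

// Translate a contiguous range of vertices. The iterators are validated once
// (checked distance and checked dereference), then the per-element work goes
// through a raw pointer, avoiding MSVC's per-iteration _ITERATOR_DEBUG_LEVEL checks.
void TranslateVertices(std::vector<Vertex>::iterator first,
                       std::vector<Vertex>::iterator last,
                       float dx, float dy, float dz) {
    if (first == last)
        return;

    const std::size_t count = static_cast<std::size_t>(last - first);  // checked once
    Vertex *p = &*first;                                               // checked once

    // Hot loop over a bare pointer: no per-iteration iterator checks.
    for (std::size_t i = 0; i < count; ++i) {
        p[i].x += dx;
        p[i].y += dy;
        p[i].z += dz;
    }
}
```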
Related Issues
#543, #545
Checklist
- [x] I have tested my changes locally and verified that they work as intended.
- [x] I have documented any new or modified functionality.
- [x] I have reviewed the changes to ensure they do not introduce any unnecessary complexity or duplicate code.
- [x] I understand that by submitting this pull request, I am agreeing to license my contributions under the project's license.
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1820
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 17377442f494ada61844e09351271fc29e292b54 with merge base 7744608c4c455a86b21f4ce0642e6c19bc1cebf4:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@RdoubleA Requesting review.
I see where the problem is. I committed and then switched branches...
@RdoubleA Should be fine now.
Yeah, some stuff failed. Let me fix the other PRs first and then this one.
Done with all the small fixes + CI fixes. Only the extra test is still needed.
@RdoubleA @ebsmothers Made all the required fixes and manually tested on the required dataset:
from torchtune.datasets.multimodal import multimodal_instruct_dataset
import torch
from torch import nn
from torchtune.data import Message, PromptTemplate, truncate
from torchtune.modules.tokenizers import ModelTokenizer
from torchtune.modules.transforms import Transform
from typing import Any, Dict, Generator, List, Mapping, Optional, TextIO, Tuple, Union
class DummyTokenizer(ModelTokenizer, Transform):
    def __init__(self, max_seq_len: Optional[int] = None):
        self.max_seq_len = max_seq_len

    def encode(self, text, add_bos=True, add_eos=True, **kwargs) -> List[int]:
        words = text.split()
        tokens = [len(word) for word in words]
        if add_bos:
            tokens = [self.bos_id] + tokens
        if add_eos:
            tokens = tokens + [self.eos_id]
        return tokens

    def tokenize_messages(
        self,
        messages: List[Message],
    ) -> Tuple[List[int], List[bool]]:
        """
        A simplified version of Llama2Tokenizer's ``tokenize_messages`` for testing purposes.
        """
        start_of_turn = True
        end_of_turn = False
        tokenized_messages = []
        mask = []
        for message in messages:
            # If assistant message, this is the end of a turn
            end_of_turn = message.role == "assistant"

            # Prepend BOS on start of new turns
            if start_of_turn:
                tokenized_messages.append(self.bos_id)
                mask.append(message.masked)

            # Tokenize current message, append with masks
            tokens = []
            for item in message.content:
                if item["type"] == "text":
                    tokens = tokens + self.encode(
                        item["content"],
                        add_bos=False,
                        add_eos=False,
                    )
                elif item["type"] == "image":
                    tokens = tokens + [self.image_id]
            tokenized_messages.extend(tokens)
            mask.extend([message.masked] * len(tokens))

            # If assistant message, append EOS at end
            if end_of_turn:
                tokenized_messages.append(self.eos_id)
                mask.append(message.masked)
                end_of_turn = False
                start_of_turn = True
            else:
                start_of_turn = False

            # Break out early if we reach max_seq_len
            if self.max_seq_len and len(tokenized_messages) >= self.max_seq_len:
                break

        # Finally, truncate if necessary
        if self.max_seq_len:
            tokenized_messages = truncate(
                tokenized_messages, self.max_seq_len, self.eos_id
            )
            mask = truncate(mask, self.max_seq_len, message.masked)

        return tokenized_messages, mask

    def __call__(self, sample: Mapping[str, Any]) -> Mapping[str, Any]:
        messages: List[Message] = sample.pop("messages")
        images = []
        for message in messages:
            images += message.get_media()
        tokens, mask = self.tokenize_messages(messages)
        sample["tokens"] = tokens
        sample["mask"] = mask
        sample["images"] = images
        return sample

    @property
    def eos_id(self):
        return -1

    @property
    def bos_id(self):
        return 0

    @property
    def image_id(self):
        return -2


dataset = multimodal_instruct_dataset(
    model_transform=DummyTokenizer(),
    # source="json",
    source="derek-thomas/ScienceQA",
    # data_files="derek-thomas/ScienceQA",
    column_map={
        "input": "question",
        "output": "solution",
        "image": "image",
    },
    split="train",
)
tokens = dataset[0]
print(tokens)
Output:
{'input': 'question', 'output': 'solution', 'image': 'image'}
{'tokens': [0, -2, 5, 2, 5, 6, 2, 8, 6, 2, 4, 3, 7, 4, 2, 3, 7, 5, 4, 2, 5, 3, 3, 5, 5, 2, 9, 4, 8, 2, 8, 6, -1], 'mask': [True, True, True, True, True, True, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], 'images': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=750x429 at 0x7FD20A590D50>], 'labels': [np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(2), np.int64(4), np.int64(3), np.int64(7), np.int64(4), np.int64(2), np.int64(3), np.int64(7), np.int64(5), np.int64(4), np.int64(2), np.int64(5), np.int64(3), np.int64(3), np.int64(5), np.int64(5), np.int64(2), np.int64(9), np.int64(4), np.int64(8), np.int64(2), np.int64(8), np.int64(6), np.int64(-1)]}
I'm pretty confident that it is actually working
@ebsmothers @RdoubleA Added another fix; the tests pass locally.
Yeah, finally. Can someone approve and merge? Or should we do some extra checks?
@ebsmothers Fixed according to your comments.
Can someone start CI and then merge?
hey @krammnic , I've added you on Discord. Let's coordinate further there with @joecummings and we can update this thread after
Pushed some fixes
I have a feeling that we will have to separate the two versions of InputToMessages so as not to break SRP, for example.
@RdoubleA @joecummings Can we merge?
This looks great to me. Coming at this from an outside perspective, I'd really love to see a couple of follow-up issues to make sure we properly document the new features we add to our datasets. I've mentioned above some things I immediately found confusing when combing through the current docs in this PR.
- [ ] We can now document usage of a general vqa MM builder in https://pytorch.org/torchtune/main/basics/multimodal_datasets.html, to the same extent to which we have documented the MM chat builder. Alternatively, we can rename the current MM docpage to "Multimodal chat datasets" and add a new page for MM instruct.
- [ ] We need to add a concrete HF builder using this, and add it at the bottom of https://pytorch.org/torchtune/main/basics/multimodal_datasets.html (or a new instruct MM page).
- [ ] I'd like to see a fine-tuning run using this dataset.
We can now document usage of a general vqa MM builder in https://pytorch.org/torchtune/main/basics/multimodal_datasets.html, to the same extent to which we have documented the MM chat builder. Alternatively, we can rename the current MM docpage to "Multimodal chat datasets" and add a new page for MM instruct.
Probably the second idea is better.
I'd like to see a fine-tuning run using this dataset.
Then I will run the full procedure.
Should I make these docs fixes in this PR or open a separate one?
Oh, lint (I accepted the changes right here). Let me fix that.
Then I will run the full procedure. Should I make these docs fixes in this PR or open a separate one?
We'd be incredibly grateful if you're up for helping out here! I was speaking slightly generally just to make sure some of those tasks get done at some point, so the other maintainers and I are happy to help out, particularly with a training run and, of course, reviews!
In any case, separate PRs are fine.
Fixed
Codecov Report
Attention: Patch coverage is 34.88372% with 28 lines in your changes missing coverage. Please review.
Project coverage is 25.75%. Comparing base (54673b7) to head (3c2134f). Report is 26 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1820 +/- ##
===========================================
- Coverage 67.05% 25.75% -41.30%
===========================================
Files 305 307 +2
Lines 15937 16068 +131
===========================================
- Hits 10687 4139 -6548
- Misses 5250 11929 +6679
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Is there anything else to add in this PR?