Descent3
Improve render perf for MSVC Debug configuration
Pull Request Type
- [ ] GitHub Workflow changes
- [ ] Documentation or Wiki changes
- [ ] Build and Dependency changes
- [x] Runtime changes
- [x] Render changes
- [ ] Audio changes
- [ ] Input changes
- [ ] Network changes
- [ ] Other changes
Description
MSVC's STL performs extensive bounds and validity checking on iterators in Debug builds. This is desirable for correctness, but it causes significant slowdowns in the current renderer's implementation due to the number of vertex copies and transformations it performs. Until we can fully convert the renderer to more of a 'static data' model, where the game does not need to ship thousands of vertices to the GPU each frame, the MSVC Debug build realistically needs to skip the per-iteration checks for its vertex iterators. This is, I think, a happy compromise between full checks and just using bare data pointers: the iterators are still used and checked once each at the start of any operation, but the memory accesses are done via pointers, which should be very fast.
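As a rough illustration of the compromise (a minimal sketch, not the actual Descent3 renderer code; the `Vertex` type, function name, and parameters below are hypothetical), the pattern is to pay MSVC's debug iterator validation once per operation and then run the hot loop over raw pointers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex type, for illustration only.
struct Vertex {
    float x, y, z;
};

// Translate a contiguous range of vertices. The iterators are validated once
// (checked distance and checked dereference), then the per-element work goes
// through a raw pointer, avoiding MSVC's per-iteration _ITERATOR_DEBUG_LEVEL checks.
void TranslateVertices(std::vector<Vertex>::iterator first,
                       std::vector<Vertex>::iterator last,
                       float dx, float dy, float dz) {
    if (first == last)
        return;

    const std::size_t count = static_cast<std::size_t>(last - first);  // checked once
    Vertex *p = &*first;                                               // checked once

    // Hot loop over a bare pointer: no per-iteration iterator checks.
    for (std::size_t i = 0; i < count; ++i) {
        p[i].x += dx;
        p[i].y += dy;
        p[i].z += dz;
    }
}
```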
Related Issues
#543, #545
Checklist
- [x] I have tested my changes locally and verified that they work as intended.
- [x] I have documented any new or modified functionality.
- [x] I have reviewed the changes to ensure they do not introduce any unnecessary complexity or duplicate code.
- [x] I understand that by submitting this pull request, I am agreeing to license my contributions under the project's license.
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1820
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 17377442f494ada61844e09351271fc29e292b54 with merge base 7744608c4c455a86b21f4ce0642e6c19bc1cebf4:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@RdoubleA Requesting review.
I see where the problem is. I committed and then switched branches...
@RdoubleA Should be fine now.
Yeah, some stuff failed. Let me fix the other PRs first and then this one.
Done with all the small fixes + CI fixes. Only the extra test is still needed.
@RdoubleA @ebsmothers Made all the required fixes and manually tested on the required dataset:
from torchtune.datasets.multimodal import multimodal_instruct_dataset
import torch
from torch import nn
from torchtune.data import Message, PromptTemplate, truncate
from torchtune.modules.tokenizers import ModelTokenizer
from torchtune.modules.transforms import Transform
from typing import Any, Dict, Generator, List, Mapping, Optional, TextIO, Tuple, Union
class DummyTokenizer(ModelTokenizer, Transform):
    def __init__(self, max_seq_len: Optional[int] = None):
        self.max_seq_len = max_seq_len

    def encode(self, text, add_bos=True, add_eos=True, **kwargs) -> List[int]:
        words = text.split()
        tokens = [len(word) for word in words]
        if add_bos:
            tokens = [self.bos_id] + tokens
        if add_eos:
            tokens = tokens + [self.eos_id]
        return tokens

    def tokenize_messages(
        self,
        messages: List[Message],
    ) -> Tuple[List[int], List[bool]]:
        """
        A simplified version of Llama2Tokenizer's ``tokenize_messages`` for testing purposes.
        """
        start_of_turn = True
        end_of_turn = False
        tokenized_messages = []
        mask = []
        for message in messages:
            # If assistant message, this is the end of a turn
            end_of_turn = message.role == "assistant"

            # Prepend BOS on start of new turns
            if start_of_turn:
                tokenized_messages.append(self.bos_id)
                mask.append(message.masked)

            # Tokenize current message, append with masks
            tokens = []
            for item in message.content:
                if item["type"] == "text":
                    tokens = tokens + self.encode(
                        item["content"],
                        add_bos=False,
                        add_eos=False,
                    )
                elif item["type"] == "image":
                    tokens = tokens + [self.image_id]
            tokenized_messages.extend(tokens)
            mask.extend([message.masked] * len(tokens))

            # If assistant message, append EOS at end
            if end_of_turn:
                tokenized_messages.append(self.eos_id)
                mask.append(message.masked)
                end_of_turn = False
                start_of_turn = True
            else:
                start_of_turn = False

            # Break out early if we reach max_seq_len
            if self.max_seq_len and len(tokenized_messages) >= self.max_seq_len:
                break

        # Finally, truncate if necessary
        if self.max_seq_len:
            tokenized_messages = truncate(
                tokenized_messages, self.max_seq_len, self.eos_id
            )
            mask = truncate(mask, self.max_seq_len, message.masked)

        return tokenized_messages, mask

    def __call__(self, sample: Mapping[str, Any]) -> Mapping[str, Any]:
        messages: List[Message] = sample.pop("messages")
        images = []
        for message in messages:
            images += message.get_media()
        tokens, mask = self.tokenize_messages(messages)
        sample["tokens"] = tokens
        sample["mask"] = mask
        sample["images"] = images
        return sample

    @property
    def eos_id(self):
        return -1

    @property
    def bos_id(self):
        return 0

    @property
    def image_id(self):
        return -2


dataset = multimodal_instruct_dataset(
    model_transform=DummyTokenizer(),
    # source="json",
    source="derek-thomas/ScienceQA",
    # data_files="derek-thomas/ScienceQA",
    column_map={
        "input": "question",
        "output": "solution",
        "image": "image",
    },
    split="train",
)
tokens = dataset[0]
print(tokens)
Output:
{'input': 'question', 'output': 'solution', 'image': 'image'}
{'tokens': [0, -2, 5, 2, 5, 6, 2, 8, 6, 2, 4, 3, 7, 4, 2, 3, 7, 5, 4, 2, 5, 3, 3, 5, 5, 2, 9, 4, 8, 2, 8, 6, -1], 'mask': [True, True, True, True, True, True, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], 'images': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=750x429 at 0x7FD20A590D50>], 'labels': [np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(-100), np.int64(2), np.int64(4), np.int64(3), np.int64(7), np.int64(4), np.int64(2), np.int64(3), np.int64(7), np.int64(5), np.int64(4), np.int64(2), np.int64(5), np.int64(3), np.int64(3), np.int64(5), np.int64(5), np.int64(2), np.int64(9), np.int64(4), np.int64(8), np.int64(2), np.int64(8), np.int64(6), np.int64(-1)]}
I'm pretty confident that it is actually working
@ebsmothers @RdoubleA Added another fix; the tests pass locally.
Yeah, finally. Can someone approve and merge? Or should we do some extra checks?
@ebsmothers Fixed according to your comments.
Can someone start CI and then merge?
hey @krammnic , I've added you on Discord. Let's coordinate further there with @joecummings and we can update this thread after
Pushed some fixes
I have a feeling that we will have to separate the two versions of InputToMessages so as not to break SRP, for example.
@RdoubleA @joecummings Can we merge?
This looks great to me. Coming at this from an outside perspective, I'd really love to see a couple of follow-up issues to make sure we properly document the new features we add to our datasets. I've mentioned above some things I immediately found confusing when combing through the current docs in this PR.
- [ ] We can now document usage of a general vqa MM builder in https://pytorch.org/torchtune/main/basics/multimodal_datasets.html, to the same extent to which we have documented the MM chat builder. Alternatively, we can rename the current MM docpage to "Multimodal chat datasets" and add a new page for MM instruct.
- [ ] We need to add a concrete HF builder using this, and add it at the bottom of https://pytorch.org/torchtune/main/basics/multimodal_datasets.html (or a new instruct MM page).
- [ ] I'd like to see a fine-tuning run using this dataset.
We can now document usage of a general vqa MM builder in https://pytorch.org/torchtune/main/basics/multimodal_datasets.html, to the same extent to which we have documented the MM chat builder. Alternatively, we can rename the current MM docpage to "Multimodal chat datasets" and add a new page for MM instruct.
Probably the second idea is better.
I'd like to see a fine-tuning run using this dataset.
Then I will run the full procedure.
Should I make these docs fixes in this PR or open a separate one?
Oh, lint (I accepted the changes right here). Let me fix that.
Then I will run the full procedure. Should I make these docs fixes in this PR or open a separate one?
We'd be incredibly grateful if you're up for helping out here! I was speaking slightly generally just to make sure some of those tasks get done at some point, so the other maintainers and I are happy to help out, particularly with a training run and, of course, reviews!
In any case, separate PRs are fine.
Fixed
Codecov Report
Attention: Patch coverage is 34.88372% with 28 lines in your changes missing coverage. Please review.
Project coverage is 25.75%. Comparing base (54673b7) to head (3c2134f). Report is 26 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1820 +/- ##
===========================================
- Coverage 67.05% 25.75% -41.30%
===========================================
Files 305 307 +2
Lines 15937 16068 +131
===========================================
- Hits 10687 4139 -6548
- Misses 5250 11929 +6679
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Is there anything else to add in this PR?