Tobias Pitters

Results 35 issues of Tobias Pitters

- [x] closes #47677
- [x] [Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature
- [x] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
- [x] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints)...

Indexing
Categorical

- [x] closes #47449
- [x] [Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature
- [x] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
- [x] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints)...

expressions

I am creating schemas dynamically (from DDL schemas). It would be great if I could write some tests and just do `expected_schema == result_schema`. Right now I am...

enhancement
help wanted

Replace `inspect.getargspec` with `inspect.getfullargspec`, since the former is deprecated in favor of the latter. See here: https://docs.python.org/3/library/inspect.html#inspect.getargspec
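A minimal sketch of what the migration could look like, assuming the calling code only relied on the four fields that `getargspec` used to return. The function `greet` and the helper `legacy_argspec` are hypothetical names used for illustration only; they are not part of the project this issue refers to.

```python
import inspect


def greet(name, greeting="hello", *args, punctuation="!", **kwargs):
    """Hypothetical function used only to demonstrate introspection."""
    return f"{greeting}, {name}{punctuation}"


def legacy_argspec(func):
    """Return the four fields that ``inspect.getargspec`` used to provide.

    ``getfullargspec`` exposes the same information; the old ``keywords``
    field is called ``varkw`` in the newer result.
    """
    full = inspect.getfullargspec(func)
    return full.args, full.varargs, full.varkw, full.defaults


spec = inspect.getfullargspec(greet)
args, varargs, varkw, defaults = legacy_argspec(greet)

# getfullargspec additionally reports keyword-only parameters and
# annotations, which getargspec could not represent.
kwonly = spec.kwonlyargs
kwonly_defaults = spec.kwonlydefaults
```

Unpacking through a small helper keeps existing call sites that expect a 4-tuple unchanged, while new code can read the keyword-only fields directly from the full result.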

The inputs can contain many different variants of `no input`, so don't use the `input` column for that. In some cases the text in `input` is already...

ml

We currently support reverse augmentation for the alpaca datasets. So far this has not proved to be very helpful. As mentioned in section 5.1.1 of the [paper](https://arxiv.org/abs/2303.18223), we should probably generate...

ml
good first issue

Closes #2708. Add a pydantic `BaseModel` class (equivalent to a dataclass but with stronger guarantees) to return from the dolly dataset. Add the formatting functionality in the dataset entry class. This PR does...

Add dialogue data collator unit test. Things to note on this PR: - is it correct that we mask the last occurrence of `` of the assistant? See the example...

It seems that the score for the month and the total score are synchronized at different time intervals. Maybe we could do both aggregations in the same time interval. ![image](https://user-images.githubusercontent.com/31857876/231005321-1e5f61f5-93f4-4d93-9e64-d485dc907b98.png)

backend
nice-to-have

As can be seen in [some of the answers](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-04-07_OpenAssistant_llama-30b-sft-oa-alpaca-epoch-2_sampling_noprefix2.json%0Ahttps%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-04-07_OpenAssistant_llama-30b-sft-oa-alpaca-epoch-4_sampling_noprefix2.json), our model outputs quite a number of tokens that are reserved for special purposes and should not appear in text....

data