baybe Add benchmarks

EDIT:

Added 2 chemical data TL benchmarks and 3 synthetic ones.
Lines per benchmark:
- For the chemical benchmarks and Hartmann: TL models using 0-10% of the source data (e.g. 0), and a non-TL model using only target-task data (nonTL).
- For Michalewicz and Easom: a TL model with a single source data size (TL), a TL model with no source data (TL-noSource), and a non-TL model with only target data (non-TL).
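
To make the scenario layout above concrete, here is a minimal sketch in plain Python of how the compared lines could be assembled. It does not use the BayBE API. The helper `build_scenarios`, the dictionary layout, the intermediate percentages (1 and 5), and the convention of labeling each TL line by its percentage are all assumptions made for illustration; only the 0-10% range and the `nonTL` label come from the description.

```python
import random


def build_scenarios(source_rows, percentages=(0, 1, 5, 10), seed=0):
    """Hypothetical helper: map each line label to a model type and its source data."""
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    scenarios = {}
    for pct in percentages:
        n_rows = round(len(source_rows) * pct / 100)
        # TL scenario: the model sees a random subset of the source-task data.
        scenarios[str(pct)] = {
            "model": "TL",
            "source": rng.sample(source_rows, n_rows),
        }
    # Baseline: a non-TL model that only ever sees target-task measurements.
    scenarios["nonTL"] = {"model": "nonTL", "source": []}
    return scenarios


# Toy source-task data standing in for a real dataset (assumed shape).
source = [{"task": "source", "x": i, "y": i * i} for i in range(200)]
scenarios = build_scenarios(source)
for label, spec in scenarios.items():
    print(label, spec["model"], len(spec["source"]))
```

Note that the `0` line and the `nonTL` line see the same amount of source data (none) in this sketch; they differ only in the model type, which is what makes the comparison between them informative.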

OLD: Started by adding a direct arylation TL campaign with temperature (Temp) as the task, adapted from the paper.

Will continue adding more. If someone could check now whether this is going in the right direction, that would be great, so that I do not repeat the same mistakes in the other benchmarks.

Things to discuss:

[x] Where to read data from?
[x] How should TL benchmarks be set up? E.g. should we always include a comparison to no TL and to different source data proportions? We didn't do this before on some datasets, as we mainly focused on comparing different TL methods.

Feb 19 '25 15:02 Hrovatin

@Hrovatin if there are discussion items before we start the review, can you open one thread per item so we can collect thoughts? If needed, we can also have meetings.

Feb 20 '25 16:02 Scienfitz

Still TODO:

[x] Update and unify descriptions

Apr 08 '25 11:04 AVHopp