Christopher Schröder comments

Results: 52 comments of Christopher Schröder

Additional options for Discriminative Active Learning

Sorry for the long wait. I have been busy and so have my GPU resources. The implementation is now almost done except for some final tests and cleanup. I...

When using EmbeddingBasedQueryStrategy with some transformers, model has an unsupported input `token_type_ids` when creating embeddings.

Yes, such errors may happen, as models can have arbitrary arguments. What you suggest here sounds like a good solution when the calling side passes more parameters than the models...
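To illustrate the general idea of a caller passing more parameters than a model accepts, here is a minimal sketch that drops unsupported keyword arguments before the call. It is not small-text's or transformers' implementation; `filter_supported_kwargs` and the stand-in `forward` function are hypothetical names used only for illustration.

```python
import inspect


def filter_supported_kwargs(func, kwargs):
    """Return only those entries of kwargs that func accepts by keyword."""
    params = inspect.signature(func).parameters
    # If func declares **kwargs itself, every keyword can be passed through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    return {k: v for k, v in kwargs.items() if k in accepted}


# Hypothetical stand-in for a model whose forward pass has no token_type_ids.
def forward(input_ids, attention_mask=None):
    return len(input_ids)


# Tokenizer-like output that contains an argument the model does not support.
encoded = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],
}

filtered = filter_supported_kwargs(forward, encoded)
result = forward(**filtered)  # no TypeError: token_type_ids was dropped
```

With a real model, the same check could presumably be run against the model's `forward` method, assuming it declares its inputs explicitly rather than through `**kwargs`.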

Device-side assertion not passed when training on cuda device and when there are added tokens to the tokenizer

Thanks for reporting this! I will look into it.

Device-side assertion not passed when training on cuda device and when there are added tokens to the tokenizer

@RaymondUoE With just the additional `tokenizer.add_special_tokens()` call, I cannot reproduce the error. Can you provide details on the assertion output?

Choosing the datapoints that need to be annotated?

Thanks for the ping, Tom! Depends on the problem. Do you know how many classes your dataset will have @vahuja4? How many samples does your dataset contain? Without having that...

Choosing the datapoints that need to be annotated?

> @kgourgou - thank you! I will give it a shot. @chschroeder - thank you for your reply! The number of classes is 74 and the size of the corpus...

Choosing the datapoints that need to be annotated?

> small-text looks amazing and fits a use-case I have! > > @vahuja4 I expect that @chschroeder can give more precise answers than me, but in general it depends on...

Choosing the datapoints that need to be annotated?

Thank you, @MosheWasserb! I am honored you already noticed. SetFit has been working great for me, thanks for that as well. Such a metric seems reasonable but then how do...

Choosing the datapoints that need to be annotated?

> @chschroeder Are there a few simple rules of thumb to choose the best strategy for a given data set, or should I try all of them? For example,...

make_wikipedia.py: long running time

Thank you for the tip! In the meantime, I have discovered a pre-parsed dataset on the Hugging Face Hub: [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia). They also seem to use this parser, so I will try using...