Bijay Gurung
Hi @tiwari93 This is a bit tricky. The models that can handle aggregation don't return no-answer and the confidence scores are also not reliable (else could have filtered based on...
Hi @chintanshrinath, just to make sure I understand the problem: uploads for some files are (silently) failing, right? Generally how many files are you uploading and how...
Hi @chintanshrinath Could you try the csv batch upload? Also, if the files can be shared, could you share a link to one of the files?
@thisum Did it work eventually?
@masci Seems like it still runs the unnecessary component (here FilterRetriever for the SemanticReader branch). [Setup on Colab](https://colab.research.google.com/drive/1u9X0Dv-2PO0fTqGjCKbTu34e-XXoMS_b?usp=sharing) afaict, components with no inputs are immediately added into `to_run` in the...
So to take stock: **Does training EmbeddingRetrievers make sense?** Yes, definitely helps if labeled data is available. **Which sentence-transformer model(s) do we suggest for out-of-the-box use?** Now it makes sense...
Hi, So based on discussions above, am pivoting to adding `MultipleNegativesRankingLoss` support to the training of EmbeddingRetriever. Opened an issue for it here: deepset-ai/haystack#3136 Can get back to this Tutorial...
Hi @sinchanabhat, Ya, the tutorial is coming soon-ish. Can't commit to a time frame but a median estimate could be end of next week. 😅 In the meantime, you can...
> doesn't the train/fine-tune involve early stopping or taking the best model as the model with best validation metric? Or is it just running for 5 to 10 epochs...
Hey @vibha0411, if you use MultipleNegativesRankingLoss (`train_loss='mnrl'`, currently the default), the scores aren't required. [1] In fact, for MNRL even having the negative docs is optional because it considers...
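
To illustrate why MNRL needs neither scores nor explicit negatives, here is a rough sketch of the loss using in-batch negatives, which is how sentence-transformers' `MultipleNegativesRankingLoss` is generally described. This is not Haystack's or sentence-transformers' actual code: the helper names (`cosine`, `mnrl_loss`) and the toy 2-d embeddings are made up for the example, and `scale=20.0` with cosine similarity is assumed to mirror the sentence-transformers defaults.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnrl_loss(query_embs, positive_embs, scale=20.0):
    """Hypothetical helper: mean cross-entropy over in-batch candidates.

    For query i, positive_embs[i] is the correct passage and every other
    positive in the batch is treated as a negative. No relevance scores
    and no hand-picked negatives are needed, only (query, positive) pairs.
    """
    total = 0.0
    for i, query in enumerate(query_embs):
        # Scaled similarity of this query to every passage in the batch.
        logits = [scale * cosine(query, passage) for passage in positive_embs]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the "label" being index i.
        total += log_sum_exp - logits[i]
    return total / len(query_embs)


# Toy batch: each query points in the same direction as its own positive.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = mnrl_loss(queries, aligned)
swapped_loss = mnrl_loss(queries, swapped)
```

With the toy batch, the loss is close to zero when each query matches its own positive and large when the pairs are swapped, which is the signal the retriever is trained on. Explicit hard negatives, if you have them, would just be extra candidates appended to each query's list of passages.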