haystack
Annotation tool upload issue
Describe the bug When I upload txt files in the annotation tool, only some of them appear. For example, when I upload around 25 text files, only a seemingly random subset shows up in the Documents section.
Error message No error message is shown.
Expected behavior All 25 txt files should show up.
To Reproduce Go to the annotation tool, then in the import section select Documents. In the next step, in the text upload section, upload the 25 txt files. Then go to the Documents section, where not all of the txt files are listed.
FAQ Check
- [ ] Have you had a look at our new FAQ page?
System:
- OS: Windows
- GPU/CPU: CPU
Hi @chintanshrinath ,
Just to make sure I understand the problem: the upload is (silently) failing for some files, right?
Generally, how many files are you uploading and how many are making it through (i.e. show up on the Documents tab)? If you retry, do the same files end up being successfully uploaded?
Hi @bglearning
Yes. I am trying to upload 25 files, and only 17 are getting uploaded.
If I try a second time, the same thing happens: again only 17 files are uploaded.
Hi @chintanshrinath Could you try the csv batch upload?
Also, if the files can be shared, could you share a link to one of the files?
Hi @chintanshrinath I tried to reproduce the issue you reported and uploaded 25 txt files. However, I didn't face any problems and all 25 files got uploaded successfully. Therefore, I assume that it has to do with the content or names of your text files. Could it be that some of the files have the exact same content? In that case they could get filtered out as duplicates. Or could it be that some of the files contain special characters, uncommon encoding or are completely empty?
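The checks suggested above (duplicate content, uncommon encoding, empty files) can be run locally before uploading. The following is only a minimal sketch, not part of the annotation tool: `check_text_files` is a hypothetical helper, and the `max_name_length=100` threshold is an arbitrary assumption, since the thread does not state what file-name length the tool accepts.

```python
import hashlib
from pathlib import Path


def check_text_files(folder, max_name_length=100):
    """Report .txt files that are empty, not valid UTF-8, duplicated, or long-named.

    max_name_length is an assumed threshold, not a documented limit of the tool.
    """
    report = {"empty": [], "not_utf8": [], "duplicates": [], "long_name": []}
    seen = {}  # content hash -> name of the first file with that content
    for path in sorted(Path(folder).glob("*.txt")):
        data = path.read_bytes()
        if len(path.name) > max_name_length:
            report["long_name"].append(path.name)
        if not data.strip():
            # Empty or whitespace-only file; skip the remaining checks.
            report["empty"].append(path.name)
            continue
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            report["not_utf8"].append(path.name)
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            # Record (duplicate file, file it duplicates).
            report["duplicates"].append((path.name, seen[digest]))
        else:
            seen[digest] = path.name
    return report
```

Calling `check_text_files("path/to/txt_folder")` returns a dictionary of file names per category, which can be compared against the files missing from the Documents tab.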
Hi @julian-risch Thanks for reaching out.
Yes, as you suggested, I converted the txt files to the proper format, and now I am able to upload them.
Earlier, the file names were too long and the text files were not UTF-8 encoded.
Thanks for your support.
We may close this issue.
Thanks
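For anyone hitting the same problem, the fix described above (shorter file names, UTF-8 encoding) could be scripted roughly as follows. This is a sketch under assumptions: `normalize_for_upload` is a hypothetical helper, the original encoding of the files is not stated in the thread so `cp1252` is only a guess for the fallback, and the 50-character stem limit is arbitrary.

```python
from pathlib import Path


def normalize_for_upload(src_dir, dst_dir, fallback_encoding="cp1252", max_stem=50):
    """Copy .txt files to dst_dir as UTF-8 with shortened file names.

    fallback_encoding and max_stem are assumptions, not values from the tool.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for i, path in enumerate(sorted(Path(src_dir).glob("*.txt"))):
        raw = path.read_bytes()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: decode with the assumed legacy encoding instead.
            text = raw.decode(fallback_encoding, errors="replace")
        # Truncate long names; the index prefix keeps truncated names unique.
        new_name = f"{i:03d}_{path.stem[:max_stem]}.txt"
        # Write bytes directly so line endings are left unchanged.
        (dst / new_name).write_bytes(text.encode("utf-8"))
        written.append(new_name)
    return written
```

The originals are left untouched, so the converted copies in `dst_dir` can be spot-checked before they are uploaded.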