Failed to retrieve docs
I get the above message if I select a topic ("Women in Parliaments") or try to create a new one, and then click the lightbulb icon. I'm trying to run the Docker setup described on the hical.github.io page with the athome4 dataset.
How do I debug this?
Hi @isoboroff
I wasn't able to reproduce the problem on my machine. Can you run docker-compose -f HiCAL.yml logs -t -f --tail 100 django and try again? This will show whether any errors are occurring.
Hello,
I have the same problem when trying to use my own document collection. First I tried using the same filenames and the same structure as the athome4 sample_dataset (see the example1 file). On another try, I created my own structure and a corresponding functions.py (see functions_test.py and example2). Both result in an error when I create a new topic and click the lightbulb icon: Failed to retrieve docs.. docno: {"message": "Error occurred. Please inform study coordinators"}. The word I use as the seed query does appear in the documents, and the command suggested above does not show any errors :/ Any idea what the problem might be?
Thank you!
Hi @Isminoula
Can you run docker-compose -f HiCAL.yml logs -t -f --tail 100 django? This will display log messages from Django and should help clarify the problem.
Hello @ammsa, thank you for the quick response! Here is the output of the command: hical_log.log
Hey @Isminoula, thanks for the logs. There seems to be a restriction in the code that assumes paragraphs in the tgz are ordered by their parent document ids. I am working on removing this restriction. Meanwhile, can you reach out to me at [email protected] so I can help you get things working?
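To check whether an archive trips this assumption before loading it, one option is to list the file members in stored order and flag every place where the parent document id goes backwards. The following is a minimal sketch, not HiCAL code. It assumes the `<docno>.<paragraph index>` member naming shown in the listing further down this thread, and the `doc_rank` mapping is hypothetical: the thread does not say which document order HiCAL expects, so without a mapping the docnos are simply compared as strings.

```python
import tarfile
from pathlib import PurePosixPath


def parent_docno(member_name):
    """'test_processed_para/2649b0a0.3' -> '2649b0a0'.

    Assumes (hypothetically) members are named '<docno>.<paragraph index>'.
    """
    return PurePosixPath(member_name).name.rpartition(".")[0]


def find_order_violations(names, doc_rank=None):
    """Return (previous, current) pairs where the parent document goes backwards.

    doc_rank: optional, hypothetical dict docno -> position. Without it,
    docnos are compared as plain strings (an assumption, not HiCAL's rule).
    """
    def key(name):
        docno = parent_docno(name)
        return doc_rank[docno] if doc_rank is not None else docno

    return [
        (prev, cur)
        for prev, cur in zip(names, names[1:])
        if key(cur) < key(prev)
    ]


def member_names(tgz_path):
    """File member names in the order they are stored in the archive."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]
```

For example, `find_order_violations(member_names("test_processed_para.tgz"))` returns an empty list when the stored order is consistent with the chosen ordering of document ids.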
@nims11 @ammsa Hi! Has there been any progress on removing the paragraph ordering restriction? I am getting the same error in the logs (Paragraphs must be in increasing order of their parent document ids) when I try to use my own documents. I even tried creating a smaller .tgz archive with the paragraph ids sorted (I removed any ids > 9 for simplicity), but I still get the same error.
Here's the sample sorted archive:
>>> tar -tvf test_processed_para.tgz
test_processed_para/2649b0a0.0
test_processed_para/2649b0a0.1
test_processed_para/2649b0a0.2
test_processed_para/2649b0a0.3
test_processed_para/2649b0a0.4
test_processed_para/2649b0a0.5
test_processed_para/2649b0a0.6
test_processed_para/2649b0a0.7
test_processed_para/9c4d2967.0
test_processed_para/9c4d2967.1
test_processed_para/9c4d2967.2
test_processed_para/9c4d2967.3
test_processed_para/9c4d2967.4
test_processed_para/9c4d2967.5
test_processed_para/9c4d2967.6
test_processed_para/9c4d2967.7
test_processed_para/9c4d2967.8
test_processed_para/9c4d2967.9
test_processed_para/a5f47849.0
test_processed_para/a5f47849.1
test_processed_para/a5f47849.2
test_processed_para/a5f47849.3
test_processed_para/a5f47849.4
test_processed_para/a5f47849.5
test_processed_para/a5f47849.6
test_processed_para/bb22ef8e.0
test_processed_para/bb22ef8e.1
test_processed_para/bb22ef8e.2
test_processed_para/bb22ef8e.3
test_processed_para/bb22ef8e.4
test_processed_para/bb22ef8e.5
test_processed_para/bb22ef8e.6
If you could let me know how to fix or work around this issue that would be greatly appreciated! Thanks for your help!
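If you want to try re-packing an archive in a specific order, a sketch like the one below copies a .tgz with its file members reordered. Two caveats. First, plain string sorting puts `x.10` before `x.2`, so this sketch compares the paragraph index as an integer (which makes removing ids > 9 unnecessary). Second, the archive above is already sorted by docno string and still fails, so HiCAL may expect a different document order; the thread does not say which. The optional `doc_rank` mapping (hypothetical, docno to position) lets you supply one. This is an unofficial workaround sketch, not part of HiCAL.

```python
import tarfile
from pathlib import PurePosixPath


def para_sort_key(name, doc_rank=None):
    """Sort key for members named '<docno>.<paragraph index>' (assumed naming).

    The paragraph index is compared as an integer, so 'x.10' sorts after 'x.9'.
    doc_rank is an optional, hypothetical docno -> position mapping; without it,
    docnos are compared as strings.
    """
    docno, _, para = PurePosixPath(name).name.rpartition(".")
    rank = doc_rank[docno] if doc_rank is not None else docno
    return (rank, int(para))


def resort_tgz(src, dst, doc_rank=None):
    """Write dst with the regular-file members of src in para_sort_key order.

    Directory entries are dropped; tar recreates parent directories on extraction.
    """
    with tarfile.open(src, "r:gz") as tin:
        files = [m for m in tin.getmembers() if m.isfile()]
        files.sort(key=lambda m: para_sort_key(m.name, doc_rank))
        with tarfile.open(dst, "w:gz") as tout:
            for m in files:
                # Copy the member header and its contents unchanged.
                tout.addfile(m, tin.extractfile(m))
```

Usage: `resort_tgz("test_processed_para.tgz", "test_processed_para_sorted.tgz")`, then compare the result with `tar -tvf`.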
Hi everyone,
Sorry for the late reply; it is close to PhD defense time, and with an overwhelming schedule I completely forgot about this issue. Although I am not 100% sure this is the correct way to go about this, I did bypass the problem by deleting the if statement in these lines...
@Isminoula While that removes the error, it will sometimes cause issues when rescoring items: there is efficiency logic that uses binary search and relies on that ordering.
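To illustrate why deleting the check is risky, here is a small sketch of the general idea (not HiCAL's actual code): if each paragraph's parent document id is stored in an array, binary search can find a document's paragraphs as one contiguous range, but only when that array is sorted. The names `para_parents` and `paragraph_range` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def paragraph_range(para_parents, doc_id):
    """Return (start, end) so that para_parents[start:end] are doc_id's paragraphs.

    para_parents[i] is the parent document id of paragraph i (hypothetical layout).
    The result is only correct if para_parents is sorted in ascending order.
    """
    start = bisect_left(para_parents, doc_id)
    end = bisect_right(para_parents, doc_id)
    return start, end
```

With sorted input, `paragraph_range([0, 0, 0, 1, 1, 2], 1)` gives `(3, 5)`, exactly the two paragraphs of document 1. With unsorted input, `paragraph_range([1, 0, 1, 0, 2], 0)` gives `(0, 2)`, a range that includes a paragraph of document 1 and misses document 0's paragraph at index 3, with no error raised. That kind of silent mismatch is what removing the ordering check can lead to.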
@Isminoula @dianalam I have pushed a fix in a branch (https://github.com/hical/HiCAL/tree/fix-para-ordering). I will run some further tests before merging to master but it will be helpful if one of you could also try that branch out.
@nims11 Thanks for the quick response and the fix! I tested it on my dataset and it worked.