HiCAL: "Failed to retrieve docs"

I get the above message when I select a topic ("Women in Parliaments") or create a new one, and then click the lightbulb icon. I'm running the Docker setup described on the hical.github.io page, with the athome4 data.

How do I debug this?

May 06 '19 17:05 isoboroff

Hi @isoboroff

I wasn't able to reproduce the problem on my machine. Can you run docker-compose -f HiCAL.yml logs -t -f --tail 100 django and then try again? It follows the last 100 lines of the django service's log (with timestamps), so any errors that occur will show up there.

May 09 '19 18:05 ammsa

Hello,

I have the same problem when using my own document collection. First I tried using the same filenames and the same structure as the athome4 sample_dataset (see the example1 file). On another attempt I created my own structure and a corresponding functions.py (see functions_test.py and example2). Both result in an error when I create a new topic and click the lightbulb icon:

Failed to retrieve docs.. docno: {"message": "Error occurred. Please inform study coordinators"}

The word I use as the seed query does appear in the documents, and the command suggested above does not show any errors :/ Any idea what the problem might be?

Thank you!

example.zip

Sep 19 '19 20:09 Isminoula

Hi @Isminoula, can you run docker-compose -f HiCAL.yml logs -t -f --tail 100 django? It displays Django's log messages, which should help pin down the problem.

Sep 21 '19 15:09 ammsa

Hello @ammsa, thank you for the quick response! Here is the output of the command: hical_log.log

Sep 23 '19 19:09 Isminoula

Hey @Isminoula, thanks for the logs. There seems to be a restriction in the code that assumes the paragraphs in the tgz are ordered by their parent document ids. I am working on removing this restriction. Meanwhile, can you reach out to me at [email protected] so I can help you get things working?
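For anyone hitting the same error, one quick diagnostic is to list an archive's members in archive order and check that each parent document id forms a single contiguous run, with the runs in increasing order. The helper below (check_para_order) is hypothetical and not part of HiCAL. It assumes members are named <directory>/<parent_docid>.<para_index> and compares ids byte-wise (LC_ALL=C), which may not be the same comparison HiCAL itself performs, so a passing result does not guarantee HiCAL will accept the archive.

```shell
#!/bin/sh
# check_para_order ARCHIVE.tgz
# Hypothetical diagnostic helper, not part of HiCAL.
# Assumes member names look like <dir>/<parent_docid>.<para_index>.
# Exits 0 if parent ids appear as contiguous runs in increasing byte order;
# otherwise sort reports the first out-of-order id and the exit status is non-zero.
check_para_order() {
  tar -tzf "$1" |
    grep -v '/$' |               # drop directory entries
    awk -F/ '{ print $NF }' |    # keep only the file name
    sed 's/\.[^.]*$//' |         # strip the trailing .<para_index>
    uniq |                       # collapse each run of the same parent id
    LC_ALL=C sort -c             # fail if an id is out of order or appears in two separate runs
}
```

Usage: check_para_order my_para.tgz && echo "parent ids are in order" (the archive name is a placeholder).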

Oct 08 '19 13:10 nims11

@nims11 @ammsa Hi! Has there been any progress on removing the paragraph ordering restriction? I get the same error in the logs (Paragraphs must be in increasing order of their parent document ids) when I try to use my own documents. I even tried creating a smaller .tgz archive in which I sorted the paragraph ids (I removed any ids > 9 for simplicity), but I still get the same error.

Here's the sample sorted archive:

>>> tar -tvf test_processed_para.tgz
test_processed_para/2649b0a0.0
test_processed_para/2649b0a0.1
test_processed_para/2649b0a0.2
test_processed_para/2649b0a0.3
test_processed_para/2649b0a0.4
test_processed_para/2649b0a0.5
test_processed_para/2649b0a0.6
test_processed_para/2649b0a0.7
test_processed_para/9c4d2967.0
test_processed_para/9c4d2967.1
test_processed_para/9c4d2967.2
test_processed_para/9c4d2967.3
test_processed_para/9c4d2967.4
test_processed_para/9c4d2967.5
test_processed_para/9c4d2967.6
test_processed_para/9c4d2967.7
test_processed_para/9c4d2967.8
test_processed_para/9c4d2967.9
test_processed_para/a5f47849.0
test_processed_para/a5f47849.1
test_processed_para/a5f47849.2
test_processed_para/a5f47849.3
test_processed_para/a5f47849.4
test_processed_para/a5f47849.5
test_processed_para/a5f47849.6
test_processed_para/bb22ef8e.0
test_processed_para/bb22ef8e.1
test_processed_para/bb22ef8e.2
test_processed_para/bb22ef8e.3
test_processed_para/bb22ef8e.4
test_processed_para/bb22ef8e.5
test_processed_para/bb22ef8e.6

If you could let me know how to fix or work around this issue, that would be greatly appreciated! Thanks for your help!

Oct 30 '19 21:10 dianalam

Hi everyone,

Sorry for the late reply; it is close to my PhD defense, and with an overwhelming schedule I completely forgot about this issue. Although I am not 100% sure this is the correct way to go about it, I bypassed the problem by deleting the if statement in these lines...

Oct 30 '19 21:10 Isminoula

@Isminoula, while that removes the error, it will sometimes cause problems when rescoring items: there is some efficiency logic that uses binary search and relies on that ordering.
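HiCAL's rescoring code is not shown in this thread, so the following is only a generic illustration of why deleting the check is risky. If paragraph records are kept in one list sorted by parent document id, the first paragraph of a document can be located with a lower-bound binary search. The bash function first_para_index is hypothetical: it assumes ids of the form <parent_docid>.<para_index>, compares parent ids with bash's string ordering, and prints -1 when the document is not found. On unsorted input the search can silently return -1 even though the document's paragraphs are present.

```shell
#!/usr/bin/env bash
# first_para_index TARGET_DOCID PARA_ID...
# Hypothetical illustration, not HiCAL's code. PARA_IDs look like <parent_docid>.<para_index>
# and are assumed to be sorted by parent doc id. Prints the index of the first paragraph
# of TARGET_DOCID, or -1 if the lower-bound search does not land on it.
first_para_index() {
  local target=$1; shift
  local -a paras=("$@")
  local lo=0 hi=${#paras[@]} mid parent
  # Lower-bound binary search: find the first entry whose parent id is not < target.
  while (( lo < hi )); do
    mid=$(( (lo + hi) / 2 ))
    parent=${paras[mid]%.*}      # strip the trailing .<para_index>
    if [[ $parent < $target ]]; then
      lo=$(( mid + 1 ))
    else
      hi=$mid
    fi
  done
  # Confirm the lower bound actually belongs to the target document.
  if (( lo < ${#paras[@]} )) && [[ ${paras[lo]%.*} == "$target" ]]; then
    echo "$lo"
  else
    echo -1
  fi
}
```

For example, first_para_index b a.0 a.1 b.0 b.1 c.0 prints 2, while first_para_index a b.0 c.0 a.0 b.1 a.1 prints -1 even though a.0 is in the list, because the unsorted input steers the search away from it.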

@Isminoula @dianalam I have pushed a fix to a branch (https://github.com/hical/HiCAL/tree/fix-para-ordering). I will run some further tests before merging it into master, but it would be helpful if one of you could also try that branch out.

Oct 30 '19 23:10 nims11

@nims11 Thanks for the quick response and the fix! I tested it on my dataset and it worked.

Nov 07 '19 21:11 dianalam