bern

NER extraction of text seems not to be working

Open amalic opened this issue 4 years ago • 9 comments

Sample program, based on your README.MD

import requests
import json
body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
response = requests.post('http://localhost/', data=body_data)
print(response)
print("content: ", response.content)
result_dict = response.json()
print(result_dict)

Output

<Response [200]>
content:  b''
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    result_dict = response.json()
  File "/home/alex/.local/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
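
The JSONDecodeError above is only a symptom: the server answered 200 with an empty body, so there is nothing for response.json() to parse. A minimal sketch of a guard that makes this visible on the client side, assuming nothing about BERN beyond the output shown above; `parse_bern_body` is a hypothetical helper name, not part of BERN or requests:

```python
import json


def parse_bern_body(status_code, body):
    """Decode a JSON response body, or raise a descriptive error.

    Hypothetical helper for illustration; not part of BERN.
    """
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status {status_code}")
    if not body.strip():
        # A 200 with an empty body matches the output shown above;
        # the real exception is most likely in the server log.
        raise RuntimeError("empty response body; check the server log")
    return json.loads(body)
```

With the sample program it could be called as parse_bern_body(response.status_code, response.text) in place of response.json().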

A curl example would be highly appreciated.

amalic avatar Apr 23 '20 05:04 amalic

Are you using port number 80? If not, add the port number you set after "localhost" and a colon ":".

donghyeonk avatar Apr 23 '20 14:04 donghyeonk

I ran into the same issue. The server logs the following error:

89.212.10xx - - [23/Apr/2020 18:42:08] "POST / HTTP/1.1" 200 -
[23/Apr/2020 18:42:08.609364] [Thread-95] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator - (PubTator format) : Processing Time:0.239sec
[23/Apr/2020 18:42:08.850149] [Thread-95] GNormPlus 0.240 sec
input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator - (PubTator format) : Processing Time:0.161sec
[23/Apr/2020 18:42:09.012860] [Thread-95] tmVar 2.0 0.162 sec

Exception happened during processing of request from ('89.212.10.xx', 54130)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 452, in tag_entities
    self.biobert_recognize(dict_list, is_raw_text, cur_thread_name)
  File "server.py", line 490, in biobert_recognize
    thread_id=cur_thread_name)
  File "/app/biobert_ner/utils.py", line 15, in with_profiling
    ret = fn(*args, **kwargs)
  File "/app/biobert_ner/run_ner.py", line 488, in recognize
    with open(token_path, 'r') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-95.txt'

tomasonjo avatar Apr 23 '20 18:04 tomasonjo

Are you using port number 80? If not, add the port number you set after "localhost" and a colon ":".

Yes, I am using port 80. I am running Bern in a Docker container.

see: https://github.com/amalic/bern-docker

amalic avatar Apr 24 '20 17:04 amalic

I tried it without Docker as well, and the error persists. After digging a bit, I found that the output folder contains valid JSON results, stored in files like:

bern_demo_095b8bb35ae644040374c488a9ca7c7b5ec56dc66fb577ff227c01e5.json

Unfortunately, the server just does not return this JSON in the response.
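
As a stopgap while the server does not return the result, the file could be read from disk directly. A minimal sketch, assuming the result files follow the bern_demo_<hash>.json naming seen above and that the caller knows where the output folder is; `load_latest_result` and its parameters are hypothetical and not part of BERN:

```python
import glob
import json
import os


def load_latest_result(output_dir, pattern="bern_demo_*.json"):
    """Load the most recently modified result file, or None if none exist.

    Hypothetical helper; the file name pattern is an assumption based on
    the single file name quoted in this thread.
    """
    paths = glob.glob(os.path.join(output_dir, pattern))
    if not paths:
        return None
    newest = max(paths, key=os.path.getmtime)
    with open(newest, "r", encoding="utf-8") as f:
        return json.load(f)
```

Picking the newest file is only safe when a single request is in flight; with concurrent requests the file would have to be matched by its hash instead.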

tomasonjo avatar Apr 25 '20 19:04 tomasonjo

Just for clarification: the call for PubMed IDs works, except for newer PMIDs. This means that I don't have any issues with the port.

The example from your readme file for recognizing entities from text does not work. Can't you reproduce the issue?

amalic avatar May 03 '20 04:05 amalic

I am receiving an empty response with the same body_data that you used to hit the BERN server.

Following is the body_data:

body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}

Bern Server running IP: [04/May/2020 09:55:28.162347] Starting server at http://0.0.0.0:8888

After running this in Python:

response = requests.post('http://0.0.0.0:8888', data=body_data)
response.text

Result: ''

No JSON response is returned.

Meanwhile, I tried passing invalid body_data to check whether the server returns any errors. These are the errors I was able to get:

{"error": "empty text"}
{"error": "only whitespace letters"}
{"error": "no param"}
etc.

Please help me get a JSON response for a valid request; it is empty in my case.

zahidmughal avatar May 04 '20 10:05 zahidmughal

127.0.0.1 - - [04/May/2020 15:34:31] "POST / HTTP/1.1" 200 -
[04/May/2020 15:34:31.333152] [Thread-24] text_hash: 3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7
[04/May/2020 15:34:31.333375] [Thread-24] GNormPlus 0.000 sec

Exception happened during processing of request from ('127.0.0.1', 42856)
Traceback (most recent call last):
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 566, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/home/vm-admin/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator' -> '/home/vm-admin/bern/tmVarJava/input/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 650, in process_request_thread
    self.finish_request(request, client_address)
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/anaconda/envs/py37_default/lib/python3.7/socketserver.py", line 720, in __init__
    self.handle()
  File "/anaconda/envs/py37_default/lib/python3.7/http/server.py", line 426, in handle
    self.handle_one_request()
  File "/anaconda/envs/py37_default/lib/python3.7/http/server.py", line 414, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 423, in tag_entities
    shutil.move(output_gnormplus, input_tmvar2)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 580, in move
    copy_function(src, real_dst)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 266, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/anaconda/envs/py37_default/lib/python3.7/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/vm-admin/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

The above is the error I'm receiving in the server logs.

zahidmughal avatar May 04 '20 15:05 zahidmughal

If I call the POST API twice with the same text, it fails with the error below. This seems to be linked to the deletion of the tmp file by GNormPlusJava. However, setting "DeleteTmp = False" in GNormPlusJava's "setup.txt" and restarting the service with that setup.txt doesn't solve the issue; the tmp files are still deleted.

nohup_BERT.out :

Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "server.py", line 317, in do_POST
    text, cur_thread_name, is_raw_text=True, reuse=False)
  File "server.py", line 423, in tag_entities
    shutil.move(output_gnormplus, input_tmvar2)
  File "/usr/lib/python3.5/shutil.py", line 552, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.5/shutil.py", line 251, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.5/shutil.py", line 114, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/root/bern/GNormPlusJava/output/3da0b63ecd8efcc2a76bbe02df6fc42b9003c94c0662a9223396c2f7.PubTator'

nohup.out from GnormPlusJava :

Starting GNormPlus Service at 172.17.0.2:18895
Loading Gene Dictionary : Processing Time:8.459sec
Ready
/693c63dd1b77aa3f29f02c2bb2ef000b0ae1f6846f1d8bb46497dfb2.PubTator - (PubTator format) : Processing Time:5.734sec
java.io.FileNotFoundException: tmp/693c63dd1b77aa3f29f02c2bb2ef000b0ae1f6846f1d8bb46497dfb2.PubTator (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
    at GNormPluslib.BioCDoc.PubTator2BioC(BioCDoc.java:124)
    at kr.ac.korea.dmis.GNormPlus.tag(GNormPlus.java:316)
    at kr.ac.korea.dmis.GNPServer.run(GNPServer.java:42)
    at kr.ac.korea.dmis.GNPServer.<init>(GNPServer.java:30)
    at kr.ac.korea.dmis.GNPServer.main(GNPServer.java:72)
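
The Python tracebacks in this thread all show the same text_hash (3da0b6...c2f7) for the README sample text, and the intermediate .PubTator file is named after that hash. Two requests with identical text would therefore target the same file name, which is at least consistent with the second call failing once the file has been moved or deleted. A small sketch of that collision, under the unconfirmed assumption that the hash is a SHA-224 digest of the text (the 56 hex digits in the logs match that length, but this has not been checked against the BERN source); both function names are hypothetical:

```python
import hashlib


def text_hash(text):
    # Assumption: SHA-224 over the UTF-8 text. BERN may compute it differently.
    return hashlib.sha224(text.encode("utf-8")).hexdigest()


def pubtator_name(text):
    # Intermediate file name derived only from the text content.
    return text_hash(text) + ".PubTator"


first = pubtator_name("same input text")
second = pubtator_name("same input text")
# Identical text yields an identical file name, so both requests
# would read and write the same intermediate file.
print(first == second)
```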

nosiam avatar Jul 17 '20 11:07 nosiam

I solved these errors by reinstalling CRF in GNormPlusJava and tmVar2Java.

In my situation, this error

FileNotFoundError: [Errno 2] No such file or directory: 'biobert_ner/tmp/token_test_Thread-{thread_id}.txt'

was caused by tmVar2. It produced an empty file in tmVarJava/output, so the tokens file could not be produced correctly.

And this error

FileNotFoundError: [Errno 2] No such file or directory: '~/bern/GNormPlusJava/output/{text_hash_id}.PubTator'

was caused by GNormPlus. As with the previous error, GNormPlus didn't generate a correct output file. You might find an error message asking you to reinstall CRF in ~/bern/logs/nohup_gnormplus.out.
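
To tell these two cases apart quickly, it can help to check whether each stage's output file is missing or merely empty before looking at the later traceback. A minimal sketch; `check_stage_output` is a hypothetical helper, not part of BERN, and the paths passed to it are whatever the tracebacks above report on your machine:

```python
import os


def check_stage_output(path):
    """Return None if the file exists and is non-empty, else a diagnostic.

    Hypothetical helper for illustration; not part of BERN.
    """
    if not os.path.isfile(path):
        return f"missing: {path}"
    if os.path.getsize(path) == 0:
        return f"empty: {path}"
    return None
```

Note that Python does not expand "~" by itself, so a path such as ~/bern/GNormPlusJava/output/{text_hash_id}.PubTator should go through os.path.expanduser first.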

ting830812 avatar Nov 03 '20 09:11 ting830812