
Test failure: TestLtrMsmarcoDocument on macOS 12.1

Open lintool opened this issue 3 years ago • 16 comments

Test failure on my iMac Pro, macOS Monterey 12.1... any ideas?

% python -m unittest integrations.sparse.test_ltr_msmarco_document.TestLtrMsmarcoDocument
Attempting to initialize pre-built index msmarco-doc-per-passage-ltr.
/Users/jimmylin/.cache/pyserini/indexes/index-msmarco-doc-per-passage-ltr-20211031-33e4151.bd60e89041b4ebbabc4bf0cfac608a87 already exists, skipping download.
Initializing msmarco-doc-per-passage-ltr...
Using pre-defined topic order for msmarco-doc-dev
Running msmarco-doc-dev topics, saving to ltr_test/run.msmarco-pass-doc.bm25.txt...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5193/5193 [48:00<00:00,  1.80it/s]
--2022-01-12 18:34:07--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 105024292 (100M) [application/gzip]
Saving to: ‘ltr_test/model-ltr-msmarco-passage-mrr-v1.tar.gz’

model-ltr-msmarco-passage-mrr-v1.tar.gz                                   100%[=====================================================================================================================================================================================>] 100.16M  54.5MB/s    in 1.8s    

2022-01-12 18:34:09 (54.5 MB/s) - ‘ltr_test/model-ltr-msmarco-passage-mrr-v1.tar.gz’ saved [105024292/105024292]

x msmarco-passage-ltr-mrr-v1/
x msmarco-passage-ltr-mrr-v1/metadata.json
x msmarco-passage-ltr-mrr-v1/output.json.gz
x msmarco-passage-ltr-mrr-v1/model.pkl
--2022-01-12 18:34:10--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252776626 (241M) [application/gzip]
Saving to: ‘ltr_test/model-ltr-ibm.tar.gz’

model-ltr-ibm.tar.gz                                                      100%[=====================================================================================================================================================================================>] 241.07M  52.6MB/s    in 4.5s    

2022-01-12 18:34:14 (53.3 MB/s) - ‘ltr_test/model-ltr-ibm.tar.gz’ saved [252776626/252776626]

x ibm_model/
x ibm_model/url_unlemm/
x ibm_model/url_unlemm/source.vcb
x ibm_model/url_unlemm/target.vcb
x ibm_model/url_unlemm/output.t1.5.bin
x ibm_model/body/
x ibm_model/body/source.vcb
x ibm_model/body/target.vcb
x ibm_model/body/output.t1.5.bin
x ibm_model/text_bert_tok/
x ibm_model/text_bert_tok/source.vcb
x ibm_model/text_bert_tok/target.vcb
x ibm_model/text_bert_tok/output.t1.5.bin
x ibm_model/title_unlemm/
x ibm_model/title_unlemm/source.vcb
x ibm_model/title_unlemm/target.vcb
x ibm_model/title_unlemm/output.t1.5.bin
Namespace(input='tools/topics-and-qrels/', min_query_token_qty=0, output='ltr_test/')
{'herein', 'i', 'outside', 'hither', 'every', 'among', 'dost', 'never', 'hereabouts', 'also', 'on', 'us', 'thy', 'yourself', "doesn't", 'excepted', 'other', 'would', 'formerly', 'much', 'me', 'beyond', 'like', 'himself', 'of', 'till', 'before', 'furthest', 'is', 'between', 'lest', 'doing', 'after', 'excluding', 'beforehand', 'latterly', 'a', 'must', 'latter', 'thereabouts', 'kind', 'need', 'nonetheless', 'thereof', 'very', 'anyhow', 'canst', 'one', 'elsewhere', 'am', 'far', 'against', 'then', 'his', 'per', 'unable', 'several', 'inside', 'farthest', 'year', 'sang', 'thee', 'enough', 'whenever', 'whereof', 'doth', 'inasmuch', 'whichsoever', 'sprang', 'those', 'thereon', 'or', 'thence', 'to', 'itself', 'stave', 'really', 'so', "'s", 'further', 'whereby', 'most', 'somehow', 'few', 'he', 'in', 'this', 'ok', 'as', 'nevertheless', 'therein', 'round', 'were', 'thereto', 'sent', 'within', 'thus', 'during', 'howsoever', 'meantime', 'has', 'used', 'same', 'upwards', 'sprung', 'including', 'namely', 'underneath', 'ye', 'inward', 'worse', 'until', 'spoken', 'cannot', 'might', 'just', 'said', 'along', 'whole', 'more', 'av', 'upward', 'spoke', 'became', 'else', 'included', 'mostly', 'via', 'behind', 'hence', 'him', 'themselves', 'had', 'wherefrom', 'almost', 'becoming', 'less', 'ever', 'no', 'something', 'worst', 'whereunto', 'always', 'do', 'exclusive', 'thrice', 'slung', 'first', 'little', 'upon', 'forth', 'may', 'own', 'whereto', 'whichever', 'does', 'halves', 'my', 'saw', 'everyone', 'slunk', 'everything', 'amongst', 'anybody', 'into', 'beside', 'hardly', 'neither', 'herself', 'wherever', 'km', 'another', 'thereby', 'excepting', 'across', 'mr', 'well', 'go', 'cu', 'where', 'seemed', 'hereto', 'hath', 'spake', 'sometime', 'indeed', 'again', 'for', 'seldom', 'even', 'myself', 'nothing', 'et', 'etc', 'it', 'somewhat', 'hers', 'toward', 'by', 'seen', 'she', 'therefore', 'hereupon', 'whensoever', 'everybody', 'whereas', 'ie', 'why', "'ll", 'whose', 'perhaps', 'if', 'whew', 
'furthermore', 'any', 'thyself', 'out', 'such', 'nobody', 'whoever', 'seems', 'been', 'since', 'yours', 'together', 'none', 'thou', 'once', 'many', 'will', 'indoors', 'whoa', 'how', 'and', 'yippee', 'provide', "'ves", 'hast', 'whereafter', 'others', 'because', 'whence', 'whosoever', 'not', 'ugh', 'yourselves', 'hindmost', 'contrariwise', 'whatever', "'m", 'wilt', 'be', 'can', 'let', 'using', 'your', 'thereupon', 'according', 'somewhere', 'yet', 'its', 'our', 'their', 'anything', 'these', 'whether', 'quite', 'nowadays', 'wherein', 'moreover', 'at', 'down', 'are', 'shown', 'ff', 'henceforth', 'thru', 'week', 'the', 'around', 'selves', 'insomuch', 'now', 'towards', 'too', 'alone', 'whereon', 'although', 'someone', 'all', 'some', 'slew', 'whomever', 'than', 'kg', 'whereat', 'but', 'thenceforth', 'see', 'an', 'apart', 'vs', 'dual', 'front', 'anyone', 'have', 'up', 'afterwards', 'that', 'supposing', 'could', 'slept', 'exception', 'sake', 'thereafter', 'unlike', 'use', "n't", 'her', 'about', 'ltd', 'include', 'rather', 'hitherto', 'staves', 'nope', 'noone', 'off', 'thereabout', 'ms', 'whomsoever', 'except', 'double', 'under', 'without', 'sometimes', "'re", 'already', 'while', 'still', 'ours', 'should', 'nor', 'plenty', 'them', 'above', 'besides', 'however', 'being', 'meanwhile', 'albeit', 'you', 'cos', 'with', 'which', 'become', 'from', 'wheresoever', 'wherewith', 'mrs', 'wherefore', 'shalt', 'day', 'cf', 'get', 'next', 'over', 'whatsoever', 'seem', 'we', 'forward', 'either', 'here', 'was', 'there', 'whereinto', 'spat', 'both', 'onto', 'often', 'though', 'whereabouts', 'whilst', 'seeing', 'save', 'what', 'below', 'inwards', 'whom', 'anywhere', 'whither', 'nowhere', 'ourselves', 'who', 'unless', 'exclude', 'ought', 'last', 'smote', 'only', 'anyway', 'whereupon', 'somebody', 'sideways', "'d", 'maybe', 'when', 'instead', 'notwithstanding', 'hereby', 'seeming', 'they', 'want', 'everywhere', 'hereafter', 'farther', 'wow', 'throughout', 'each', 'otherwise', 'becomes', 
'certain', 'inc', 'choose', 'through'}
Disabled Spacy components:  ['ner', 'parser']
5193it [00:31, 167.26it/s]
Processed 5193 queries
analyzed contents
text_unlemm text_unlemm
text_bert_tok text_bert_tok
IBM model Load takes 14.84 seconds
IBM model Load takes 31.29 seconds
IBM model Load takes 312.11 seconds
IBM model Load takes 57.21 seconds
load dev
scripts/ltr_msmarco/ DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance:
  assert dev['qid'].dtype == np.object
scripts/ltr_msmarco/ DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance:
  assert dev['pid'].dtype == np.object
scripts/ltr_msmarco/ DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance:
  assert dev['qid'].dtype == np.object
scripts/ltr_msmarco/ DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance:
  assert dev['pid'].dtype == np.object
(5191286, 2)
rank    999.66994
rel     999.66994
dtype: float64
                     rank  rel
qid     pid                   
1000000 D1005950#0    454    0
        D100612#18    834    0
        D1007366#0    425    0
        D1007366#1    614    0
        D1011301#0     60    0
        D1011301#1     78    0
        D1027656#1    661    0
        D1027656#2    566    0
        D1063056#20   281    0
        D1067089#0    264    0
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5191286 entries, ('1000000', 'D1005950#0') to ('999942', 'D997465#3')
Data columns (total 2 columns):
 #   Column  Dtype
---  ------  -----
 0   rank    int32
 1   rel     int32
dtypes: int32(2)
memory usage: 228.4+ MB
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5193/5193 [00:29<00:00, 175.25it/s]
load queries
[thread 50435 also had an error]
# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x0000000105767ffe, pid=67546, tid=82179
# JRE version: Java(TM) SE Runtime Environment (11.0.4+10) (build 11.0.4+10-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.4+10-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# [thread 49667 also had an error]
[thread 81923 also had an error]
[thread 80899 also had an error]
[thread 51203 also had an error][thread 81667 also had an error]

[thread 82691 also had an error]
[thread 50691 also had an error]
[thread 51459 also had an error]
[thread 49411 also had an error]
[thread 81155 also had an error][thread 49923 also had an error][thread 50179 also had an error]
[thread 51715 also had an error][thread 81411 also had an error]

[thread 50947 also had an error]
[thread 82435 also had an error][thread 80131 also had an error]

[thread 54531 also had an error]
[thread 78339 also had an error]
[thread 78083 also had an error]
[thread 77571 also had an error]
[thread 77315 also had an error]
[thread 76803 also had an error]
[thread 55555 also had an error]
[thread 76291 also had an error]
[thread 75779 also had an error]
[thread 55811 also had an error]
[thread 56323 also had an error]
[thread 75267 also had an error]
[thread 75011 also had an error]
[thread 57091 also had an error]
C  [libomp.dylib+0x60ffe][thread 57347 also had an error]
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
[thread 73987 also had an error]
# An error report file with more information is saved as:
# /Users/jimmylin/workspace/pyserini/hs_err_pid67546.log
# If you would like to submit a bug report, please visit:
/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/multiprocessing/ UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Traceback (most recent call last):
  File "scripts/ltr_msmarco/", line 14, in <module>
    with open(args.input) as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'ltr_test/run.ltr.msmarco-pass-doc.test.trec'
Traceback (most recent call last):
  File "tools/scripts/msmarco/", line 235, in <module>
  File "tools/scripts/msmarco/", line 222, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate, exclude_qids)
  File "tools/scripts/msmarco/", line 184, in compute_metrics_from_files
    qids_to_ranked_candidate_documents = load_candidate(path_to_candidate)
  File "tools/scripts/msmarco/", line 98, in load_candidate
    with autoopen(path_to_candidate,'r') as f:
  File "tools/scripts/msmarco/", line 28, in autoopen
    return open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'ltr_test/run.ltr.msmarco-pass-doc.test.tsv'
ERROR: test_reranking (integrations.sparse.test_ltr_msmarco_document.TestLtrMsmarcoDocument)
Traceback (most recent call last):
  File "/Users/jimmylin/workspace/pyserini/integrations/sparse/", line 56, in test_reranking
    result = subprocess.check_output(f'python tools/scripts/msmarco/ --judgments tools/topics-and-qrels/ --run ltr_test/{outp_tsv}', shell=True).decode(sys.stdout.encoding)
  File "/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python tools/scripts/msmarco/ --judgments tools/topics-and-qrels/ --run ltr_test/run.ltr.msmarco-pass-doc.test.tsv' returned non-zero exit status 1.

Ran 1 test in 3437.703s

FAILED (errors=1)

lintool avatar Jan 12 '22 23:01 lintool

That's weird. I just reran this test on orca and it passed:

5191286it [00:07, 688446.36it/s]
100%|███████████████████████████████████████████████████████████████████| 5193/5193 [00:01<00:00, 4988.37it/s]
Ran 1 test in 4216.341s


stephaniewhoo avatar Jan 13 '22 01:01 stephaniewhoo

Seems to be a macOS problem... I'm trying to debug.

@stephaniewhoo do you have access to a macOS machine you can try also?

lintool avatar Jan 13 '22 01:01 lintool

Seems to be related to

lintool avatar Jan 13 '22 02:01 lintool

This seems to be a known issue and doesn't appear to have been resolved yet:

lintool avatar Jan 13 '22 12:01 lintool

It also works for me on orca. I'll try on my macOS machine too.

yuki617 avatar Jan 13 '22 15:01 yuki617

Yes, confirmed that the test case passes on orca.

lintool avatar Jan 13 '22 15:01 lintool

score_tie occurs 208854 times in 5188 queries
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5193/5193 [00:16<00:00, 312.44it/s]
score_tie occurs 208854 times in 5188 queries
5191286it [00:07, 669016.70it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5193/5193 [00:01<00:00, 4888.41it/s]
Ran 1 test in 10265.931s


The test finally completed on my laptop (macOS), and it passes as well. My system is also macOS Monterey 12.1 (21C52).

stephaniewhoo avatar Jan 13 '22 20:01 stephaniewhoo

Same as Stephanie: the test also passed on my macOS machine. My system is macOS Big Sur 11.5.2.

yuki617 avatar Jan 14 '22 03:01 yuki617

Trying out the example here:

% python
Python 3.8.12 (default, Oct 12 2021, 06:23:56) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lightgbm import LGBMClassifier
>>> import numpy as np
>>> from concurrent.futures import ThreadPoolExecutor
>>> x = np.random.random((200, 4))
>>> y = x.sum(axis=1) >= 2
>>> def myfunc(a=7):
...     test = LGBMClassifier().fit(x, y)
...     print(test.predict(x))
>>> with ThreadPoolExecutor(20) as tpe:
...     print(list(tpe.map(myfunc, range(20))))
zsh: segmentation fault  python

Indeed, I get a seg fault - this is on macOS 12.1.

Additional details:

% pip list | grep lightgbm
lightgbm           3.3.2

% brew info libomp
libomp: stable 13.0.0 (bottled)
LLVM's OpenMP runtime library
/usr/local/Cellar/libomp/13.0.0 (9 files, 1.6MB) *
  Poured from bottle on 2022-01-12 at 21:15:03
License: MIT
==> Dependencies
Build: cmake ✘
==> Analytics
install: 52,037 (30 days), 228,792 (90 days), 1,152,680 (365 days)
install-on-request: 7,733 (30 days), 31,763 (90 days), 140,940 (365 days)
build-error: 10 (30 days)
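
Because the crash takes down the whole interpreter, it can be easier to run the reproduction in a child process and look at how that process exited. The following is a minimal sketch using only the standard library. `run_isolated` and `REPRO` are names made up for this illustration, and the embedded snippet is the example above, so it still needs lightgbm installed in the child's environment.

```python
import signal
import subprocess
import sys

# The reproduction from this comment, handed to a child interpreter as a string.
# It requires lightgbm and numpy to be importable in that interpreter.
REPRO = """
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from lightgbm import LGBMClassifier

x = np.random.random((200, 4))
y = x.sum(axis=1) >= 2

def myfunc(a=7):
    test = LGBMClassifier().fit(x, y)
    print(test.predict(x))

with ThreadPoolExecutor(20) as tpe:
    print(list(tpe.map(myfunc, range(20))))
"""


def run_isolated(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh Python process and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    print(run_isolated(REPRO))
```

On a machine that shows the problem, the child would presumably be reported as killed by a signal, while the parent shell stays alive. The signal check relies on POSIX semantics, so this applies to macOS and Linux only.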

lintool avatar Jan 15 '22 03:01 lintool

I tested the above script on my Mac running macOS 11.5.2, with libomp 13.0.0 and lightgbm 3.2.2. No segmentation fault occurred; everything works just fine. That narrows the cause of the test failure down to the macOS version.
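
Since the comparison here depends on exact versions, a small script can collect them the same way on every machine. This is only a sketch: `environment_report` and `package_version` are hypothetical helper names, and the libomp version is not covered because it comes from Homebrew (`brew info libomp`) rather than from Python.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Installed version of a distribution, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("lightgbm", "numpy")) -> dict:
    """Collect the details compared in this thread, plus the CPU architecture."""
    report = {
        "python": platform.python_version(),
        # mac_ver() returns an empty release string on non-macOS systems.
        "macos": platform.mac_ver()[0] or "n/a",
        "machine": platform.machine(),  # distinguishes e.g. x86_64 from arm64
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output from each machine alongside the test result would make it easier to see which variable actually differs.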

yuki617 avatar Jan 15 '22 03:01 yuki617

Waiting for upstream fixes. No further action for now.

lintool avatar Jan 15 '22 13:01 lintool

I just updated to libomp 14.0.0 via brew. Issue still persists.

lintool avatar Apr 02 '22 11:04 lintool

Update, still having this issue:

% python -m unittest integrations.sparse.test_lucenesearcher_check_ltr_msmarco_document.TestLtrMsmarcoDocument
Attempting to initialize pre-built index msmarco-doc-per-passage-ltr.
/Users/jimmylin/.cache/pyserini/indexes/index-msmarco-doc-per-passage-ltr-20211031-33e4151.bd60e89041b4ebbabc4bf0cfac608a87 already exists, skipping download.
Initializing msmarco-doc-per-passage-ltr...
Attempting to initialize pre-built index msmarco-doc-per-passage-ltr.
/Users/jimmylin/.cache/pyserini/indexes/index-msmarco-doc-per-passage-ltr-20211031-33e4151.bd60e89041b4ebbabc4bf0cfac608a87 already exists, skipping download.
Initializing msmarco-doc-per-passage-ltr...
analyzed contents
text_unlemm text_unlemm
text_bert_tok text_bert_tok
IBM model Load takes 14.59 seconds
IBM model Load takes 32.02 seconds
IBM model Load takes 315.10 seconds
IBM model Load takes 58.74 seconds
[thread 53763 also had an error]
[thread 78855 also had an error]
[thread 54019 also had an error]
[thread 76035 also had an error]
[thread 51207 also had an error]
[thread 75523 also had an error]
[thread 77827 also had an error]
[thread 51463 also had an error][thread 77059 also had an error]
[thread 52995 also had an error]# A fatal error has been detected by the Java Runtime Environment:
[thread 52739 also had an error][thread 77571 also had an error]

[thread 51715 also had an error]
[thread 77315 also had an error]

[thread 50951 also had an error][thread 78087 also had an error][thread 78599 also had an error]
[thread 50695 also had an error]
[thread 51971 also had an error][thread 50439 also had an error]

[thread 52483 also had an error]

[thread 53251 also had an error]
[thread 76291 also had an error][thread 76803 also had an error][thread 53507 also had an error]

[thread 76547 also had an error][thread 52227 also had an error]

#  SIGSEGV (0xb)[thread 75267 also had an error] at pc=0x0000000104abbffe
, pid=24960, tid=78343
# JRE version: Java(TM) SE Runtime Environment (11.0.4+10) (build 11.0.4+10-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.4+10-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# [thread 71171 also had an error]
[thread 70659 also had an error]
[thread 70147 also had an error]
[thread 57347 also had an error]
[thread 69891 also had an error]
[thread 69635 also had an error]
C  [libomp.dylib+0x60ffe]  __kmp_suspend_initialize_thread+0x1e
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# /Users/jimmylin/workspace/pyserini/hs_err_pid24960.log
# If you would like to submit a bug report, please visit:
/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/multiprocessing/ UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Traceback (most recent call last):
  File "tools/scripts/msmarco/", line 235, in <module>
  File "tools/scripts/msmarco/", line 222, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate, exclude_qids)
  File "tools/scripts/msmarco/", line 184, in compute_metrics_from_files
    qids_to_ranked_candidate_documents = load_candidate(path_to_candidate)
  File "tools/scripts/msmarco/", line 98, in load_candidate
    with autoopen(path_to_candidate,'r') as f:
  File "tools/scripts/msmarco/", line 28, in autoopen
    return open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'ltr_test/run.ltr.msmarco-pass-doc.test.trec'
ERROR: test_reranking (integrations.sparse.test_lucenesearcher_check_ltr_msmarco_document.TestLtrMsmarcoDocument)
Traceback (most recent call last):
  File "/Users/jimmylin/workspace/pyserini/integrations/sparse/", line 50, in test_reranking
    result = subprocess.check_output(f'python tools/scripts/msmarco/ --judgments tools/topics-and-qrels/ --run ltr_test/{outp}', shell=True).decode(sys.stdout.encoding)
  File "/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/Users/jimmylin/opt/anaconda3/envs/pyserini-dev/lib/python3.8/", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python tools/scripts/msmarco/ --judgments tools/topics-and-qrels/ --run ltr_test/run.ltr.msmarco-pass-doc.test.trec' returned non-zero exit status 1.

Ran 1 test in 4061.513s

FAILED (errors=1)


% brew info libomp 
libomp: stable 14.0.0 (bottled)
LLVM's OpenMP runtime library
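
Both crash banners in this thread name a native frame in `libomp.dylib` at offset `0x60ffe`, and the second one also shows the symbol. With the interleaved thread messages that is easy to miss, so here is a rough sketch that pulls the frame out of a log. `find_native_frame` and `NativeFrame` are made-up names, and the pattern is based only on the frame lines visible in these logs.

```python
import re
from typing import NamedTuple, Optional

# Native frames appear in the banner as: C  [library+0xoffset]  symbol+0xoffset
# The symbol part can be missing, e.g. when other threads' output is interleaved.
NATIVE_FRAME = re.compile(
    r"\bC\s+\[(?P<library>[^\[\]]+?)\+(?P<offset>0x[0-9a-fA-F]+)\]"
    r"(?:\s+(?P<symbol>[A-Za-z_]\w*)\+0x[0-9a-fA-F]+)?"
)


class NativeFrame(NamedTuple):
    library: str
    offset: str
    symbol: Optional[str]


def find_native_frame(log_text: str) -> Optional[NativeFrame]:
    """Return the first native (C) frame found in a JVM crash banner, if any."""
    match = NATIVE_FRAME.search(log_text)
    if match is None:
        return None
    return NativeFrame(match["library"], match["offset"], match["symbol"])


if __name__ == "__main__":
    line = "C  [libomp.dylib+0x60ffe]  __kmp_suspend_initialize_thread+0x1e"
    print(find_native_frame(line))
```

Running it over the `hs_err_pid*.log` files from different attempts would show whether the crash always lands in the same place.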

lintool avatar May 14 '22 13:05 lintool

Even with:


Per - didn't help.

lintool avatar May 14 '22 15:05 lintool

Trying this again:

% brew info libomp  
==> libomp: stable 14.0.6 (bottled)
LLVM's OpenMP runtime library

Still getting the same error.

lintool avatar Sep 22 '22 13:09 lintool

Interestingly, on the M1 chip, lightgbm does work with the following install command:

conda install -c conda-forge lightgbm

lintool avatar Oct 09 '22 22:10 lintool