communitynotes TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Describe the bug

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/run_scoring.py", line 294, in _run_scorer_parallelizable
scoringResults = scorer.prescore(scoringArgs, preserveRatings=not runParallel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/scorer.py", line 301, in prescore
noteScores, userScores, metaScores = self._prescore_notes_and_users(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 554, in _prescore_notes_and_users
) = self._run_stable_matrix_factorization(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 449, in _run_stable_matrix_factorization
return self._run_regular_matrix_factorization(ratingsForTraining)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 424, in _run_regular_matrix_factorization
return self._mfRanker.run_mf(ratingsForTraining)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py", line 560, in run_mf
self._lossModule = NormalizedLoss(
^^^^^^^^^^^^^^^
File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/normalized_loss.py", line 108, in __init__
assert all(ratings[labelCol].values == targets.numpy())
^^^^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
"""

To Reproduce

I ran the following in a shell:

python3 main.py \
--enrollment /root/community-note/enrollment/2024-12-29_20-02.tsv \
--notes /root/community-note/note/2024-12-29_20-02.tsv \
--ratings /root/community-note/rating/ \
--status /root/community-note/status/2024-12-29_20-02.tsv \
--outdir /root/community-note/notescore \
--parallel

Expected behavior

I believe this is caused by line 108 of normalized_loss.py:

assert all(ratings[labelCol].values == targets.numpy())

I am not sure whether I should change it to:

assert all(ratings[labelCol].values == targets.cpu().numpy())

Environment

Same venv as in requirement
NVIDIA H100 80GB HBM3 X2
CUDA 12.2
python 3.11.9
Intel(R) Xeon(R) Platinum 8462Y+
516GB RAM

Jan 16 '25 19:01 Jacobsonradical

We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

Jan 25 '25 02:01 avalanchesiqi

@avalanchesiqi
I think there are two ways to solve this.

1. Install the CPU-only build of PyTorch. PyTorch then computes everything on the CPU, so no tensor transfer is needed.
2. Change the line to assert all(ratings[labelCol].values == targets.cpu().numpy())

Jan 25 '25 19:01 Jacobsonradical

We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

No, I ran on CPU only.

Jan 25 '25 20:01 tuler

I think this could be a solution:

Change this:

assert all(ratings[labelCol].values == targets.numpy())

to this:

assert all(ratings[labelCol].values == targets.detach().cpu().numpy())
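
For anyone who wants to try the idea outside the scoring code, here is a minimal sketch of the same conversion as a small helper. The names to_host_numpy, labels_match and demo are hypothetical and not part of communitynotes; only the detach().cpu().numpy() chain comes from the suggestion above. detach() drops autograd tracking (numpy() refuses tensors that require grad), and cpu() copies a CUDA tensor to host memory and leaves a CPU tensor alone, so the same line should work on both kinds of machine.

```python
import numpy as np


def to_host_numpy(tensor):
    """Return a tensor's values as a NumPy array, whatever device it is on.

    detach() removes the tensor from the autograd graph, cpu() copies it to
    host memory (returning the tensor itself if it is already there), and
    numpy() can then convert it without raising the TypeError above.
    """
    return tensor.detach().cpu().numpy()


def labels_match(label_values, targets):
    """Hypothetical stand-in for the check on line 108 of normalized_loss.py."""
    return bool(np.array_equal(np.asarray(label_values), to_host_numpy(targets)))


def demo():
    # torch is imported here so the helpers above only depend on NumPy.
    import torch

    # Use the GPU when one is visible, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    targets = torch.tensor([1.0, 0.0, 0.5], device=device)
    assert labels_match([1.0, 0.0, 0.5], targets)
```

Calling demo() on a machine with PyTorch installed exercises the helper on whichever device is available. Whether the rest of the scorer runs correctly on a GPU after this one-line change is not something this sketch checks.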

Jun 12 '25 15:06 AntonioCoppe
