
tsnecuda fails with a large number of points using FAISS 1.7

Open DavidMChan opened this issue 3 years ago • 6 comments

It seems like tsnecuda is experiencing the same issue as in https://github.com/facebookresearch/faiss/issues/1793. Running the code with ./tsne -k 500000 (500,000 2D points drawn from a pair of Gaussians) gives:

Starting TSNE calculation with 500000 points.
Initializing cuda handles... done.
KNN Computation... Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::ivfInterleavedScanImpl_32_(faiss::gpu::Tensor<float, 2, true>&, faiss::gpu::Tensor<int, 2, true>&, thrust::device_vector<void*>&, thrust::device_vector<void*>&, faiss::gpu::IndicesOptions, thrust::device_vector<int>&, int, faiss::MetricType, bool, faiss::gpu::Tensor<float, 3, true>&, faiss::gpu::GpuScalarQuantizer*, faiss::gpu::Tensor<float, 2, true>&, faiss::gpu::Tensor<long int, 2, true>&, faiss::gpu::GpuResources*) at /home/davidchan/Repos/faiss/faiss/gpu/impl/scan/IVFInterleaved32.cu:13; details: CUDA error 9 invalid configuration argument
Aborted (core dumped)

Originally posted by @kernfel in https://github.com/CannyLab/tsne-cuda/issues/95#issuecomment-824528732

DavidMChan, Apr 22 '21 04:04
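For anyone trying to reproduce this outside the tsne-cuda binary, an input of the same shape can be generated as below. This is a minimal sketch, assuming the test data is simply two isotropic Gaussians; the function name `make_two_gaussians`, the centers (-5, 0) and (5, 0), the unit variance, and the seed are hypothetical choices, not taken from tsne-cuda. The buffer is contiguous and row-major (`n * d` floats), which is the layout FAISS's `train`, `add`, and `search` calls take.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the tsne-cuda test input: n 2-D points drawn
// from two isotropic Gaussians. Centers, spread, and seed are illustrative.
std::vector<float> make_two_gaussians(std::size_t n, unsigned seed = 0) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> noise(0.0f, 1.0f);

    std::vector<float> points;
    points.reserve(2 * n);  // row-major: point i occupies [2*i, 2*i + 1]
    for (std::size_t i = 0; i < n; ++i) {
        // Alternate clusters: even i centered at (-5, 0), odd i at (5, 0).
        const float cx = (i % 2 == 0) ? -5.0f : 5.0f;
        points.push_back(cx + noise(rng));  // x coordinate
        points.push_back(noise(rng));       // y coordinate
    }
    return points;
}
```

With `n = 500000` this produces a 1,000,000-float buffer (about 4 MB), which would correspond to `points` with `num_dims = 2` in the excerpt further down.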

@kernfel: Which versions of CUDA and GCC are you using? Also, are you installing FAISS via conda, or building it from source?

DavidMChan, Apr 22 '21 04:04

CUDA toolkit: 11.3
GCC: I may have inadvertently used v10 here; it seems my update-alternatives weren't up to date.
FAISS: building from source.

kernfel, Apr 22 '21 04:04

I'm able to reproduce this with 500,000 points using CUDA 11.2 and gcc 9.3, building both from source. Downgrading to a CPU index does seem to fix the problem, which suggests that the issue is in the FAISS GPU index and not in our downstream code.

For anyone at FAISS, the offending code is here:

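```cpp
// Headers needed by this excerpt (FAISS 1.7 layout).
#include <cmath>
#include <cstdint>
#include <vector>

#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFFlat.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/gpu/utils/DeviceUtils.h>

// Excerpt from tsne-cuda's KNN routine: num_points, num_dims, points,
// num_near_neighbors, distances and indices come from the enclosing function.

// IVF parameters: sqrt(N) coarse cells, 20 of them probed per query.
const int32_t kNumCells = static_cast<int32_t>(
    std::sqrt(static_cast<float>(num_points)));
const int32_t kNumCellsToProbe = 20;

// Construct the CPU version of the index
faiss::IndexFlatL2 quantizer(num_dims);
faiss::IndexIVFFlat cpu_index(&quantizer, num_dims, kNumCells, faiss::METRIC_L2);
cpu_index.nprobe = kNumCellsToProbe;

if (num_near_neighbors < 1024)
{
    // One StandardGpuResources per visible device
    int ngpus = faiss::gpu::getNumDevices();
    std::vector<faiss::gpu::GpuResourcesProvider *> res;
    std::vector<int> devs;
    for (int i = 0; i < ngpus; i++)
    {
        res.push_back(new faiss::gpu::StandardGpuResources);
        devs.push_back(i);
    }

    // Convert the CPU index to GPU index (this is the path that fails)
    faiss::Index *search_index = faiss::gpu::index_cpu_to_gpu_multiple(res, devs, &cpu_index);

    search_index->train(num_points, points);
    search_index->add(num_points, points);
    // Query every point against the full set in a single search call
    search_index->search(num_points, points, num_near_neighbors, distances, indices);

    // Release the GPU index before the resources it uses
    delete search_index;
    for (int i = 0; i < ngpus; i++)
    {
        delete res[i];
    }
}
else
{
    // Construct the index table on the CPU (since the GPU
    // can only handle 1023 neighbors)
    cpu_index.train(num_points, points);
    cpu_index.add(num_points, points);
    // Perform the KNN query
    cpu_index.search(num_points, points, num_near_neighbors,
                     distances, indices);
}
```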
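The IVF parameter choice in the snippet above can be worked through for this input. The sketch below is a hypothetical restatement (`IvfParams` and `ivf_params_for` are not tsne-cuda names) that uses the same float square root and truncation as `kNumCells` and the same fixed `kNumCellsToProbe`.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical restatement of the snippet's parameters:
// nlist = trunc(sqrt(N)) coarse cells, nprobe = 20 cells scanned per query.
struct IvfParams {
    int32_t num_cells;        // nlist passed to IndexIVFFlat
    int32_t num_cells_probe;  // nprobe
};

IvfParams ivf_params_for(int64_t num_points) {
    IvfParams p;
    // Same float sqrt + truncating cast as kNumCells in the excerpt.
    p.num_cells = static_cast<int32_t>(std::sqrt(static_cast<float>(num_points)));
    p.num_cells_probe = 20;
    return p;
}
```

For 500,000 points this gives 707 cells (sqrt(500000) ≈ 707.1, truncated), so an average inverted list holds about 707 vectors. Probing 20 lists then scans roughly 20 × 707 = 14,140 candidates per query, about 2.8% of the data, assuming the lists are balanced.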

The CPU path works (when forced, even with fewer than 1024 neighbors), while the GPU path does not.

DavidMChan, Apr 22 '21 04:04
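The workaround of forcing the CPU path can be expressed as a small dispatch function. This is a sketch only: `choose_knn_backend`, `KnnBackend`, and the `force_cpu` flag are hypothetical names, not part of tsne-cuda; the exclusive 1024-neighbor threshold is taken from the `num_near_neighbors < 1024` check in the excerpt.

```cpp
#include <cstdint>

enum class KnnBackend { kGpuIvfFlat, kCpuIvfFlat };

// Hypothetical helper restating the excerpt's branch. The GPU index is used
// only for k < 1024; the force_cpu override (not in the original code) sends
// every query to the CPU index, the configuration reported to work here.
KnnBackend choose_knn_backend(int32_t num_near_neighbors, bool force_cpu) {
    constexpr int32_t kGpuNeighborLimit = 1024;  // exclusive bound, as in the excerpt
    if (force_cpu || num_near_neighbors >= kGpuNeighborLimit) {
        return KnnBackend::kCpuIvfFlat;
    }
    return KnnBackend::kGpuIvfFlat;
}
```

Keeping the override as an explicit argument, rather than changing the threshold, leaves the normal GPU behavior intact once a fixed FAISS build is available.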

Second update: it doesn't seem to be limited to the IVFFlat index. The IVFPQ index fails with the same CUDA error 9 (invalid configuration argument), although the assertion fires in a different function:

Starting TSNE calculation with 500000 points.
Initializing cuda handles... done.
KNN Computation... Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runTransposeAny(faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, int, int, faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, cudaStream_t) [with T = float; int Dim = 3; cudaStream_t = CUstream_st*] at /home/davidchan/Repos/faiss/faiss/gpu/utils/Transpose.cuh:218; details: CUDA error 9 invalid configuration argument
Aborted (core dumped)

DavidMChan, Apr 22 '21 04:04

These may also be related:
https://github.com/facebookresearch/faiss/issues/1835
https://github.com/facebookresearch/faiss/issues/1771

DavidMChan, Apr 22 '21 05:04

Got my build issues under control and can confirm that FAISS v1.6.5 does not have this issue.

kernfel, Apr 22 '21 05:04

Resolved in the latest version.

DavidMChan, Jul 21 '23 21:07