Increase Performance with Vectorized Memory Access

Open ByLamacq opened this issue 4 years ago • 19 comments

Hello,

I changed global memory access from scalar to vector.
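For readers unfamiliar with the technique, here is a minimal host-side C++ sketch of what moving from scalar to vector access means. It is not the code from this change: `Vec4` is a hypothetical stand-in for CUDA's built-in `uint4`, and `load_scalar` / `load_vector` are illustrative names. On a GPU the second form lets each thread fetch 16 bytes per load instead of 4; on a CPU the sketch only illustrates the data layout.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for CUDA's built-in uint4:
// four 32-bit words with 16-byte alignment.
struct alignas(16) Vec4 {
    std::uint32_t x, y, z, w;
};

// A 256-bit value stored as eight 32-bit words.
using Words256 = std::array<std::uint32_t, 8>;

// Scalar access: eight separate 32-bit reads.
Words256 load_scalar(const std::uint32_t* src) {
    Words256 out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = src[i];
    }
    return out;
}

// Vector access: two 16-byte reads. The alignment comes from the Vec4 type.
Words256 load_vector(const Vec4* src) {
    const Vec4 lo = src[0];
    const Vec4 hi = src[1];
    return {lo.x, lo.y, lo.z, lo.w, hi.x, hi.y, hi.z, hi.w};
}
```

Both functions return the same eight words; only the number and width of the memory reads differ.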

Platform: Ubuntu 16.04, GTX 1050 Ti, CUDA 10.1 (top: original) [image: GTX1050ti_MasterVsUint4_web]

Platform: Ubuntu 18.04, RTX 2080, CUDA 10.2 (top: original) [image: RTX2080_MasterVsUint4]

Best regards, ByLamacq

ByLamacq avatar Mar 19 '20 20:03 ByLamacq

Have you found anything on the 2080? How much have you checked?

kpot87 avatar Apr 07 '20 01:04 kpot87

Have you found anything on the 2080? How much have you checked?

I'm not searching, so I can't find anything. It's just for the programming challenge...

ByLamacq avatar Apr 20 '20 20:04 ByLamacq

Could you help add these features? Currently BitCrack runs with a stride like this: if the stride is 100, the pattern is count + stride + count + stride = 1 + 100 + 1 + 100 = 202 in total. I am looking for an update with a new switch, --count, defined by the user (e.g. --count 200), where the user count is the number of keys checked before each stride. With a user count of 300 and a stride of 100: user-count + stride + user-count + stride + user-count = 300 + 100 + 300 + 100 + 300 = 1100 in total.

In addition, if --keyspace is 1:3000, a new switch --loop with --count 2000 --stride 100 would give: user-count + stride + user-count + stride + user-count = 2000 + 100 + 2000 (this reaches the end of the keyspace but keeps counting in a loop from 1, the start key) + 100 + 2000, and the loop continues. --count is the number of keys to check, followed by the stride. I hope this feature will make BitCrack more effective and attractive. Thanks.
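One possible reading of this request, sketched in C++ under stated assumptions: check `count` consecutive keys, skip `stride` keys, repeat, and with looping enabled wrap around to the start key when the end of the keyspace is reached. The function `keys_to_check` and its parameters are hypothetical, the --count and --loop switches are only proposed in this comment, and 64-bit indices stand in for real 256-bit keys.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the first `limit` key indices visited in the
// keyspace [start, end] when checking `count` keys, then skipping `stride`.
// With `loop` set, positions past `end` wrap around to `start`.
std::vector<std::uint64_t> keys_to_check(std::uint64_t start, std::uint64_t end,
                                         std::uint64_t count, std::uint64_t stride,
                                         bool loop, std::size_t limit) {
    std::vector<std::uint64_t> keys;
    if (count == 0 || end < start) {
        return keys;  // nothing to check
    }
    const std::uint64_t size = end - start + 1;  // number of keys in the keyspace
    std::uint64_t offset = 0;                    // start of the current run
    while (keys.size() < limit) {
        for (std::uint64_t i = 0; i < count && keys.size() < limit; ++i) {
            std::uint64_t pos = offset + i;
            if (pos >= size) {
                if (!loop) {
                    return keys;  // ran off the end without looping
                }
                pos %= size;      // wrap around to the start key
            }
            keys.push_back(start + pos);
        }
        offset += count + stride;  // skip `stride` keys after each run
        if (offset >= size) {
            if (!loop) {
                return keys;
            }
            offset %= size;
        }
    }
    return keys;
}
```

With the numbers from the comment (keyspace 1:3000, count 2000, stride 100, looping on), the second run starts at key 2101, reaches 3000, and continues from key 1.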

hamnaz avatar Jun 21 '20 19:06 hamnaz

Hello, good morning. If I put several video cards in the same computer, for example 4 cards, would the program run with greater power and speed or not? I await your comments.

marcelosantoto avatar Feb 12 '21 02:02 marcelosantoto

Yes - you will have greater power and speed.

marssystems avatar Feb 12 '21 04:02 marssystems

Yes - you will have greater power and speed.

First of all, thank you very much for your answer. Another question: can the video cards be any model? For example, two GTX 1080 Ti 11GB cards, plus about three RX 580 8GB cards, plus an RTX 3060 Ti 8GB. Would I have any problems, or do they all have to be the same model, and Nvidia or AMD? I await your answer.

marcelosantoto avatar Feb 12 '21 14:02 marcelosantoto

Yes - they can be any Nvidia cards. I don't know about AMD cards. I use Windows 10 and Nvidia cards with no problems.

marssystems avatar Feb 12 '21 14:02 marssystems

Yes - they can be any Nvidia cards. I don't know about AMD cards. I use Windows 10 and Nvidia cards with no problems.

Again, thank you very much for responding. I will look into adding more video cards to get more power and speed. What video cards do you use? Have you tried the Nvidia GTX, RTX or Quadro? Which ones do you recommend?

marcelosantoto avatar Feb 12 '21 14:02 marcelosantoto

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

marssystems avatar Feb 12 '21 15:02 marssystems

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

Have you had any luck using so much power and speed with BitCrack?

marcelosantoto avatar Feb 13 '21 14:02 marcelosantoto

Not yet - I just started.

marssystems avatar Feb 13 '21 14:02 marssystems

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

Have you had any luck using so much power and speed with BitCrack?

Are they on 2 separate PCs or 1, like a mining rig?

marcelosantoto avatar Feb 13 '21 15:02 marcelosantoto

They are on one PC - an old converted mining rig.

marssystems avatar Feb 13 '21 16:02 marssystems

@ByLamacq Is this a patch that could also be applied to OpenCL? Or is this a CUDA-specific optimization?

Uzlopak avatar May 22 '21 10:05 Uzlopak

It's not really a patch, but yes, it can be applied to OpenCL. AMD GPUs also have specific microcode for vector data loads, so I think this change in the CL code can increase performance.

BitCrackEvo avatar May 23 '21 08:05 BitCrackEvo

@BitCrackEvo My C and C++ skills are limited. Do you have the skills to implement this?

Uzlopak avatar May 23 '21 10:05 Uzlopak

@Uzlopak I will try this later, but there are many changes to make.

Actually, the OpenCL code reads an array of structures: typedef struct { uint v[8]; } uint256_t;

It's not very good, but OpenCL is a high-level programming language, so it depends on the compiler's implementation...
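To make the layout point concrete, here is a small C++ sketch, offered as an assumption about what a vector-friendly layout could look like rather than the actual BitCrack change. `uint256_scalar` mirrors the struct quoted above; `word4`, `uint256_vector`, `to_vector` and `to_scalar` are hypothetical names. The only difference between the two layouts is the guaranteed alignment, which is what allows 16-byte loads.

```cpp
#include <cstdint>

// Layout quoted in the thread: eight 32-bit words.
// Only the alignment of a single 32-bit word is guaranteed.
struct uint256_scalar {
    std::uint32_t v[8];
};

// Hypothetical 16-byte chunk, standing in for a GPU uint4.
struct alignas(16) word4 {
    std::uint32_t s[4];
};

// Hypothetical vector-friendly layout: the same 32 bytes as two aligned chunks.
struct uint256_vector {
    word4 lo;  // words 0..3
    word4 hi;  // words 4..7
};

static_assert(sizeof(uint256_scalar) == sizeof(uint256_vector), "same size");
static_assert(alignof(uint256_scalar) == alignof(std::uint32_t), "word aligned");
static_assert(alignof(uint256_vector) == 16, "16-byte aligned");

// Repack from the scalar layout into the vector layout.
uint256_vector to_vector(const uint256_scalar& in) {
    uint256_vector out{};
    for (int i = 0; i < 4; ++i) {
        out.lo.s[i] = in.v[i];
        out.hi.s[i] = in.v[i + 4];
    }
    return out;
}

// Repack from the vector layout back into the scalar layout.
uint256_scalar to_scalar(const uint256_vector& in) {
    uint256_scalar out{};
    for (int i = 0; i < 4; ++i) {
        out.v[i] = in.lo.s[i];
        out.v[i + 4] = in.hi.s[i];
    }
    return out;
}
```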

But you can also try to do that yourself... It's good training to improve your skills. I will try after my own BitCrack project. Sorry.

BitCrackEvo avatar May 24 '21 12:05 BitCrackEvo

Hi @BitCrackEvo

I started to dig deeper. Very interesting. Can you help me with this question on Stack Overflow?

https://stackoverflow.com/questions/67667314/transform-native-c-matrix-multiplication-to-opencl-simd-matrix-multiplication?r=SearchResults

Uzlopak avatar May 24 '21 12:05 Uzlopak

This made my Jetson Nano about 20% faster.

sigkill avatar Feb 03 '22 13:02 sigkill