Lecture 12 - More on CUDA Memory and Specialization
See
![[2.3Atomics.pdf]]
The strategy to overcome issues with atomic operations is as follows (a minimal sketch follows the list):
- Launch the kernel with `<<<N, M>>>`, where `N` is the number of blocks and `M` is the number of threads per block.
- Put `__shared__` on something like `temp`, so that each block gets its own copy of shared memory, visible only to that block's threads.
- Use `__syncthreads()` to make sure that all threads in a block have reached the same point before reading what other threads wrote.
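Putting those three steps together, here is a minimal sketch of the pattern. The kernel name `kernel`, the buffer name `temp`, and the block size are illustrative assumptions, not the slide's exact code:

```cuda
#define M 256   // threads per block (assumed size for this sketch)

__global__ void kernel(const float *in, int n) {
    __shared__ float temp[M];          // one private copy per block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    temp[threadIdx.x] = (i < n) ? in[i] : 0.0f;

    __syncthreads();  // every thread's write to temp is now visible
    // ... block-local work on temp can safely start here ...
}

// Host side: N blocks of M threads each.
// kernel<<<N, M>>>(d_in, n);
```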
An example is the dot product code on slide 8:
![[2.3Atomics.pdf#page=9]]
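For reference, a sketch in the style of the classic shared-memory dot product (assumed, not copied verbatim from the PDF): each block reduces its partial products in shared memory, then issues a single `atomicAdd` per block instead of one atomic per thread, which is exactly how this pattern reduces atomic contention:

```cuda
#define THREADS_PER_BLOCK 256   // assumed power-of-two block size

__global__ void dot(const float *a, const float *b, float *result, int n) {
    __shared__ float temp[THREADS_PER_BLOCK];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread computes one pairwise product into shared memory.
    temp[tid] = (i < n) ? a[i] * b[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block: temp[0] ends up with the block sum.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            temp[tid] += temp[tid + stride];
        __syncthreads();
    }

    // One atomic per block: far less contention than one per thread.
    if (tid == 0)
        atomicAdd(result, temp[0]);
}
```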
We should look at:
![[2.2CudaMemoryModel.pdf]]