Lecture 1 - Introduction to the Course

Overview of the Course

When running programs:

Pantoja will run the programs you submit on the lab machines for reference, to confirm the speedup. If the labs fail us, we may have to fall back to a supercomputer that Pantoja has access to.

We also have Google Colab, which she highly recommends. However, it's not required. It's a good way to share code and run it platform-independently. By default they give access to crappy GPUs, but for $10 a month they give access to nice GPUs.

History of Parallel Computing

CUDA was the go-to for NVIDIA GPUs. It was very clunky and made programming extremely tedious. NVIDIA has been making GPU programming easier over the years. You still have access to the lower-level calls to put things in cache, but they've added a hardware-abstraction layer to help with certain calls.

As such, CUDA is now much easier to teach. But NVIDIA no longer has a monopoly: Intel has its own platform, and the National Labs have their own as well. We'll stick with CUDA, though. Later on, we'll abstract programming to any GPU platform, so that you can carry what you learn in this class over to those programming languages.

The Progression

We'll start spreading our programming over:

Syllabus

See the Syllabus section on Canvas via this link for more information and the grade breakdown, but the gist is:

Roadmap

Again, the roadmap of sections is in the syllabus. Essentially, there will be a lab due every week. We start this week with an introduction and a review of what you already know from Systems Programming and Computer Architecture.

Also, the labs are group projects. Know that the program doesn't just have to work; it has to work fast. We'll make an implementation, time it, try to improve the time, check whether it's faster, rinse and repeat.

By the way, there's theoretically an HPC (High Performance Computing) class, but we never get to it since it never gets offered, smh.

Reports

For each lab, make sure you write a report, with graphs showing scalability against the number of cores, or whatever you're changing for your speedup.

The report doesn't need to be an essay, but you should give a sentence or two explaining the data or what you found. These will be around 1 to 2 pages.

Introduction to Multicore

The whole slide deck will be here:

![[Multicore_Architectures.pdf]]

The objective here will be to:

Even the cheapest CPUs on the market are multicore now. But what does it mean?

What does it mean to be multicore?

Inside of the same CPU, we get multiple cores (think of multiple smaller CPUs working together as one CPU).

But why? Thanks to Moore's Law, we kept doubling our transistor count, which really meant the price per transistor went down, allowing more powerful machines for cheaper. Nowadays, though, the transistor count stays about the same, since we no longer care as much about shoving as many transistors as possible into one spot; now it's more about the architecture.

Recall from Lecture 12 - Power Consumption and Calculations or from EE 307 (see Winter 2024) that CMOS follows:

$P = C V_{DD}^2 f$

for the dynamic power draw of a standard CMOS circuit. What we see on
![[Multicore_Architectures.pdf#page=3]]

is that increasing the clock increased our power draw pretty consistently. The Intel Pentium was infamously so hot that people would be forced to suspend their Pentium laptops off of the table.

But notice the power drop for the highest clock rate architectures. From this slide:
![[Multicore_Architectures.pdf#page=4]]

we see that, as time progressed, the expectation was that we'd just have to build better cooling systems, which was simply not feasible. Instead, the idea was to use multiple cores.

![[Multicore_Architectures.pdf#page=5]]

Notice above that the area is doubled, but the voltage and frequency can be reduced while still getting a performance boost.
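To see why, we can plug numbers into the power equation from earlier. This is an illustrative sketch with made-up numbers (not from the slides), assuming $V_{DD}$ can be scaled down roughly in step with frequency:

```latex
P_{\text{1 core}} = C V_{DD}^2 f
% double the area (two cores), but run each at 80\% voltage and frequency:
P_{\text{2 cores}} = 2 \cdot C \, (0.8 V_{DD})^2 \, (0.8 f) \approx 1.02 \, C V_{DD}^2 f
```

So for roughly the same power budget, peak throughput can go up to $2 \times 0.8f = 1.6\times$ the single-core rate.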

Clearly, the advantage of the architecture is huge! However, you now need to write your algorithms to split the work between multiple cores/CPUs. Hence, the change in the computer hardware forced a change in how the software is programmed.

Now, most programs are single-core. But on a multicore CPU, if you use traditional programming, you only use one of the lower-power, and thus lower-performance, cores.

Seriously! We were so used to just throwing more money at performance that once multicore became widely used, the OS and various other software weren't able to take advantage of it.