Course Selection Guide from Berkeley EECS Faculty Advising Session
Below is a thread on course guidance for UC Berkeley engineering students, excerpted from the CS70 (Discrete Mathematics and Probability Theory) Piazza, aimed specifically at freshmen and sophomores pursuing a machine learning career. This semester I attended Professor Sahai's faculty advising session (Apr. 16, 2020) and thought this information would be valuable for any Berkeley student, so I am making a post to record it. All of the information below is from his responses during the advising session.
Dear students,
I had my EECS advising session, and based on the questions I got, there are probably a few questions that many of you have that I should just answer here for everyone.
1. How should you decide what courses you should take?
For anyone in one of our majors (EECS or L&S CS), there is a very simple approach to building out schedules. First, complete the six lower-division foundational courses (the computational trilogy 61A, 61B, 61C and the modeling/math trilogy 16A, 16B, 70). Next, start simultaneously building a wide base and an element of depth in the upper division.
Look at HKN's excellent course map.
This is a very good resource since it clearly calls out the actual lines of dependency among courses. Since 70 is a lower-division course, it is useful to view things from 70’s perspective.
Notice that there are two key gateway courses in the lower-division: 16B and 61C. 16B is the gateway to the entire AI/ML related zone, mathematical courses generally, as well as the “interacting with the physical world” side which includes everything from robotics on down. 61C is the gateway to the computation side which ranges from software systems to digital hardware.
Here, 70 essentially plays a supporting role for 16B, adding a vital dose of mathematical maturity that is essential for the next level of courses. I'll say more about that later in this post.
Back to building out your base. In general, your first upper-division courses should be from the level that is marked "core" in the HKN map. Typically, two cores from different areas. After that, you should continue with the typical pattern of one more core plus either another core or a course that builds on a core at what HKN calls the "extension" level. At some point, you might want to get down into the more "specialty" courses.
The above strategy will build you a strong base with an area of concentration where you’ve gone deeper.
2. Where do you go from CS70 (Discrete Math)?
There are three natural successor courses to 70 itself. The most direct successor is 126. It gets you to what many of us would consider the minimum level of probability understanding that you actually need. Especially after this semester, where we've had to make some content cuts in response to the Covid-19 crisis, you really should take 126 if this last module of 70 has gone well for you (i.e. you think you'll be pulling a B+ or better level of understanding on the probability material). If your understanding of the probability material is weaker, then you should seriously consider taking Stat 140 instead, since that course doesn't have a 70 prerequisite and will hit some of the same content in a gentler fashion to let you catch up.
The second pair of natural successors are 127 and 170. Neither builds that much on the material in 70 per se, but both have an absolutely hard dependence on the mathematical maturity that 70 builds. 127 is the most natural successor to 16B, and its topic (continuous optimization) occupies a central role in the subject and is in the midst of an explosive expansion of relevance. A huge amount of cutting-edge algorithmic work in a vast variety of areas is increasingly based on the 16B/127 style of thinking as compared to the more traditional 61B/170 style. In the AI/ML area, the transition is essentially complete, and the same sort of thing is happening for graph algorithms and a bunch of other areas. (For this reason, 127, along with 126, should probably be considered a straight-up prerequisite for courses like 270 at this point.)

170 can be considered the natural successor to 61B, and the diverse material there is part of the standard vocabulary of the subject; most people should probably take it before they graduate. Despite the modern dominance of 127-style thinking, it is good to have exposure to and appreciation for the diverse algorithmic perspectives that 170 brings; the 170 perspective even helps you better leverage the 127 material. (In particular, there is an ongoing resurgence of SAT-solving-based combinatorial techniques that can be quite powerful in many modern settings, from 151-style chip design to 164-style programming languages to 149-style embedded systems and 161-style security.)
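To make the contrast concrete, here is a minimal numpy sketch (my own illustration, not from the advising session) of the 16B/127 style of thinking: fit a least-squares model by gradient descent, the kind of iterative, calculus-driven method that 127 studies, and check it against the closed-form solution.

```python
# Continuous-optimization flavor: minimize f(x) = ||Ax - b||^2 by
# repeatedly stepping against the gradient, rather than by discrete search.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))   # toy data matrix
b = rng.standard_normal(50)        # toy targets

x = np.zeros(3)                    # initial guess
step = 0.005                       # step size, small enough for stability
for _ in range(2000):
    grad = 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2
    x = x - step * grad            # one gradient-descent step

# The closed-form least-squares solution, for comparison
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_star))      # True: the iterates converged
```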
3. Math is great and all, but I want to make sure I can get a job. What courses should I take?
This isn’t necessarily clear from the HKN map, but there are plenty of incredible courses that you can take that make you quite employable. Arguably, the single most “employable” course is 151. It is a lot of work, but teaches extremely valuable skills and it is perhaps the only “one hit kill” type of OP course that we have in that it alone is enough to qualify you for a full-time position at a top employer. It’s also extremely good context even for folks outside of architecture/digital-design, and can combine very usefully with other practical areas. (For example, Prof. Sophia Shao has recently taught a follow-on special topics course on thinking about hardware for ML, which is a red-hot area in industry.)
The core systems courses like 149 and 162 are also very useful, arguably essential, for anyone who wants to have a long-term career in software engineering. Similarly, taking 105 and then 140 is very useful for anyone who might want to have a foot in analog hardware. (140 is another "get a job at Apple" type of course.)
The amazing thing about Berkeley's flexible program is that someone who comes in as a freshman and takes 2 EECS technical courses per term can take 16 EECS courses before they graduate in 4 years: 6 lower-divs + 10 upper-divs. Someone can take all of the above (105, 140, 149, 151, 162) and still take 5 more courses to build out a deep base in the mathematical areas (say: 120, 126, 127, 170, 189) or any other area. With that kind of broad and deep training, the person is essentially unstoppable career-wise.
4. Can I take 189 (Intro to ML) after 70 (Discrete Math)?
No. Almost certainly not.
The reality is that to properly understand machine learning, you have to build on a base of three things: linear algebra, optimization, and probability. EECS 16AB almost sets the stage for what you need: the linear algebra base is fully built, the optimization thinking is coming into view, and you've seen many of the key machine learning ideas. But 16AB doesn't use probability, and there is a definite big gap between where 70 leaves off vis-a-vis probability and where 189 starts. That gap requires EECS 126 ideally, although Stat 140 can also do the job. On the optimization side, there is a similar gap between the end of 16B and where 189 starts. That gap requires EECS 127 ideally, although some courses in IEOR could also do the job.
There is also the issue of mathematical maturity. 189 expects a mathematical/modeling maturity that students develop in courses like 126 and 127, and so there's a gap there as well between where 70 leaves off and where 189 starts.
To use traditional terminology (used at other schools, less so at Berkeley): 70 is a sophomore course, 126 and 127 are junior courses, and 189 is a senior course.
The issue isn't usually programming; the programming in 16AB is essentially enough for 189. The issue is on the mathematical side.
So, what happens to students who try to take 189 without the right maturity and background? The best-case scenario is serious struggle and pain as they try to simultaneously learn the probability, learn the optimization thinking, and learn the machine learning, all while wrestling with the mathematical maturity involved, instead of just being able to learn the machine learning while simultaneously strengthening their background in probability and optimization. Unfortunately, even the best-case scenario rarely turns out well. Almost always, in the face of this level of mismatch, students end up dropping concepts and just trying to survive. This is demoralizing, and it results in poor understanding overall as well as a bad grade. Far worse is the scenario where students think that they are understanding but are actually quite confused in pretty profound ways. The worst case is when students end up in this category but, because of test-taking skills and the limitations of exam making, manage to get lots of partial credit on exams and a decent grade, without actually understanding the core ML concepts beyond the cocktail-party conversation level. This kind of "false knowledge" is a ticking time bomb that is far worse than simply knowing that you don't understand. Nobody should want to be a "Dunning-Kruger" exemplar in a room, even if there are other people in the same boat. (Company doesn't make this any better.)
Between probability (126 or 140) and optimization (127 beyond 16B), which is more important for machine learning? Operationally, machine learning fundamentally leans on optimization and linear algebra to do the work. A solid understanding of 16B can take you pretty far, and when combined with 127, you can at least follow what is going on and think tactically: the "how" level and some of the "why". Meanwhile, the role of probability is more at the level of strategy than tactics. Probability intuition at the level built in 126 helps you understand more of the "why" parts. So, a student with a solid understanding of 16B and 126 can walk into 189, take it together with 127, and do alright, at the cost of having to work harder than someone who already has 127. A student with only 16B and 127 beforehand will, upon walking into 189, experience conceptual challenges without the probability intuition that 126 or 140 would help provide. Things can feel like just one thing after another.
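To illustrate this "how" versus "why" split, here is a small numpy sketch (again my own, not from the session): linear algebra does the work of fitting, while probability explains why squared error is the right loss in the first place, since under Gaussian noise, minimizing squared error is exactly maximizing the likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
X = rng.standard_normal((200, 2))
y = X @ w_true + 0.5 * rng.standard_normal(200)   # Gaussian noise, sigma = 0.5

# The "how": linear algebra solves the fitting problem in closed form
w_ls = np.linalg.solve(X.T @ X, X.T @ y)          # normal equations

# The "why": the negative Gaussian log-likelihood (up to a constant)
# is exactly the squared-error loss, so w_ls is also the MLE
def neg_log_lik(w, sigma=0.5):
    residual = y - X @ w
    return 0.5 * np.sum(residual**2) / sigma**2

for _ in range(5):                                # spot-check optimality
    w_other = w_ls + 0.1 * rng.standard_normal(2)
    assert neg_log_lik(w_other) > neg_log_lik(w_ls)
print(w_ls)                                       # close to w_true
```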
5. What Math upper divs should you take?
The Math minor list of upper divs is a great guideline. If I had to pick two, I’d say 104 (real analysis) and 113 (abstract algebra). If you liked Mod Math, etc. then 113 will be a real treat for you. Both of these courses will complement what your EECS-dept training will be providing you. Folks going to graduate school will need a 104-level of understanding of analysis for pretty much all mathy areas of EECS.
Follow-up questions from students:
Q: I have heard of lots of upperclassman friends who delayed EECS 16B to senior year but took 126/127 long before that. Does doing this make our foundation shaky if one hopes to be an ML researcher in the future?
Yes. It is a very suboptimal thing to do. And it also doesn't make sense, because it just causes more work for them.
Why does it make your foundations shaky? Perhaps the most important reason has to do with lab. In the Berkeley curriculum, the main ML project experience is hosted in 16B; that's where the lab lets you actually experience the full ML pipeline, from the data-creation apparatus to the data collection to the fitting of a model to making decisions based on that model to having that decision-making interact with the world in real time. Courses like 189 don't actually have projects at nearly that level of completeness; they're toys by comparison. That 16B lab/project experience is designed to give perspective so that students can both see the power of models and be appropriately skeptical of them, since no model is ever exact. This allows students to better understand the point of other things.
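For what it's worth, here is a schematic sketch (mine, with every detail hypothetical, since the real lab uses actual hardware) of the shape of such a closed-loop pipeline: collect data, fit a model, then make real-time decisions that act back on the world.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1. Data creation/collection: a stand-in for reading a real sensor
def read_sensor(command):
    center = 1.0 if command == "go" else -1.0     # two command "signatures"
    return center + 0.3 * rng.standard_normal()   # noise: no model is exact

data = [(read_sensor(cmd), cmd) for cmd in ["go", "stop"] * 50]

# 2. Model fitting: here, a one-dimensional threshold classifier
go_mean = np.mean([r for r, c in data if c == "go"])
stop_mean = np.mean([r for r, c in data if c == "stop"])
threshold = (go_mean + stop_mean) / 2

# 3. Real-time decision-making that acts back on the world
for true_cmd in ["go", "stop", "go"]:
    reading = read_sensor(true_cmd)
    action = "drive" if reading > threshold else "brake"
    print(f"command={true_cmd:>4}  sensor={reading:+.2f}  ->  {action}")
```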
Beyond the lab, the issue has to do with fundamental core concepts like the SVD, PCA (and approximation by subspaces), classification by hyperplanes, etc. All of these foundations are built in 16B. 127 reviews these things, but it expects previous exposure so that you can understand them more deeply. The control and dynamics side of 16B is also vital to set up the intuitive foundation that things like gradient descent then build upon (there's a classic 16B HW problem that does this for least squares, etc.). Walking into upper-division courses without the right foundation forces students to learn from scratch from the later course's intended review/recap of earlier material. Good reviews/recaps take a slightly different angle to give students a "stereo" perspective; a student without 16B is walking with one eye shut.
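As a tiny illustration of the SVD/PCA foundation mentioned above, here is a numpy sketch (my own, not from the session): the SVD of centered data reveals the low-dimensional subspace that best approximates it, exactly the kind of object that 127 and 189 then reason about.

```python
import numpy as np

rng = np.random.default_rng(3)

# 100 points lying near a 1-D subspace of R^3, plus a little noise
direction = np.array([1.0, 2.0, 2.0]) / 3.0        # unit-norm direction
data = np.outer(rng.standard_normal(100), direction)
data += 0.05 * rng.standard_normal((100, 3))

centered = data - data.mean(axis=0)                # PCA centers the data first
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

print(S)       # one singular value dominates: the data is essentially 1-D
print(Vt[0])   # top principal direction, close to +/- `direction`
```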
Q: Can I take EECS 126 (Probability and Random Processes) concurrently with CS 189 (Intro to ML)?
You really should take this BEFORE 189. Why before?
Look at the HKN chart. Everyone has 16B and 70. So, the question is, conditioned on having 16B and 70, what is the single most important course that a Berkeley student should have in the ML area? The answer is probably 127. What are the two most important courses? 126/140 and 127. What are the three most important courses? Hard to say exactly, but I’d vote for 126/140, 127, and 189.
Now think about your entire 4 year schedule. If you’re going to take 189, you probably should be taking 126/140 and 127 as well. So the question just becomes about order. What are the legitimate considerations? Understanding and workload. The maximum understanding is by taking 126/140 and 127 before 189. The minimum workload is by taking 126/140 and 127 before 189. Both considerations vote for the same order. For students whose probability understanding would benefit by taking 140 rather than 126, the requirement to take them before is *absolute* since you presumably want to actually understand the material in 189.
Let me continue this answer below:
Q: If I impatiently hope to take CS 189 one semester earlier, and have to take one of 126/127 concurrently with 189, which one should I take before 189?
Why are 126 and 127 more core for undergrads in the ML area (given that everyone knows 16B and 70)? Why do we say this when you hear something different from your friends at other schools who are clamoring to take their school's intro-ML class? Unless you understand the answer to this question, you cannot understand what you should do here at Berkeley and why impatience is not a good strategy. The core reason is that the ML area is evolving very rapidly and, in particular, is in the midst of a partial paradigm shift. But you get just a short window in your life to take classes as an undergrad to set the foundation for lifelong learning over decades. In that context, you want to understand the foundations better by taking 126 and 127, especially since both courses engage directly with ML ideas while doing so and have Jupyter notebooks, etc. The ideas and skills there will continue to be relevant for decades, while it is quite possible that many things taught in whatever iteration of 189 you take will simply become obsolete. If you learned that obsolete material in a way that was grounded in 126/127, you will not be caught out when obsolescence happens; your understanding will adapt. But if you learned the obsolete material without such grounding, you will just get holes in your understanding. As the years pass, you will be diminished. And then, despite years of experience, you'll face the risk of being replaced by a fresh graduate at exactly the time in your life when this will hurt you the most. Most software engineers are never given the choice of proactively defending themselves against this; you have that choice because the Berkeley curriculum is different from what they have.
Let me also stop pretending that students live in an ideal world, and address some of the tensions and stresses that you face, for example, getting started with research. We understand that students can feel torn when what's good for their understanding, education, and long-term career seems to conflict with what's good for their short-term career. There are at least four dimensions to an intro-ML class: vocabulary, basic concepts, actual understanding, and computational practice. Here's the reality: after 16AB, you already have a rooted understanding of more than 50% of all the basic concepts taught elsewhere, even if you don't yet know what you know. You have basic computational practice already from all the 16AB material, and the 16B lab is a better introduction to ML than essentially anything else out there. At this point, with 70 under your belt, you can pick up vocabulary and a few other basic concepts very easily, along with more computational practice, especially if you have 61B-level programming skill. There are a ton of online resources. Just go through, say, the Stanford one-quarter class that (Berkeley alum) Andrew Ng put up on Coursera. With the 16AB+70 background you have, you will go through it like a hot knife through butter.
Here’s the kicker: how much extra work would it be to go through the entire Coursera course? *Less work* than the extra work you’d have to do if you tried to take Berkeley’s 189 (which is a much more intense beast) without the appropriate background. Then, when you take 189 after having the right background, you’ll be even more golden because you can appreciate and absorb what 189 is really trying to teach you: the underlying nature of machine learning instead of just the surface. Most people at most schools don’t even get a shot at a course like that.
With something like the Coursera videos and some computational practice under your belt, the actual understanding you have from 16AB is enough for you to start being an ML code monkey, which is how most everyone actually starts out when they start doing research as an undergrad. Don't screw up a precious Berkeley semester for a goal that is achievable over a month in the summer in your spare time.
You don’t have to give up actually understanding things.
That said, if you decide to take one of 126/127 concurrently with 189 and the other before, the right choice is to take 126 before and 127 concurrently, assuming you have a solid understanding of 16B.