An upper-level course
for CS majors on formal language theory and compilers.

Topics (subject to revision): regular expressions; finite automata; context-free grammars; predictive parsing; LR parsing; abstract syntax; type systems and type-checking; stack layout and activation records; intermediate representations; control-flow graphs; static single assignment (SSA) form; dataflow/liveness analysis; register allocation; garbage collection/runtimes; the LLVM compiler infrastructure. Over the course of the semester, students will implement a fully functioning compiler for a small imperative programming language, targeting LLVM. The course involves a significant amount of programming.

**Lecture**: Tuesday, Thursday 1:30–2:50 p.m.,
Walter Hall 135

**Professor**: Gordon Stewart (gstewart@ohio.edu)

**Office Hours**: T 3-4:30pm (Stocker 355), or by appointment

**TA**: Nathan St. Amour (ns196414@ohio.edu)

**Lab Hours**: Mondays before assignments are due, 3-4:30pm in Stocker 307 (tentative)

**Piazza**: Course Page, Signup

- The Grumpy Spec
- The CS4100 GitHub Repo
- The Grumpy Visual Debugger
- HackerRank Practice Problems, FP/Recursion

**Modern Compiler Implementation in ML.** Andrew W. Appel. Available for free online for Ohio University students through the university library.

Periodically I may assign supplementary (optional but recommended) readings from resources such as:

- Real World OCaml (abbreviated RWO in the syllabus below)
- the OCaml language manual
- the OCaml Batteries Included documentation
- the LLVM reference manual
- Types and Programming Languages (TAPL) -- available through Alden Library

In addition to biweekly homework assignments, there will be a midterm exam (Week 7, approximately 15% of your grade) and a final (approximately 25%). The biweekly homeworks (programming assignments) are worth approximately 40%. Each Tuesday there will be a quiz with probability 1/3; together with the biweekly offline Blackboard quizzes, these are worth 10% in total. Participation and attendance at lecture are worth 5%. You get an additional 5% for free, just for signing up for the course.
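As a rough illustration of how these weights combine, here is a minimal OCaml sketch. The record type, field names, and sample scores are hypothetical, and the weights are only the approximate ones listed above, so actual grading may differ.

```ocaml
(* Hypothetical record: each component score is a fraction in [0, 1]. *)
type scores = {
  homework : float;
  midterm : float;
  final_exam : float;
  quizzes : float;
  participation : float;
}

(* Approximate weights from the breakdown above; the trailing 5.0 is
   the free credit for signing up. The result is a percentage. *)
let final_percentage (s : scores) : float =
  40.0 *. s.homework
  +. 15.0 *. s.midterm
  +. 25.0 *. s.final_exam
  +. 10.0 *. s.quizzes
  +. 5.0 *. s.participation
  +. 5.0

let () =
  (* Made-up sample scores, for illustration only. *)
  let example =
    { homework = 0.9; midterm = 0.8; final_exam = 0.85;
      quizzes = 1.0; participation = 1.0 }
  in
  Printf.printf "%.2f\n" (final_percentage example)
```

The weights sum to 100, so a student with full marks on every component ends at 100%.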

Blackboard will be used only to report grades and to post lecture
notes. Up-to-date information on all other aspects of the course
(assignment due dates, etc.) will be posted either on this website
or on the Piazza page or both.

Introduction to compilers and functional programming in OCaml

Reading: Appel 1; RWO I.1.

Supplemental Reading: OCaml Manual: Core Language.

More functional programming: polymorphism, higher-order
functions, algebraic datatypes and pattern-matching

Supplemental Reading: OCaml Pervasives Library (reference)

A0 Due 1/23 at 11:59pm: A0: Intro. to OCaml.

Q0 Due 1/23 at 11:59pm

Regular expressions, regular languages

Reading: Appel 2 (up to and including 2.2)

A1 Due 1/30 at 11:59pm: A1: Functional
Programming in OCaml.

Q1 Due 1/30 at 11:59pm

Context-free languages, pushdown automata

Reading: Appel 3 (through Section 3.1)

A2 Due 2/15 at 11:59pm: A2: Regular Expressions Re-Examined.

Recursive descent parsing, predictive parsing, parser generators

Reading: Appel Sections 3.2-3.5

Q3 Due 2/20 at 11:59pm

Abstract syntax trees, type systems

Reading: Appel 4, TAPL 8 (OU Library eBook)

Q4 Due 3/1 at 11:59pm

Symbol tables, type-checking

Reading: Appel 5

A3 Due 3/6 at 11:59pm: A3: Lexing and Parsing with ocamllex and Menhir.

Control-flow graphs, dominators

Reading: Appel 7.1, Appel 18.1

Use-def, dataflow/liveness analysis,
Static Single Assignment (SSA) form,
interference graphs

Reading: Appel 10.1, Appel 19 (up to but not including 19.1)

A4 Due 3/29 at 11:59pm: A4: Type-checking.

Dataflow analysis contd., translation to SSA form

Reading:

Q5 Due 4/3 at 11:59pm

Stack layout and activation records;
Intro. to runtimes, garbage collection;
mark-and-sweep collection, copying collection, reference counting,
generational collection

Reading: Appel 13, through 13.4;
Appel 6.1

Q6 Due 4/10 at 11:59pm

A5 Due 4/15 at 11:59pm: A5: SSA.

Intro. to LLVM assembly and the LLVM compiler toolkit;
intro. to register allocation

Reading: Appel 11 through 11.3; AOSA: LLVM

Q7 Due 4/17 at 11:59pm

- Use pattern-matching to decompose and compute on structured data
- Use recursion to write functions that manipulate recursive types such as syntax trees
- Use higher-order functions such as map to manipulate data structures such as lists or trees
- Construct a finite state machine to recognize a given language
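To give a flavor of the first three outcomes above, here is a minimal OCaml sketch. The `exp` type and the function names are hypothetical illustrations, not the course's actual abstract syntax or assignment code.

```ocaml
(* A hypothetical expression type standing in for a syntax tree. *)
type exp =
  | Num of int
  | Add of exp * exp
  | Mul of exp * exp

(* Pattern-matching decomposes each constructor; recursion follows
   the recursive structure of the type. *)
let rec eval (e : exp) : int =
  match e with
  | Num n -> n
  | Add (l, r) -> eval l + eval r
  | Mul (l, r) -> eval l * eval r

(* A higher-order function: List.map applies eval to every element. *)
let eval_all (es : exp list) : int list = List.map eval es

let () =
  let results =
    eval_all [ Num 1; Add (Num 2, Num 3); Mul (Num 2, Add (Num 1, Num 3)) ]
  in
  List.iter (Printf.printf "%d ") results;
  print_newline ()
```

Each new constructor added to `exp` needs a matching case in `eval`; the OCaml compiler warns when a match is not exhaustive.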

(b) An ability to analyze a problem, and identify and define the computing requirements appropriate to its solution. Students will be able to:

- Determine whether a given language is recognizable (e.g., by a RE, DFA, or CFG)
- Identify the recursive functions appropriate for translating programs into a particular intermediate representation, such as static single assignment form

(c) An ability to design, implement, and evaluate a computer-based system, process, component, or program to meet desired needs. Students will be able to:

- Design and implement in OCaml a lexer and parser for a high-level language, and evaluate their correctness against a test suite
- Design and implement in OCaml a type-checker for a high-level language, and evaluate its correctness against a test suite
- Design and implement in OCaml a program transformation mapping expressions to static single assignment form, and evaluate its correctness against a test suite
- Evaluate the purpose and correctness of a program transformation mapping code to static single assignment form

(j) An ability to apply mathematical foundations, algorithmic principles, and computer science theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices. Students will be able to:

- Apply computer science theory to determine whether a given grammar is parseable by recursive descent
- Evaluate the tradeoffs, in terms of asymptotic complexity, of distinct garbage collection algorithms
- Evaluate the tradeoffs in precision vs. computability of static analyses that underlie garbage collection (e.g., for liveness)
- For a given program, use mathematical foundations such as graph theory to evaluate the feasibility of a particular register-allocation strategy

(k) An ability to apply design and development principles in the construction of software systems of varying complexity. Students will be able to:

- Evaluate the tradeoffs, in terms of design complexity, of a modular vs. monolithic compiler implementation
- Design and implement a compiler embodying the modular approach

| | Instructor/GA | Noninstructor (e.g., Another Student) |
|---|---|---|
| You | all collaboration allowed | high-level discussion (of the problems, not your code!) allowed but only after you've started the assignment; must be documented in README as described below |

Unless otherwise noted, homeworks are due Tuesdays by 11:59 p.m. Late homework assignments will be penalized according to the following formula:

- Up to 24 hours late: no deduction, for a maximum of 2 late homeworks per student across the entire semester
- Homeworks later than 24 hours, or from students who have already turned in 2 late homeworks, will receive 0 points.
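Read as a function, the late policy above might be sketched as follows. The function name and its parameters are hypothetical, and the treatment of a submission exactly 24 hours late is an assumption; the course staff's interpretation governs.

```ocaml
(* Hypothetical helper. [hours_late] is the time past the deadline
   (<= 0.0 means on time); [late_used] is how many late homeworks the
   student has already turned in this semester. Returns the fraction
   of earned points that is kept. *)
let late_multiplier ~(hours_late : float) ~(late_used : int) : float =
  if hours_late <= 0.0 then 1.0 (* on time: full credit *)
  else if hours_late <= 24.0 && late_used < 2 then
    1.0 (* within 24 hours and a late allowance remains: no deduction *)
  else 0.0 (* too late, or both late allowances already used *)

let () =
  Printf.printf "%.1f\n" (late_multiplier ~hours_late:3.0 ~late_used:1)
```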

You **may** discuss the homework with other students in
the class, but only after you've attempted the problems on your own
first. If you do discuss the homework problems with others, write the
names of the students you spoke with, along with a brief summary of
what you discussed, in a README comment at the top of each
submission. Example:

```
(*
README Gordon Stewart, Assn #1

I worked with X and Y. We swapped tips regarding the use of pattern-matching
in OCaml. *)
```

However, **under no circumstances** are you permitted
to share or directly copy code or other written homework material,
except with course instructors.
The code and proofs you turn in must
be your own. Remember: homework is there to give *you* practice in
the new ideas and techniques covered by the course; it does you no
good if you don't engage!

That said, if we find that you have cheated on an assignment in this course, you will immediately:

- Be referred to the Office of Community Standards (which may take disciplinary action against you, possibly expulsion); and
- Flunk the course (receive a final grade of F).

Students in EECS courses such as this one must adhere to the Russ College of Engineering and Technology Honor Code, and to the OU Student Code of Conduct. If you haven't read these policies, do so now.