Quiz 07: Policies and Information

Location, Date, and Time

Conflicts: There will be no conflict quiz as students are able to choose the time and date of their quiz.

Quiz Content

All quizzes are cumulative. Previous material can reappear on a later quiz.

11.1: Grammar of Data

Provided with: Data Transformation Cheatsheet

Interrogating Data:

  • Why do we want data to be in a tidy format when asking questions of it?
  • How can we phrase questions that promote effective wrangling of data?
  • Describe the five verbs related to data manipulation.


  • What is dplyr?
  • How is dplyr different from base R?
  • What are the pros and cons of using dplyr vs. base R in manipulating data?
  • Why is the pipe operator useful for dplyr’s functions?


  • What is the Split-Apply-Combine paradigm? What occurs during each of its stages?
  • How is Split-Apply-Combine related to the MapReduce algorithm used by Hadoop to process large data?
  • Why is Split-Apply-Combine an effective framework for statistical computing?
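As a study aid, the three stages of Split-Apply-Combine can be sketched outside of dplyr. The sketch below uses plain Python rather than the R used in the course, and the `records` data is made up for illustration; it only shows the shape of the paradigm, not how dplyr implements it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (group, value) pairs.
records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]

# Split: partition the values by their group key.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)

# Apply: compute a summary independently within each group.
summaries = {key: mean(values) for key, values in groups.items()}

# Combine: gather the per-group results into one sorted table.
result = sorted(summaries.items())
print(result)
```

Because the apply step touches each group independently, the groups could in principle be processed in parallel, which is the connection to MapReduce raised in the questions above.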

Non-Standard Evaluation (NSE):

  • What is considered standard evaluation?
  • How is NSE different from standard evaluation?
  • Why do we have environments?
  • What kinds of scoping rules does R have?
  • Why do these scoping rules sometimes hide problems in code?
  • How does the tidyverse take advantage of environments during NSE?

11.2: Joins


  • How is the idea of databases similar to the idea of R’s data.frame?
  • Why are databases useful for analyses?

Keys and Relationships

  • How does a primary key differ from a foreign key?
  • What kinds of relationships exist between two or more tables?


  • What are the different kinds of joins?
  • Why should we use a join instead of naively combining data?
  • When does right_join() return results similar to left_join()?

12.1 SQL

Provided with: SQLite Cheatsheet

Connecting to a Database:

  • What is an RDBMS?
  • How does John Chambers’ third statement in his quote on understanding R apply to forming a connection with a database?
  • Why is it useful to outsource data manipulation to a database?

Structured Query Language (SQL):

  • What is SQL?
  • How does a declarative language differ from an imperative language?
  • Why do the names of data types differ between R and SQL even though their functionality is largely the same?
  • How can we create a SQL table? How does this differ from filling a data.frame?
  • What is the primary difference between how a SQL statement is terminated compared to an R statement?
  • How can we embed SQL into an R Markdown document?
  • What does the * mean in a SELECT statement?
  • How is the SQL SELECT statement similar to statements in Base R and dplyr?
  • What is the difference between HAVING and WHERE conditions as it relates to filtering data?
  • Why do we say functions like AVG(), COUNT(), and SUM() act as aggregations?
  • What SQL keyword provides us with the ability to use the Split-Apply-Combine paradigm among groups?
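To make the WHERE versus HAVING question concrete, here is a minimal sketch run against an in-memory SQLite database. Python's built-in `sqlite3` module is used only as a convenient way to execute the SQL; the `sales` table and its values are hypothetical and not taken from the course materials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL);")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?);",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 7.0), ("north", 50.0)],
)

# WHERE filters individual rows before they are grouped.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 6 GROUP BY region ORDER BY region;"
).fetchall()

# HAVING filters whole groups after the aggregate is computed.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 20 ORDER BY region;"
).fetchall()

print(where_rows)
print(having_rows)
```

In the first query the 5.0 row is dropped before summing, so the west group survives with a smaller total. In the second query every row is summed first, and the west group is then dropped as a whole.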


  • How does DBI allow us to translate data from R to a database?
  • Why is there a difference between remote and local versions of data?
  • Why might it be beneficial to break up a selection query?

SQL Joins:

  • What are the different joining paradigms possible?
  • How do the joining paradigms differ from each other?
  • What happens if we only join on one key that repeatedly exists in a database?
  • How are JOIN ... USING ( ... ) and INNER JOIN ... ON ... different?
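One visible difference between the two join syntaxes can be checked directly in SQLite. The sketch below again uses Python's `sqlite3` module purely to run the SQL, with hypothetical `students` and `grades` tables; it compares the column names each query returns under `SELECT *`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grades (id INTEGER, score REAL);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO grades VALUES (1, 95.0), (2, 88.0);
""")

# USING (id): the shared key column appears once in the result.
using_cur = conn.execute("SELECT * FROM students JOIN grades USING (id);")
using_cols = [col[0] for col in using_cur.description]
using_rows = using_cur.fetchall()

# ON: the key column from each table is kept, so id appears twice.
on_cur = conn.execute(
    "SELECT * FROM students INNER JOIN grades ON students.id = grades.id;"
)
on_cols = [col[0] for col in on_cur.description]
on_rows = on_cur.fetchall()

print(using_cols)
print(on_cols)
```

Both queries match the same rows here. ON is the more general form, since it accepts arbitrary conditions and keys with different names, while USING requires the key to have the same name in both tables.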


  • What does dbplyr do?
  • How is dbplyr useful when connected with dplyr?
  • What are some ways to tell if dbplyr is being used or not?
  • Why might we not want to use dbplyr if we know SQL?

SQL Injection:

  • What is a SQL injection and why does it occur?
  • Why are SQL injections harmful?
  • How can we protect against a SQL injection?
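The following is a minimal sketch of how an injection arises and one common defense, parameterized queries. It uses Python's `sqlite3` module with a hypothetical `users` table and a made-up malicious input; the same idea applies to parameterized queries issued from R.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (name TEXT, secret TEXT);
INSERT INTO users VALUES ('alice', 'a-secret'), ('bob', 'b-secret');
""")

# Hypothetical attacker-supplied value.
user_input = "nobody' OR '1'='1"

# Unsafe: pasting the input into the SQL text lets it rewrite the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "';"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a ? placeholder sends the input as data, never as SQL code.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?;", (user_input,)
).fetchall()

print(len(unsafe_rows))  # every user matches the injected OR clause
print(len(safe_rows))    # no user is literally named "nobody' OR '1'='1"
```

The unsafe query returns every row because the injected `OR '1'='1'` is always true, while the parameterized query treats the whole input as a single name to look up.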

12.2: Unstructured Data

Provided with: String Manipulation Cheatsheet


  • What is unstructured data?
  • Where can unstructured data be found?

Text Representations:

  • What is the difference between a character and a string? How relevant is this to R’s representation of unstructured data?
  • What are escape characters? When should they be used?

String Operators:

  • What is the difference between taking the length(x) and nchar(x)? Why does this difference exist?
  • How do the following two strings differ: “UPPER” vs. “upper”?
  • How can you concatenate strings together?
  • What controls per-element concatenation vs. entire vector concatenation?
  • When might taking a substring of the data be appropriate?
  • What are the benefits of breaking apart a string?

Text Mining:

  • What are the benefits of tokenization?
  • Why is pre-processing text important?
  • Why can sentiment analysis be considered a subset of the bag-of-words approach?
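As a rough illustration of how these three ideas fit together, the sketch below pre-processes a sentence, tokenizes it, builds a bag of words, and scores sentiment from the word counts alone. It is written in Python rather than R, and both the `lexicon` and the example `text` are hypothetical; a real sentiment lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical miniature sentiment lexicon: word -> polarity.
lexicon = {"good": 1, "great": 1, "bad": -1, "awful": -1}

text = "The food was GOOD, the service was great, but the parking was awful."

# Pre-process and tokenize: lowercase, then keep runs of letters.
tokens = re.findall(r"[a-z']+", text.lower())

# Bag of words: word order is discarded, only counts remain.
bag = Counter(tokens)

# Sentiment: sum each word's polarity weighted by its count.
score = sum(lexicon.get(word, 0) * count for word, count in bag.items())

print(bag.most_common(2))
print(score)
```

The score depends only on the counts in `bag`, not on word order, which is why sentiment analysis of this kind is treated as a special case of the bag-of-words approach.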

Materials Needed

  • Preferably, a rested mind and non-broken hands that can type.


  • All answers must be reasonably simplified.
  • Decimal answers must contain two significant digits.
  • Grading will be done as follows:
    • A correct answer will receive all points.
    • An incorrect answer will receive proportionally appropriate partial credit.

If you have a technical issue while answering questions or need assistance with opening or starting the quiz, please alert the proctor.

Do not leave the CBTF without filing an issue with the proctor if something goes wrong.


Have a testing accommodation? Please see how the CBTF handles Letters of Accommodation.

The short version: Please bring a copy of the Letter of Accommodation to the CBTF Proctors prior to the test taking place.

Academic Integrity

In short, don’t cheat. Keep your eyes on your own quiz. Do not discuss the quiz with your friends after you have taken it. Any violation will be punished as harshly as possible.

Advice for Studying

The best way to study for a STAT 385 quiz is by writing and reading code. Try to take an idea in STAT 385 and apply it to your own work.

With this being said, there are three other resources that may assist your studies:

  • Topic Outline (Above)
  • Lecture Code
  • Homework

Again, the best way to study is to do programming in some fashion, whether that be writing code or explaining how code works to someone else.

Consider using resources such as:

  1. RStudio Cloud Primers for interactive practice.
  2. Exercise problems listed in a given section of the readings.

Do not spend time memorizing lecture slides. You will not see any verbatim questions.

Do not try pulling an all-nighter. You can schedule your quiz at any time within the testing window. To program efficiently, you need sleep, despite the quote:

“Programmers are an organism that turns caffeine into code.”

Frequently Asked Questions

What kind of question types are on the quiz?

There are generally four types of problems:

  • True / False
  • Multiple Selection (e.g. select ALL correct answers from a list)
  • Fill in the blank
  • Writing Code

How many problems are on the quiz?

Only one question with 15012391 subquestions. In all seriousness, do not fixate on a number. There will be a reasonable number of questions for the time period.

How long will it take to do the quiz?

Depending on your background, the quiz may take:

  • In-depth prior R experience: 25 minutes
  • Some R experience: 35 minutes
  • No R experience: 50 minutes

Avoid fixating on time. Life will come and go more quickly than you realize. Focus more on the content.

When will the quiz be returned?

As all problems are automatically graded, we should be able to post the quiz results after the examination window closes.

Will the quiz be curved?


We got our grades back. Now will the quiz be curved?

No. Curving is only done sparingly at the end of the semester. Individual assignments are not modified.