Fill in the blank: Structured query language (SQL) enables data analysts to _____ information from a database. Select all that apply.
- retrieve
- visualize
- request
- update
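The sketch below illustrates these verbs with Python's built-in sqlite3 module rather than the course's own database tool. The `employee` table, its columns, and its rows are hypothetical. A SELECT statement retrieves (requests) data, and an UPDATE statement changes values that are already stored.

```python
import sqlite3

# In-memory database with a hypothetical "employee" table (illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO employee (employee_id, title) VALUES (?, ?)",
    [(1, "Analyst"), (2, "Engineer")],
)

# Retrieve (request) information with SELECT.
rows_before = conn.execute(
    "SELECT employee_id, title FROM employee ORDER BY employee_id"
).fetchall()

# Update stored information with UPDATE, then retrieve the changed value.
conn.execute("UPDATE employee SET title = ? WHERE employee_id = ?", ("Senior Analyst", 1))
new_title = conn.execute("SELECT title FROM employee WHERE employee_id = 1").fetchone()[0]

print(rows_before)  # [(1, 'Analyst'), (2, 'Engineer')]
print(new_title)    # Senior Analyst
```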
Google Data Analytics Professional Certificate Answers
Whether you’re just getting started or want to take the next step in the high-growth field of data analytics, professional certificates from Google can help you gain in-demand skills. You’ll learn about R programming, SQL, Python, Tableau and more.
Data analysts prepare, process, and analyze data to help inform business decisions. They create visualizations to share their findings with stakeholders and provide recommendations driven by data.
This certification is part of Google Career Certificates.
Complete a Google Career Certificate to get exclusive access to CareerCircle, which offers free 1-on-1 coaching, interview and career support, and a job board to connect directly with employers, including over 150 companies in the Google Career Certificates Employer Consortium.
Questions:
What method involves asking multiple questions in order to get to the root cause of a problem?
A set of instructions used to perform a specified calculation is known as what?
Which of the following is an example of why a data analyst may generate a query?
You are working with a database table named employee that contains data about employees. You want to review all the columns in the table.
You write the SQL query below. Add a FROM clause that will retrieve the data from the employee table.
SELECT
*
FROM employee
Which employee has the job title of Sales Manager?
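To show what `SELECT *` and the FROM clause do together, here is a small sketch using Python's sqlite3 module. The column names and the single row are hypothetical and do not come from the course database. `cursor.description` is used only to list the columns the query returned.

```python
import sqlite3

# Hypothetical employee table; the column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER, last_name TEXT, first_name TEXT, title TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Doe', 'Jane', 'Analyst')")

# SELECT * requests every column; FROM employee names the table to read from.
cursor = conn.execute("SELECT * FROM employee")
column_names = [description[0] for description in cursor.description]
all_rows = cursor.fetchall()

print(column_names)  # ['employee_id', 'last_name', 'first_name', 'title']
print(all_rows)      # [(1, 'Doe', 'Jane', 'Analyst')]
```

To narrow the result to one job title, you would add a WHERE clause. For example, `WHERE title = 'Sales Manager'` would work, assuming the titles are stored in a column named `title`.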
A data analyst creates the following visualization to clearly demonstrate how much more populous Charlotte is than the next-largest North Carolina city, Raleigh. What type of chart do they use?
Fill in the blank: A data analyst has to demonstrate how the population in a city has increased over time. In particular, they want to be able to see when the population has exceeded certain thresholds. The chart that would work best for this is a/an _____ chart.
You write the SQL query below. Add a FROM clause that will retrieve the data from the genre table.
What is the name of the genre with ID number 3?
You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 35.
After you run your query, use the slider to view all the data presented.
What is the billing country for the customer with ID number 35?
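The WHERE clause filters rows before they are returned. The following is a minimal sqlite3 sketch that assumes a hypothetical `invoice` table with `customer_id` and `billing_country` columns. The course's table may be named and structured differently, and the country values are placeholders.

```python
import sqlite3

# Hypothetical table and columns; the placeholder countries are not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER, billing_country TEXT)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, 35, "Country A"), (2, 40, "Country B"), (3, 35, "Country A")],
)

# The WHERE clause keeps only the rows whose customer_id equals 35.
rows = conn.execute(
    "SELECT * FROM invoice WHERE customer_id = 35 ORDER BY invoice_id"
).fetchall()

print(rows)  # [(1, 35, 'Country A'), (3, 35, 'Country A')]
```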
You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 40.
After you run your query, use the slider to view all the data presented.
What is the billing city for the customer with ID number 40?
You write the SQL query below. Add a FROM clause that will retrieve the data from the playlist table.
What is the name of the playlist with ID number 3?
You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 28.
After you run your query, use the slider to view all the data presented.
What is the billing city for the customer with ID number 28?
You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 54.
After you run your query, use the slider to view all the data presented.
What is the billing address for the customer with ID number 54?
You write the SQL query below. Add a FROM clause that will retrieve the data from the employee table.
What is the job title of Andrew Adams?
A college IT department needs to reduce the number of computers on campus for student use. How could a data analyst help identify a solution to this problem?
In data analytics, what is the term for an obstacle to be solved?
Scenario 1, questions 1-5
You’ve just started a new job as a data analyst for a midsized pharmacy chain with 38 stores in the American Southwest. Your supervisor shares a new data analysis project with you.
She explains that the pharmacy is considering discontinuing a bubble bath product called Splashtastic. Your supervisor wants you to analyze sales data and determine what percentage of each store’s total daily sales come from that product. Then, you’ll present your findings to leadership.
You know that it's important to follow each step of the data analysis process: ask, prepare, process, analyze, share, and act. So, you begin by defining the problem and making sure you fully understand stakeholder expectations.
One of the questions you ask is where to find the dataset you’ll be working with. Your supervisor explains that the company database has all the information you need.
Next, you continue to the prepare step. You access the database and write a query to retrieve data about Splashtastic. You notice that there are only 38 rows of data, representing the company’s 38 stores. In addition, your dataset contains five columns: Store Number, Average Daily Customers, Average Daily Splashtastic Sales (Units), Average Daily Splashtastic Sales (Dollars), and Average Total Daily Sales (All Products). You decide to use a spreadsheet to work with the data because you know that spreadsheets work well for processing and analyzing a small dataset, like the one you’re using.
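The core calculation in this project is a ratio: each store's average daily Splashtastic sales in dollars divided by its average total daily sales. With the column order described above, a spreadsheet formula along the lines of =D2/E2, formatted as a percentage, would do this. The Python sketch below performs the same calculation on a hypothetical two-store CSV export, and all of its numbers are made up.

```python
import csv
import io

# Hypothetical CSV export with the five columns described above (values are made up).
csv_text = """Store Number,Average Daily Customers,Average Daily Splashtastic Sales (Units),Average Daily Splashtastic Sales (Dollars),Average Total Daily Sales (All Products)
1,500,4,20.00,4000.00
2,300,2,10.00,2500.00
"""

splashtastic_share = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    splash_dollars = float(row["Average Daily Splashtastic Sales (Dollars)"])
    total_dollars = float(row["Average Total Daily Sales (All Products)"])
    # Percentage of the store's total daily sales that comes from Splashtastic.
    splashtastic_share[int(row["Store Number"])] = round(splash_dollars / total_dollars * 100, 2)

print(splashtastic_share)  # {1: 0.5, 2: 0.4}
```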
Fill in the blank: To get the data from the database into a spreadsheet, you would first _____ the data as a .CSV file, then import it into a spreadsheet.
Scenario 1 continued
You’ve downloaded the data from your company database and imported it into a spreadsheet. IMPORTANT: To answer questions using this dataset for the scenario, click the link below and select the “Use Template” button before answering the questions.
Link to template: Course Challenge - Scenario 1
OR
If you don’t have a Google account, you can download the template directly from the attachment below.
Course Challenge Dataset - Scenario 1 - Scenario 1_ Pharmacy Data - Part 1
Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice that information about Splashtastic is missing for Store Number 15 in Row 16. Which of the following would be an appropriate course of action?
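Before you decide how to handle a gap, it helps to locate every blank cell systematically. Here is a minimal sketch using Python's csv module on a hypothetical three-store extract. Detecting the gap is only the first step. Whether to fill, request, or exclude the missing values remains a judgment call for you and your supervisor.

```python
import csv
import io

# Hypothetical extract in which store 15's Splashtastic cells are blank (values are made up).
csv_text = """Store Number,Average Daily Splashtastic Sales (Units),Average Daily Splashtastic Sales (Dollars)
14,3,15.00
15,,
16,5,25.00
"""

rows_with_gaps = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Treat empty strings (and short rows, which DictReader fills with None) as missing.
    missing = [column for column, value in row.items() if not (value or "").strip()]
    if missing:
        rows_with_gaps.append((row["Store Number"], missing))

print(rows_with_gaps)
```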
Scenario 1 continued
Once you’ve found the missing information, you analyze your dataset.
During analysis, you create a new column F. At the top of the column, you add: Average Percentage of Total Sales - Splashtastic. What is this column label called?
Scenario 1 continued
Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. The correct syntax is =AVERAGE(E2:E39).
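=AVERAGE(E2:E39) adds the numeric cells in the range and divides the result by the number of those cells. E2:E39 spans 39 - 2 + 1 = 38 cells, one per store. Blank cells are skipped rather than counted as zero. Below is a rough Python equivalent that uses a few hypothetical values in place of the real column.

```python
from statistics import mean

# Hypothetical values standing in for cells in column E; None marks an empty cell.
column_e = [4000.00, 2500.00, None, 3100.00]

# Like a spreadsheet AVERAGE over a range: skip empty cells, average the numbers that remain.
numeric_cells = [value for value in column_e if isinstance(value, (int, float))]
average_total_daily_sales = mean(numeric_cells)

cells_in_range = 39 - 2 + 1  # E2:E39 is inclusive at both ends
print(average_total_daily_sales, cells_in_range)  # 3200.0 38
```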
Scenario 1 continued
Fill in the blank: You’ve reached the share phase of the data analysis process. One of the things that you can do in this phase is to prepare a _____ about Splashtastic’s sales and practice your presentation.
Scenario 2, questions 6-10
You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.
The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.
Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.
Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.
An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.
The table is dental_data_table, and the column name is zip_code. You write the following query, but get an error. What statement will correct the problem?
SELECT * FROM dental_data_table WHERE zip code = 81137
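In the failing query, `zip code` (with a space) is not the column's name, so the database cannot resolve it. The column has to be referenced exactly as `zip_code`. The sqlite3 sketch below runs both versions against a hypothetical three-row table. SQLite's exact error message may differ from the one your course environment shows.

```python
import sqlite3

# Hypothetical dental_data_table with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dental_data_table (patient_id INTEGER, zip_code INTEGER)")
conn.executemany(
    "INSERT INTO dental_data_table VALUES (?, ?)",
    [(1, 81137), (2, 81137), (3, 80202)],
)

# Broken version: "zip code" (with a space) does not name a column, so the query is rejected.
try:
    conn.execute("SELECT * FROM dental_data_table WHERE zip code = 81137")
    broken_query_failed = False
except sqlite3.OperationalError:
    broken_query_failed = True

# Fixed version: reference the column exactly as it is named, zip_code.
rows = conn.execute(
    "SELECT * FROM dental_data_table WHERE zip_code = 81137 ORDER BY patient_id"
).fetchall()

print(broken_query_failed, rows)  # True [(1, 81137), (2, 81137)]
```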
Scenario 2 continued
The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment. To use the dataset for this scenario, click the link below and select “Use Template.”
Link to template: Course Challenge - Scenario 2
OR
If you don’t have a Google account, you can download the template directly from the attachment below.
Course Challenge Dataset - Scenario 2
The patient demographic information includes data such as age, gender, and home address. When examining the geographic data, you notice that all the patients live in the same zip code.
Fill in the blank: The fact that the dataset includes people who all live in the same zip code might get in the way of ______.
Scenario 2 continued
As you’re reviewing the dataset, you notice that there is a disproportionate number of senior citizens. So, you investigate further and find out that this zip code represents a rural community in Colorado with about 800 residents. In addition, there’s a large assisted-living facility in the area. Nearly 300 of the residents in the 81137 zip code live in the facility.
You recognize that’s a sizable number, so you want to find out if age has an effect on a patient’s likelihood to attend a follow-up dental appointment. You analyze the data, and your analysis reveals that older people tend to miss follow-ups more than younger people.
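A statement such as "50% more likely to miss" compares two rates. One group's miss rate divided by the other's is 1.5. The sketch below computes both rates on a handful of hypothetical patient records (not the course dataset). It uses the age of 60 mentioned in the scenario as the cutoff between groups.

```python
# Hypothetical records: (age, attended_follow_up). Not the course dataset.
patients = [
    (34, True), (45, False), (52, True), (29, False),
    (67, False), (72, False), (81, True), (64, False),
]

def miss_rate(records):
    """Share of records whose follow-up appointment was missed."""
    missed = sum(1 for _, attended in records if not attended)
    return missed / len(records)

younger_rate = miss_rate([p for p in patients if p[0] <= 60])  # 2 of 4 missed
older_rate = miss_rate([p for p in patients if p[0] > 60])     # 3 of 4 missed

# Relative likelihood: 0.75 / 0.5 = 1.5, i.e. the older group is 50% more likely to miss.
relative_likelihood = older_rate / younger_rate
print(younger_rate, older_rate, relative_likelihood)  # 0.5 0.75 1.5
```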
So, you do some research online and discover that people over the age of 60 are 50% more likely to miss dentist appointments. Sometimes this is because they’re on a fixed income. Also, many senior citizens lack transportation to get to and from appointments.
With this new knowledge, you write an email to your supervisor expressing your concerns about the dataset. He agrees with your concerns, but he’s also impressed with what you’ve learned and thinks your findings could be very important to the project. He asks you to change the business task. Now, the NDS campaign will be about educating dental offices on the challenges faced by senior citizens and finding ways to help them access quality dental care.
Fill in the blank: Changing the business task involves defining a new _____.
Scenario 2 continued
You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.
But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.
Fill in the blank: The people who are familiar with a problem and help verify the results of data analysis are _____.
Scenario 2 continued
The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.
It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.
Why would a line chart be the most effective in representing this?
Scenario 1, questions 1-5
You’ve just started a new job as a data analyst. You’re working for a midsized pharmacy chain with 38 stores in the American Southwest. Your supervisor shares a new data analysis project with you.
She explains that the pharmacy is considering discontinuing a bubble bath product called Splashtastic. Your supervisor wants you to analyze sales data and determine what percentage of each store’s total daily sales come from that product. Then, you’ll present your findings to leadership.
You know that it's important to follow each step of the data analysis process: ask, prepare, process, analyze, share, and act. So, you begin by defining the problem and making sure you fully understand stakeholder expectations.
One of the questions you ask is where to find the dataset you’ll be working with. Your supervisor explains that the company database has all the information you need.
Next, you continue to the prepare step. You access the database and write a query to retrieve data about Splashtastic. You notice that there are only 38 rows of data, representing the company’s 38 stores. In addition, your dataset contains five columns: Store Number, Average Daily Customers, Average Daily Splashtastic Sales (Units), Average Daily Splashtastic Sales (Dollars), and Average Total Daily Sales (All Products).
You know that spreadsheets work well for processing and analyzing a small dataset, like the one you’re using. To get the data from the database into a spreadsheet, what should you do?
Scenario 1 continued
You’ve downloaded the data from your company database and imported it into a spreadsheet.
Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice that information about Splashtastic is missing for Store Number 15 in Row 16. Which of the following would be an appropriate response?
Scenario 1 continued
Once you’ve found the missing information, you analyze your dataset. During analysis, you create a new column F. At the top of the column, you add the attribute Average Percentage of Total Sales - Splashtastic.
Fill in the blank: An attribute is a _______ or quality of data used to label a column.
Scenario 1 continued
Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. Identify the correct way to write your function.
Scenario 1 continued
You’ve reached the share phase of the data analysis process. It involves which of the following? Select all that apply.
Scenario 2, questions 6-10
You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.
The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.
Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.
Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.
An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.
The table is dental_data_table, and the column name is zip_code. You have written the following query, but received an error when it ran.
SELECT * FROM dental_data_table WHERE dental_data_table = 81137
Given the objective of the query, where is the mistake in this query?
Scenario 2 continued
The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment.
The patient demographic information includes data such as age, gender, and home address. When examining the geographic data, you notice that all the patients live in the same zip code.
Fill in the blank: The fact that the dataset includes people who all live in the same zip code might get in the way of ______.
Scenario 2 continued
As you’re reviewing the dataset, you notice that there is a disproportionate number of senior citizens. So, you investigate further and find out that this zip code represents a rural community in Colorado with about 800 residents. In addition, there’s a large assisted-living facility in the area. Nearly 300 of the residents in the 81137 zip code live in the facility.
You recognize that’s a sizable number, so you want to find out if age has an effect on a patient’s likelihood to attend a follow-up dental appointment. You analyze the data, and your analysis reveals that older people tend to miss follow-ups more than younger people.
So, you do some research online and discover that people over the age of 60 are 50% more likely to miss dentist appointments. Sometimes this is because they’re on a fixed income. Also, many senior citizens lack transportation to get to and from appointments.
With this new knowledge, you write an email to your supervisor expressing your concerns about the dataset. He agrees with your concerns, but he’s also impressed with what you’ve learned and thinks your findings could be very important to the project. He asks you to change the business task. Now, the NDS campaign will be about educating dental offices on the challenges faced by senior citizens and finding ways to help them access quality dental care.
The business task has changed. What is the nature of that change?
Scenario 2 continued
You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.
But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.
Fill in the blank: The people who are familiar with a problem and help verify the results of data analysis are _____.
Scenario 2 continued
The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.
It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.
Which type of chart will be most effective?
Scenario 1 continued
You’ve downloaded the data from your company database and imported it into a spreadsheet.
Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice there’s missing data in one of the rows. What might you do to fix this problem? Select all that apply.
Scenario 1 continued
Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. To do this, you use a function. You input =AVE(E2:E39), but this returns an error. What is the correct command?
Scenario 2, questions 6-10
You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.
The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.
Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.
Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.
An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.
The table is dental_data_table, and the column name is zip_code. You write the following query.
SELECT * FROM dental_data_table WHERE zip code = 81137
This query is incorrect. How could it be fixed?
Scenario 2 continued
The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment.
The patient demographic information includes data such as age, gender, and home address. You review the demographic data, paying particular attention to geography. What geographic aspect of the data may negatively impact fairness?
Scenario 2 continued
You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.
But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.
Fill in the blank: Subject matter experts are people who are familiar with a problem. They can help by identifying inconsistencies in the analysis, _____, and validating the choices being made.
Scenario 2 continued
The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.
It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.
Fill in the blank: The type of chart that would be most effective in visualizing this is a _____.
In structured thinking, why would a data analyst organize the available information?
A local internet service provider is expecting an increase in the number of people streaming online entertainment. Its data analyst uses data to estimate the bandwidth needed to serve its customers. This is an example of which problem type?
Fill in the blank: The question, “How could we improve our website to simplify the returns process for our online customers?” is _____-oriented.
An analyst is working with data from two school programs. They discover that the data is measured differently across programs and this may impact how they can work with the data. What does this example describe?
A retail store runs a special sale with the goal of increasing sales over the holiday season. They use the increase in sales over the same month last year as a starting point. What type of goal is this an example of?
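A baseline such as "the same month last year" turns the goal into a number you can check, the percent change against that baseline. Here is a minimal sketch with hypothetical sales figures.

```python
# Hypothetical sales figures for the same holiday month in two consecutive years.
sales_same_month_last_year = 20_000
sales_this_year = 23_000

# Percent change relative to the baseline (same month last year).
percent_increase = round(
    (sales_this_year - sales_same_month_last_year) / sales_same_month_last_year * 100, 2
)

print(percent_increase)  # 15.0
```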
A data analyst assesses how well their company’s marketing campaign is performing. They apply a formula that compares the campaign’s net profit to its cost. What does this formula measure?
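A formula that relates net profit to cost is usually called return on investment (ROI). One common form divides net profit by the cost of the investment and expresses the result as a percentage. The figures below are hypothetical.

```python
# Hypothetical campaign figures.
campaign_cost = 5_000
campaign_net_profit = 1_500

# One common form of return on investment: net profit divided by cost, as a percentage.
roi_percent = round(campaign_net_profit / campaign_cost * 100, 2)

print(roi_percent)  # 30.0
```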
What calculations can you carry out within a spreadsheet? Select all that apply.
What are some of the ways that data analysts can gather data? Select all that apply.
You sum the entries in cells F3 through F200 in your spreadsheet. What is the correct function for this?
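In a spreadsheet, a colon denotes an inclusive range, so a total over F3 through F200 is written as =SUM(F3:F200) and covers 200 - 3 + 1 = 198 cells. The Python sketch below mimics that behavior. The `sheet_sum` helper is hypothetical and exists only for illustration, and the cell values are made up.

```python
# Hypothetical column F keyed by spreadsheet row number; range(3, 201) yields rows 3..200.
column_f = {row: 1.5 for row in range(3, 201)}

def sheet_sum(column, first_row, last_row):
    """Hypothetical helper mimicking =SUM over an inclusive row range of numeric cells."""
    return sum(
        value for row, value in column.items()
        if first_row <= row <= last_row and isinstance(value, (int, float))
    )

cells_in_range = 200 - 3 + 1  # F3:F200 includes both endpoints
total = sheet_sum(column_f, 3, 200)

print(cells_in_range, total)  # 198 297.0
```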
What are some of the causes of bias in data analytics? Select all that apply.
Fill in the blank: Your data analytics team is working on a project for the marketing department. The person most likely to be the _____ stakeholder is the vice president of marketing.
To communicate clearly with stakeholders and team members, there are four key questions data analysts ask themselves. One of the questions is: What does my audience already know? Identify the remaining three questions. Select all that apply.
You accept a new project from a high-level stakeholder. After beginning the project, you find that you aren’t sure what you are supposed to do. How do you handle this?
A data analyst collects a large amount of data for their project to ensure that the data represents a diverse set of perspectives. What element of data collection does this describe?
When leading a meeting, it is important to respect your team members’ time. What are some ways of doing this? Select all that apply.
What are some of the “don’ts” when attending a meeting?
Your manager assigns you a project task, and you don’t understand the point of the project. What questions can you ask them to determine the objective? Select all that apply.
Scenario 1, questions 1-5
You’ve just started a job as a data analyst at a small software company that provides data analytics and business intelligence solutions. Your supervisor asks you to kick off a project with a new client, Athena’s Story, a feminist bookstore. They have four existing locations, and the fifth shop has just opened in your community.
Athena’s Story wants to produce a campaign to generate excitement for an upcoming celebration and introduce the bookstore to the community. They share some data with your team to help make the event as successful as possible.
Your task is to review the assignment and the available data, then present your approach to your supervisor. Click the link below to access the email from your supervisor:
Course 2 Scenario 1 Email from Supervisor.pdf
Then, review the email and the Customer Survey and Historical Sales datasets.
To use the templates for the datasets, click the links below and select “Use Template.”
Links to templates: Customer Survey and Historical Sales
OR
If you don't have a Google account, you can download the CSV files directly from the attachments below.
CustomerSurvey - CustomerSurvey
HistoricalSales - HistoricalSales
After reading the email, you notice that the acronym WHM appears in multiple places. You look it up online, and the most common result is web host manager. That doesn’t seem right to you, as it doesn’t fit the context of a feminist bookstore. Still, you should assume it’s correct and continue with the project.
Scenario 1 continued
Now that you know WHM stands for Women’s History Month, you continue reviewing the datasets. You notice that the Customer Survey dataset contains both qualitative and quantitative data.
The qualitative data includes information from which columns? Select all that apply.
Scenario 1 continued
Next, you review the customer feedback in column F of the Customer Survey dataset.
The attribute of column F is “Survey Q6: What types of books would you like to see more of at Athena's Story?” To verify that children’s literature and feminist zines are among the most popular genres, you create a visualization. This will help you clearly identify which genres are most likely to sell well during the Women’s History Month campaign.
Your visualization looks like this:
Pie chart categories:
- Feminist science fiction: 4.8%
- Books about women: 2.4%
- Women's journals: 2.4%
- Feminist literary criticism: 2.4%
- Children's literature: 15.5%
- Women's history books: 2.4%
- Biographies of inspiration: 20.2%
- Feminist fiction: 26.2%
- Feminist zines: 14.3%
- Feminist poetry: 4.6%
- Feminist novels: 3.6%
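Each slice of a pie chart shows one category's count divided by the total number of responses. Here is a minimal sketch using a hypothetical tally of four genres. The counts are illustrative and are not the survey data.

```python
# Hypothetical tally of survey answers (counts are illustrative, not the course data).
genre_counts = {
    "Feminist fiction": 11,
    "Biographies of inspiration": 8,
    "Children's literature": 6,
    "Feminist zines": 5,
}

total_responses = sum(genre_counts.values())

# Each slice is that genre's share of all responses, as a percentage rounded to one decimal.
shares = {
    genre: round(count / total_responses * 100, 1)
    for genre, count in genre_counts.items()
}

print(total_responses, shares)
```

Because each share is rounded independently, the displayed percentages do not always add up to exactly 100.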
Fill in the blank: The visualization you create demonstrates the percentages of each book genre that make up the total number of survey responses. It’s called a _____ chart.
Now that you’ve confirmed that children’s literature and feminist zines are among the most requested book genres, you review the Historical Sales dataset.
To use the template for the dataset, click the link below and select “Use Template.”
Link to template: Historical Sales
OR
If you don't have a Google account, you can download the CSV file directly from the attachment below.
HistoricalSales - HistoricalSales
You’re pleased to see that the dataset contains data that’s specific to children’s literature and feminist zines. This will provide you with the information you need to make data-inspired decisions. In addition, the children’s literature and feminist zines metrics will help you organize and analyze the data about each genre in order to determine if they’re likely to be profitable.
Next, you calculate the total sales over 52 weeks for feminist zines. You type =CALCULATE(E2-E53) but get an error. What is the correct syntax?
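In spreadsheet syntax, a colon marks a range, so the 52 weekly values in cells E2 through E53 are totaled with =SUM(E2:E53). The expression E2-E53 instead subtracts one cell from another. The Python sketch below mirrors both calculations; the list `weekly_sales` is a hypothetical stand-in for column E with invented numbers, not the Historical Sales data.

```python
# Hypothetical weekly sales for feminist zines: 52 invented values that stand
# in for cells E2:E53 (index 0 plays the role of E2, index 51 of E53).
weekly_sales = [100 + week for week in range(52)]

# What =E2-E53 computes: one cell minus another cell, not a total.
difference = weekly_sales[0] - weekly_sales[-1]

# What =SUM(E2:E53) computes: every cell in the range added together.
total = sum(weekly_sales)

print(difference)  # -51
print(total)       # 6526
```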
Scenario 1 continued
After familiarizing yourself with the project and available data, you present your approach to your supervisor. You provide a scope of work, which includes important details, a schedule, and information on how you plan to prepare and validate the data. You also share some of your initial results and the pie chart you created.
In addition, you identify the problem type, or domain, for the data analysis project. You decide that the historical sales data can be used to provide insights into the types of books that will sell best during Women’s History Month this coming year. This will also enable you to determine if Athena’s Story should begin selling more children’s literature and feminist zines.
Using historical data to make informed decisions about how things may be in the future is an example of spotting something unusual.
Scenario 2, questions 6-10
You’ve completed this program and are now interviewing for your first junior data analyst position. You’re hoping to be hired by an event planning company, Patel Events Plus. Access the job description below:
Junior Data Analyst Job Description.pdf
So far, you’ve successfully completed the first round of interviews with the human resources manager and director of data and strategy. Now, the vice president of data and strategy wants to learn more about your approach to managing projects and clients. Access the email you receive from the human resources director below:
Human Resources Director Email.pdf
You arrive Thursday at 1:45 PM for your 2 PM interview. Soon, you’re taken into the office of Mila Aronowicz, vice president of data and strategy. After welcoming you, she begins the behavioral interview.
First, she hands you a copy of Patel Events Plus’s organizational chart. Access the chart below:
Patel Event Plus Org Chart.pdf
As you’ve learned in this course, stakeholders are people who invest time, interest, and resources into the projects you’ll be working on as a data analyst. Let’s say you’re working on a project involving data and strategy.
Based on what you find in the organizational chart, who should be considered the primary stakeholder for projects involving data and strategy?
Scenario 2 continued
Next, the vice president wants to understand your knowledge about asking effective questions. Consider and respond to the following question. Select all that apply.
Let’s say we just completed a big event for a client and wanted to find out if they were satisfied with their experience. Provide some examples of measurable questions that you could include in the customer feedback survey. Select all that apply.
Scenario 2 continued
Now, the vice president presents a situation having to do with resolving challenges and meeting stakeholder expectations. Consider and respond to the following question.
You’re working on a rush project, and you discover your dataset is not clean. Even though it has numerous nulls, redundant data, and other issues, the primary stakeholder insists that you move ahead and use it anyway. The project timeline is so tight that there simply isn’t enough time for cleaning. How would you handle that situation?
Scenario 2 continued
Your next interview question deals with sharing information with stakeholders. Consider and respond to the following question. Select all that apply.
Let’s say you’ve designed a dashboard to give stakeholders easy, automatic access to data about an upcoming event. Describe the benefits of using a dashboard. Select all that apply.
Scenario 2 continued
Your final behavioral interview question involves using metrics to answer business questions. Your interviewer hands you a copy of a Patel Events dataset.
To use the template for this dataset, click the link below and select “Use Template.”
Link to template: Patel Events Data
OR
If you don't have a Google account, you can download the CSV file directly from the attachment below.
Then, she asks: Recently, Patel Events Plus purchased a new venue for our events. If we asked you to calculate the return on investment of this purchase, the metrics to consider would be the cost of the investment and what else?
Scenario 1, questions 1-5
You’ve just started a job as a data analyst at a small software company that provides data analytics and business intelligence solutions. Your supervisor asks you to kick off a project with a new client, Athena’s Story, a feminist bookstore. They have four existing locations, and the fifth shop has just opened in your community.
Athena’s Story wants to produce a campaign to generate excitement for an upcoming celebration and introduce the bookstore to the community. They share some data with your team to help make the event as successful as possible.
Your task is to review the assignment and the available data, then present your approach to your supervisor. Click the link below to access the email from your supervisor:
Course 2 Scenario 1 Email from Supervisor.pdf
Then, review the email, and the Customer Survey and Historical Sales datasets.
To use the templates for the datasets, click the links below and select “Use Template.”
Links to templates: Customer Survey and Historical Sales
OR
If you don't have a Google account, you can download the CSV files directly from the attachments below.
CustomerSurvey - CustomerSurvey
HistoricalSales - HistoricalSales
After reading the email, you notice that the acronym WHM appears in multiple places. You look it up online, and the most common result is web host manager. That doesn’t seem right to you, as it doesn’t fit the context of a feminist bookstore. You email your supervisor to ask. When writing your email, what do you do to ensure it sounds professional? Select all that apply.
Scenario 1 continued
Now that you know WHM stands for Women’s History Month, you continue reviewing the datasets. You notice that the Customer Survey dataset contains both qualitative and quantitative data.
To use the template for the dataset, click the link below and select “Use Template.”
Link to template: Customer Survey
OR
If you don't have a Google account, you can download the CSV file directly from the attachment below.
CustomerSurvey - CustomerSurvey
The quantitative data includes information from which columns? Select all that apply.
Scenario 2, questions 6-10
You’ve completed this program and are now interviewing for your first junior data analyst position. You’re hoping to be hired by an event planning company, Patel Events Plus. Access the job description below:
Junior Data Analyst Job Description.pdf
So far, you’ve successfully completed the first round of interviews with the human resources manager and director of data and strategy. Now, the vice president of data and strategy wants to learn more about your approach to managing projects and clients. Access the email you receive from the human resources director below:
Human Resources Director Email.pdf
You arrive Thursday at 1:45 PM for your 2 PM interview. Soon, you’re taken into the office of Mila Aronowicz, vice president of data and strategy. After welcoming you, she begins the behavioral interview.
First, she hands you a copy of Patel Events Plus’s organizational chart. Access the chart below:
Patel Event Plus Org Chart.pdf
As you’ve learned in this course, stakeholders are people who invest time, interest, and resources into the projects you’ll be working on as a data analyst. Let’s say you’re working on a project involving data and strategy.
Based on what you find in the organizational chart, which individuals are considered the secondary stakeholders? Select all that apply.
Scenario 2 continued
Now, the vice president presents a situation having to do with resolving challenges and meeting stakeholder expectations. Consider and respond to the following question. Select all that apply.
You’re working with a dataset that the data analytics coordinator should have cleaned, but it turns out that it wasn’t. Your supervisor thought the dataset was ready for use, but you discover nulls, redundant data, and other issues. The project is due in less than two weeks. Which of the following options would be an appropriate approach? Select all that apply.
Scenario 2 continued
Your next interview question deals with sharing information with stakeholders. Consider and respond to the following question. Select all that apply.
Let’s say you’ve created a report to present stakeholders with information about an upcoming event. Describe the benefits of using a report. Select all that apply.
A data analyst is preparing an annual report for company executives and decides to use internal data. Why do they choose to use internal data? Select all that apply.
A data analyst is reviewing data that has been organized into a table format. What type of data is in the table?
A data analyst is reviewing a spreadsheet. They find that the columns contain the data variables. What data format does this describe?
What kind of data format does it contain?
Which of the following best describes data bias?
In data ethics, consent gives an individual the right to know the answers to which of the following questions? Select all that apply.
An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics?
A company collects and analyzes user data. As part of this process, they preserve each data subject’s information and activity for all data transactions. What data ethics concept does this describe?
Which of the following properties describe primary keys in a relational database? Select all that apply.
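One defining property of a primary key is that its value uniquely identifies each row, so the database refuses a second row with the same key. As a minimal sketch, assuming a hypothetical `customers` table in SQLite (chosen only because Python's `sqlite3` module includes it), the duplicate insert below is rejected. SQLite is lenient about NULLs in some non-integer primary-key columns, so this example demonstrates uniqueness only.

```python
import sqlite3

# Hypothetical table: customer_id is declared as the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")

# A second row that reuses customer_id 1 violates the primary key's uniqueness.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```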
What do metadata repositories do to make it simpler and quicker to use multiple data sources for analysis? Select all that apply.
Which type of metadata is used to indicate where a digital asset or piece of information originated from?
What is the process that data analysts use to ensure the formal management of their company’s data assets?
What are some of the reasons for open data initiatives? Select all that apply.
A nonprofit has a list of their many donors. They want to send a mailing to donors who live within 100 miles of the nonprofit’s headquarters. How could they use the column distance_to_hq to only display the donors that meet those conditions?
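Expressed in SQL, keeping only donors within 100 miles of headquarters is a WHERE condition on distance_to_hq; a spreadsheet filter on the same column applies the same condition. The sketch assumes a hypothetical `donors` table with invented rows and reads "within 100 miles" as less than or equal to 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical donors table; names and distances are invented for illustration.
conn.execute("CREATE TABLE donors (name TEXT, distance_to_hq REAL)")
conn.executemany(
    "INSERT INTO donors VALUES (?, ?)",
    [("Ana", 12.5), ("Ben", 100.0), ("Chloe", 250.0)],
)

# Keep only donors whose distance to headquarters is at most 100 miles.
nearby = conn.execute(
    "SELECT name FROM donors WHERE distance_to_hq <= 100 ORDER BY name"
).fetchall()
print(nearby)  # [('Ana',), ('Ben',)]
```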
In the following piece of SQL code, what does the asterisk (*) represent?
SELECT * FROM customers
You are working with a database table that contains customer data. The company column lists the company affiliated with each customer. You want to find customers from the company Riotur.
You write the SQL query below.
SELECT * FROM Customer
What code would be added to return only customers affiliated with the company Riotur?
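The asterisk in SELECT * stands for "all columns," and a WHERE clause narrows the result to matching rows. A minimal sketch in SQLite, assuming a hypothetical Customer table with invented rows; ORDER BY is added only so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Customer table with invented rows.
conn.execute("CREATE TABLE Customer (first_name TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("Ana", "Riotur"), ("Ben", "Acme"), ("Chloe", "Riotur")],
)

# SELECT * returns every column; WHERE keeps only rows for the company Riotur.
rows = conn.execute(
    "SELECT * FROM Customer WHERE company = 'Riotur' ORDER BY first_name"
).fetchall()
print(rows)  # [('Ana', 'Riotur'), ('Chloe', 'Riotur')]
```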
Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.
When writing a query, it's necessary for the name of the dataset to be inside two backticks in order for the query to run properly.
You write the SQL query below. Add a WHERE clause that will return only customers located in Berlin.
How many customers are located in Berlin?
You write the SQL query below. Add a WHERE clause that will return only customers located in FL.
How many customers are located in FL?
You write the SQL query below. Add a WHERE clause that will return only customers named Mark.
How many customers are named Mark?
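Instead of counting result rows by hand, the same WHERE conditions can be wrapped in COUNT(*). The sketch below assumes a hypothetical `customer` table with invented rows; it does not reproduce the course dataset, so the counts say nothing about the real answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table with invented rows.
conn.execute("CREATE TABLE customer (first_name TEXT, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Mark", "Berlin", None),
        ("Ana", "Berlin", None),
        ("Ben", "Miami", "FL"),
        ("Chloe", "Tampa", "FL"),
        ("Dev", "Orlando", "FL"),
    ],
)

# Each query filters with WHERE, then COUNT(*) returns the number of matching rows.
berlin = conn.execute("SELECT COUNT(*) FROM customer WHERE city = 'Berlin'").fetchone()[0]
florida = conn.execute("SELECT COUNT(*) FROM customer WHERE state = 'FL'").fetchone()[0]
marks = conn.execute("SELECT COUNT(*) FROM customer WHERE first_name = 'Mark'").fetchone()[0]

print(berlin, florida, marks)  # 2 3 1
```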
A data analyst has been tasked with a new project and has started to collect data from multiple sources. The analyst will be working with multiple team members on this project and needs to create a naming convention so that project files can be located efficiently. What should the analyst include in each file's name? Select all that apply.
Your boss assigns you a new multi-phase project, and you create a naming convention for all of your files. Because this project will last years and involve multiple analysts, it’s crucial that you create data explaining how your naming conventions are structured. What is this data called?
A data analyst creates a file that lists people who donated to their organization’s fund drive. An effective name for the file is FundDriveDonors_20210216_V01.
You have just started a new project and have created a naming convention for all of your files. Once the data has been collected you start foldering. What does the foldering process allow you to do?
A data analyst deletes an old project’s files from their active project folder. A few months later, they have to review the work that they completed on this project but cannot find the older project files. What should the data analyst have done?
As a data analyst, folder organization is key to being efficient at your job. A common practice is to lay out your folders with broad topics at the top with more specific topics at the bottom. What’s the name of this approach?
To reduce clutter, a data analyst hides cells that contain long, complex formulas. The hidden cells allow the data analyst to protect their formulas and hide the data from other users with access to the spreadsheet.
Data analysts create hierarchies to organize their folders. How are folder hierarchies structured?
Scenario 1, questions 1-5
You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.
To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.
Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.
Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.
Click below to read the email:
C3 Scenario 1_Client Email .pdf
And click below to access the datasets:
Course 3 Final Challenge Data Sets - Customer survey data (1)
Course 3 Final Challenge Data Sets - Delivery times_distance (1)
Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data, which means that it was collected from outside sources.
Scenario 1 continued
Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”
Link to template: Customer Satisfaction Survey data
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
CustomerSurveyData - Customer survey data
You notice that the data in column E is an example of Boolean data. Why did you come to this conclusion?
Scenario 1 continued
Now, you review the data on delivery times and the distance of customers from the restaurant.
To use the template for the dataset, click the link below and select “Use Template.”
Link to template: Delivery Times/Distance
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
DeliveryTimes_DistanceData - Delivery times_distance
The data in column D is an example of nominal data.
Scenario 2 continued
Consider and respond to the following question. Select all that apply.
Our data analytics team often uses both internal and external data. Describe the difference between the two.
Scenario 2 continued
For your final question, your interviewer explains that Sewati Financial Services needs its clients’ trust, and this is an important responsibility for the data analytics team.
He asks you to identify which data analytics practice involves preserving a data subject’s information and activity any time a data transaction occurs.
You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.
To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.
Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.
Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.
Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data was collected by Garden employees using their own resources. What type of data does this describe?
Scenario 1 continued
The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is unstructured data, which means what?
Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”
Link to template: Customer Satisfaction Survey data
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
The question in column E asks, “Was your order accurate? Please respond yes or no.” The responses listed in column E are an example of Boolean data.
Now, you review the data on delivery times and the distance of customers from the restaurant.
To use the template for the dataset, click the link below and select “Use Template.”
Link to template: Delivery Times/Distance
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
The data in column E shows the duration of deliveries from Garden to customers. What type of data is this? Select all that apply.
The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.
Now that you’re familiar with the data, you want to build trust with the team at Garden. You decide to impress them by taking the initiative to reach out to your social media followers. You explain that Garden is a new client, and you show them the pictures of Garden’s sandwich deliveries from the client file. Then, you ask them if they have any photos of sandwich deliveries that you can evaluate.
This is an example of going above and beyond expectations and a great way to build trust.
You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.
So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.
Consider and respond to the following question. Select all that apply.
Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the process does not cause potential bias?
Consider and respond to the following question. Select all that apply.
Our data analytics team often uses external data. Where can you access useful external data?
Consider and respond to the following question. Select all that apply.
Our analysts often work within the same spreadsheet, but for different purposes. What tools would you use in such a situation?
Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people who work in Boise from the Clients table in our database?
For your final question, your interviewer explains that Sewati Financial Services cares about data privacy. The company needs its clients’ trust, and this is an important responsibility for the data analytics team.
He asks: What does data privacy involve? Select all that apply.
You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.
To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.
Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.
Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.
Click below to read the email:
And click below to access the datasets:
Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data. What does this mean?
Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”
Link to template: Customer Satisfaction Survey data
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
You notice that the data in column E is an example of Boolean data. Why did you come to this conclusion?
Now, you review the data on delivery times and the distance of customers from the restaurant.
To use the template for the dataset, click the link below and select “Use Template.”
Link to template: Delivery Times/Distance
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
Fill in the blank: The data in column E is an example of _____ data. Select all that apply.
The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. What type of data is this?
Now that you’re familiar with the data, you want to build trust with the team at Garden.
What actions should you take when working with their data? Select all that apply.
You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.
Click below to review the job description:
So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.
Click below to read the email from the human resources director:
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.
Consider and respond to the following question. Select all that apply.
Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the sample is representative of the population as a whole?
Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people who joined our firm in 2019 from the Clients table in our database?
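Filtering for clients who joined in 2019 is a WHERE condition on a date column. The sketch assumes a hypothetical Clients table whose `join_date` column is stored as ISO-8601 text (YYYY-MM-DD); with that format, BETWEEN compares dates correctly. The real table's column names and date type may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Clients table; join_date is ISO-8601 text (YYYY-MM-DD).
conn.execute("CREATE TABLE Clients (client_name TEXT, join_date TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?)",
    [
        ("Ana", "2018-11-30"),
        ("Ben", "2019-03-14"),
        ("Chloe", "2019-12-31"),
        ("Dev", "2020-01-02"),
    ],
)

# BETWEEN is inclusive at both ends, so it covers all of 2019.
joined_2019 = conn.execute(
    """
    SELECT client_name
    FROM Clients
    WHERE join_date BETWEEN '2019-01-01' AND '2019-12-31'
    ORDER BY client_name
    """
).fetchall()
print(joined_2019)  # [('Ben',), ('Chloe',)]
```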
For your final question, your interviewer explains that Sewati Financial Services cares about its clients’ trust, and this is an important responsibility for the data analytics team. They do this by:
- protecting clients from unauthorized access to their private data
- ensuring freedom from inappropriate use of client data
- getting consent to use someone’s data
He asks: Which data analytics practice does this describe?
Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”
Link to template: Customer Satisfaction Survey data
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
The question in column E asks, “Was your order accurate? Please respond yes or no.” What kind of data is this?
The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is unstructured data, which means what?
Consider and respond to the following question. Select all that apply.
Our data analytics team often uses external data. Where can you locate useful external data?
Consider and respond to the following question.
Our analysts often work within the same spreadsheet, but for different purposes. How could filtering help in this situation?
Fill in the blank: As a data analyst, you need to verify that your data is _____ to ensure your analysis and conclusions are accurate.
A data analyst is given a dataset for analysis. It includes data only about the total population of every country in the previous 20 years. Based on the available data, an analyst would have the full picture and be able to determine the reasons behind a certain country's population increase from 2016 to 2017.
A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
June 2014 Invoices - Sheet1
The analyst notices a limitation with the data in rows 8 and 9. What is the limitation?
A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?
In the data analysis process, how does a sample relate to a population?
A restaurant wants to gather data about a new dish by giving out free samples and asking for feedback. Who should the restaurant give samples to?
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
Which of the following has duplicate data?
Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
June 2014 Invoices - Sheet1
The data analyst is asked to find the average estimate for Symteco over the past three years. What limitation of the data makes this impossible?
Link to template: June 2014 Invoices
OR
If you don’t have a Google account, download the CSV file directly from the attachment below.
June 2014 Invoices - Sheet1
Which of the following are limitations of this dataset?
Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.
For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?
An analyst is cleaning a new dataset. They want to make sure that no cell from B2 through B100 contains a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?
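COUNTIF takes a range and a criterion, so the check can be written =COUNTIF(B2:B100,"<10"). A result of 0 means no number in the range is below 10. The Python function `countif_less_than` below is a hypothetical stand-in that applies the same criterion to invented values.

```python
# Hypothetical values standing in for cells B2:B100 (99 cells), all 10 or more...
values = [10 + (i % 7) for i in range(99)]
# ...except two planted values below 10, to show what the check catches.
values[4] = 3
values[50] = 9

def countif_less_than(cells, threshold):
    """Mimic =COUNTIF(range, "<threshold"): count numeric cells below threshold."""
    return sum(1 for v in cells if isinstance(v, (int, float)) and v < threshold)

print(countif_less_than(values, 10))  # 2
```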
VLOOKUP searches for a value in a row in order to return a corresponding piece of information.
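The "V" in VLOOKUP stands for vertical: the function searches down the first column of a range and returns a value from the same row in the column you specify. A minimal sketch of the exact-match form, =VLOOKUP(search_key, range, index, FALSE), with a hypothetical `vlookup` helper and an invented product table:

```python
def vlookup(search_key, table, index):
    """Exact-match sketch of =VLOOKUP(search_key, range, index, FALSE).

    Scans the first column of each row from top to bottom and returns the
    value in 1-based column `index` of the first matching row.
    """
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None  # a spreadsheet would show #N/A here

# Hypothetical lookup range: product ID, product name, price.
products = [
    ["P-100", "Notebook", 4.50],
    ["P-200", "Pen", 1.25],
    ["P-300", "Stapler", 8.00],
]
print(vlookup("P-200", products, 3))  # 1.25
```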
To evaluate how well two or more data sources work together, data analysts use data mapping.
A data analyst at a nonprofit organization is working with the following spreadsheet, which contains member name data in column C. They want to divide this data using the underscore as a delimiter, so that first names are stored in one column and last names in another. Which tool should the analyst use?
Fill in the blank: When describing a SUM function, the _____ is =SUM(value 1 through value 2).
You are working with the following selection of a spreadsheet:
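Splitting text on a delimiter is what the spreadsheet SPLIT function does: =SPLIT(text, "_") places each piece in its own column. The Python sketch below applies the same idea to hypothetical member names in a First_Last form; the real column C values are not shown in the question.

```python
# Hypothetical member names in the "First_Last" form described in the question.
member_names = ["Ana_Silva", "Ben_Okafor", "Chloe_Martin"]

# Splitting on the underscore delimiter yields [first, last] for each name.
first_names = []
last_names = []
for name in member_names:
    first, last = name.split("_", 1)  # split at the first underscore only
    first_names.append(first)
    last_names.append(last)

print(first_names)  # ['Ana', 'Ben', 'Chloe']
print(last_names)   # ['Silva', 'Okafor', 'Martin']
```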
In order to extract the five-digit postal code from Burlington, MA, what is the correct function?
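When a postal code sits at the end of a text value, the spreadsheet RIGHT function, =RIGHT(text, 5), returns its last five characters. The sketch assumes a hypothetical cell value ending in the postal code; the actual spreadsheet layout is not shown, and `right` is an illustrative helper, not a library function.

```python
def right(text, num_chars):
    """Mimic the spreadsheet RIGHT function: return the last num_chars characters."""
    return text[-num_chars:] if num_chars > 0 else ""

# Hypothetical cell value; assumes the postal code is the last five characters.
address = "Burlington, MA 01803"
print(right(address, 5))  # 01803
```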
They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?
They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number (SS#) plus the year hired. What function will create the ID 19392020 for the employee in row 4?
In this context, what is the underscore acting as?
In order to extract the five-digit postal code from Brandon, FL, what is the correct function?
They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number (SS#) plus the year hired. What function will create the ID 32082009 for the employee in row 5?
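CONCATENATE joins its arguments as text in the order given, so the argument order decides whether the year or the SS# digits come first in the ID. The sketch below uses a hypothetical `concatenate` helper; the inputs 2009 and 3208 are inferred only from the target IDs 20093208 and 32082009, and the real cell references are not shown.

```python
def concatenate(*parts):
    """Mimic =CONCATENATE(part1, part2, ...): join the parts as text, in order."""
    return "".join(str(p) for p in parts)

# Hypothetical row 5 values, inferred from the target IDs in the questions.
year_hired = 2009
ssn_last_four = "3208"  # kept as text so that a leading zero would not be lost

print(concatenate(year_hired, ssn_last_four))  # 20093208 (year first)
print(concatenate(ssn_last_four, year_hired))  # 32082009 (SS# digits first)
```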
In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.
In SQL databases, what data type is the value 78.99 an example of?
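A number with a decimal point is stored as a floating-point value: BigQuery calls this type FLOAT64, and SQLite reports the storage class REAL. The SQLite sketch below uses `typeof()` to compare 78.99 with an integer and with the same digits in quotes, which makes them a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# typeof() reports SQLite's storage class for each value.
types = conn.execute("SELECT typeof(78.99), typeof(78), typeof('78.99')").fetchone()
print(types)  # ('real', 'integer', 'text')
```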
What are some of the benefits of using SQL for analysis? Select all that apply.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What customer ID number appears in row 12 of your query result?
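A minimal sketch of what DISTINCT does, run through Python’s built-in sqlite3 module. The invoice table and its rows are made up for illustration and are not the course dataset; the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical in-memory table standing in for the course dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 2), (4, 8), (5, 4)],
)

# DISTINCT goes right after SELECT and drops repeated values from the result.
rows = conn.execute(
    "SELECT DISTINCT customer_id FROM invoice ORDER BY customer_id"
).fetchall()
distinct_ids = [r[0] for r in rows]
print(distinct_ids)  # [2, 4, 8]
```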
You write the SQL query below. Add a LENGTH function that will return any postal codes that are greater than 7 characters long.
What is the last name of the customer that appears in row 10 of your query result?
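A sketch of filtering with LENGTH in a WHERE clause, assuming a hypothetical customer table with made-up postal codes. LENGTH counts characters, including hyphens and spaces.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (last_name TEXT, postal_code TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Silva", "2750-102"), ("Gruber", "1010"), ("Brooks", "SW1V 3EN"), ("Kim", "94040")],
)

# Keep only postal codes longer than 7 characters (hyphens and spaces count).
rows = conn.execute(
    """
    SELECT last_name, postal_code
    FROM customer
    WHERE LENGTH(postal_code) > 7
    ORDER BY last_name
    """
).fetchall()
print(rows)  # [('Brooks', 'SW1V 3EN'), ('Silva', '2750-102')]
```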
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each postal code and store the result in a new column as new_postal_code.
NOTE: The three dots (...) indicate where to add the statement.
What employee ID number appears in row 5 of your query result?
NOTE: The query index starts at 1 not 0.
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.
NOTE: The three dots (...) indicate where to add the clause.
What is the total number of customers in the database?
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 2 letters of each state name and store the result in a new column as new_state.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes three arguments: the column, the starting position (counting from 1), and the number of characters to return.
What customer ID number is in row 9 of your query result?
NOTE: The query index starts at 1 not 0.
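A sketch showing that the third SUBSTR argument is a length, not an ending position, as in SQLite and BigQuery. The customer table and state values are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "California"), (2, "Texas"), (3, "Ontario")],
)

# SUBSTR(column, start, length): start is 1-based; the third argument is
# how many characters to return.
rows = conn.execute(
    "SELECT customer_id, SUBSTR(state, 1, 2) AS new_state FROM customer ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Ca'), (2, 'Te'), (3, 'On')]

# Starting at position 3 and asking for 2 characters returns characters 3 and 4.
middle = conn.execute("SELECT SUBSTR('California', 3, 2)").fetchone()[0]
print(middle)  # li
```

With a start of 1 the two readings happen to agree, which is why the distinction only shows up when the start position is larger than 1.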
You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_state column.
NOTE: The three dots (...) indicate where to add the clause.
What billing state appears in row 17 of your query result?
NOTE: The query index starts at 1 not 0.
You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 9 characters long.
What is the first name of the customer that is in row 7 of your query result?
NOTE: The query index starts at 1 not 0.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 3 characters long.
What state is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 2 characters long.
What country is in row 1 of your query result?
NOTE: The query index starts at 1 not 0.
You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name.
NOTE: The three dots (...) indicate where to add the statement.
NOTE: SUBSTR takes three arguments: the column, the starting position (counting from 1), and the number of characters to return.
What employee ID number is in row 8 of your query result?
NOTE: The query index starts at 1 not 0.
What is involved in seeing the big picture when verifying data cleaning? Select all that apply.
Fill in the blank: A data analyst uses the CASE statement to consider one or more _____, then return a value.
A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?
SELECT
_____
WHEN 'World Wide Web' THEN 'www'
END AS some_column
FROM
some_table
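A sketch of a complete "simple" CASE expression, which compares one column against listed values and returns a result. The column name some_text and the ELSE branch are assumptions added for illustration; without ELSE, unmatched rows would get NULL.

```python
import sqlite3

# Hypothetical table and column, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_text TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("World Wide Web",), ("Internet",)],
)

# CASE checks some_text against each WHEN value and returns the matching THEN;
# rows that match no WHEN fall through to ELSE.
rows = conn.execute(
    """
    SELECT
      CASE some_text
        WHEN 'World Wide Web' THEN 'www'
        ELSE some_text
      END AS some_column
    FROM some_table
    ORDER BY some_column
    """
).fetchall()
print(rows)  # [('Internet',), ('www',)]
```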
Scenario 2, questions 6-10
You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:
C4 B.Spoke Market Research Job Description.pdf
So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:
C4 S2 Email from Recruiter.pdf
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.
There is a spreadsheet function that allows a data analyst to search for a value in the first column of a given range and return the value of a specified cell in the row in which it is found. What function allows you to complete these tasks?
You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.
Before the meeting you review the About Us tab on their website and their business plan, linked below:
Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.
As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.
During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.
Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.
Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?
Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.
Clearly, one particular respondent, the superfan, is overrepresented. This is an example of margin of error.
The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?
You have finished cleaning the data to ensure it is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.
Your team notes one aspect of data cleaning that would help improve the dataset. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.
You use a spreadsheet function to divide the text strings in Column G around the commas and put each fragment into a new, separate cell. In this example, what are the commas called?
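A rough Python analogue of splitting text around a delimiter. It is not the spreadsheet SPLIT function itself, and the sample answer string is made up; trimming each fragment is an extra step added here so "Navy" and " Navy" count as the same color.

```python
def split_colors(answer: str, delimiter: str = ",") -> list[str]:
    """Return each non-empty fragment of `answer`, split on `delimiter`
    and stripped of surrounding spaces."""
    return [part.strip() for part in answer.split(delimiter) if part.strip()]


print(split_colors("Sage green, Navy, Eggshell white"))
# ['Sage green', 'Navy', 'Eggshell white']
```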
You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:
So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use spreadsheet functions to help us find the information we need.
What function would you use to search for a certain value in a spreadsheet column to return the corresponding piece of information?
Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL queries. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.
She says: Spreadsheets have a great tool for that called remove duplicates. But when writing a SQL query, what command should you include in your SELECT statement to remove duplicates?
Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.
She asks: What function would you use to convert data in a SQL table from one datatype to another?
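A sketch of why CAST matters after a bad import, assuming a hypothetical survey_import table where a numeric score arrived as text. Type names vary by database (for example, BigQuery uses INT64); SQLite accepts INTEGER.

```python
import sqlite3

# Hypothetical import where a numeric column arrived as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_import (respondent_id TEXT, score TEXT)")
conn.executemany(
    "INSERT INTO survey_import VALUES (?, ?)",
    [("101", "9"), ("102", "10"), ("103", "7")],
)

# Sorting the text column compares characters, so '10' sorts before '7'.
as_text = [r[0] for r in conn.execute("SELECT score FROM survey_import ORDER BY score")]

# CAST converts each value to INTEGER, so the sort becomes numeric.
as_int = [
    r[0]
    for r in conn.execute(
        "SELECT CAST(score AS INTEGER) AS score_int FROM survey_import ORDER BY score_int"
    )
]
print(as_text)  # ['10', '7', '9']
print(as_int)   # [7, 9, 10]
```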
Next, your interviewer explains that one of their clients is an online retailer that has a vast inventory. She has a list of items by name, color, and size. Then, she has another list of the price of each item by size, as a larger item sometimes costs more. The stakeholder needs one list of all items by name, color, size, and price.
She then says: In situations such as this one, could you use the CONCAT function to add strings together to create new text strings?
For your final question, your interviewer explains that her team often comes across data with extra leading or trailing spaces.
She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.
You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.
Before the meeting you review the About Us tab on their website and their business plan, linked below:
Meer-Kitty Interior Design has two goals. They want to expand their online presence, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.
As the survey has too few responses and numerous duplicates that are skewing results, you should remove the duplicates and continue analyzing the remaining 29 responses.
During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.
Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.
Without enough data to identify long-term trends about the video subjects that people prefer, what are your available options? Select all that apply.
The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. When using this tool, what is the word Yes?
You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:
So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:
You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.
For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.
There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.
Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.
She says: Spreadsheets have a great tool for that called remove duplicates. Does this mean the team has to remove the duplicate data in a spreadsheet before transferring data to our database?
Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.
She asks: Is there a SQL function that can convert data types such as currency, dates, and times in a SQL table?
Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.
She asks: If you encountered a situation where you wanted to add strings together to create new text strings, which SQL function would you use?
For your final question, your interviewer explains that her team often comes across data with extra leading or trailing spaces.
She asks: Which SQL function enables you to eliminate those extra spaces for consistency?
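A sketch of TRIM cleaning up leading and trailing spaces, using a hypothetical responses table. Counting distinct values before and after shows how stray spaces split one city into several.

```python
import sqlite3

# Hypothetical table with padded values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (city TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?)",
    [("  Boston",), ("Boston  ",), ("Boston",)],
)

# Without TRIM, the padded strings count as three different values.
raw = conn.execute("SELECT COUNT(DISTINCT city) FROM responses").fetchone()[0]

# TRIM removes leading and trailing spaces, so all three collapse to 'Boston'.
trimmed = conn.execute("SELECT COUNT(DISTINCT TRIM(city)) FROM responses").fetchone()[0]
print(raw, trimmed)  # 3 1
```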
Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.
Clearly, one particular respondent, the superfan, is overrepresented. This means the data doesn’t represent the population as a whole.
When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.
You have finished cleaning the data to ensure it is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.
Your team notes one aspect of data cleaning that would help improve the dataset. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.
You decide to use a spreadsheet function to divide the text strings in Column G around the commas and put each fragment into a new, separate cell. You are using the SPLIT function.
Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.
She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?
Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.
She asks: Is there a command or function that converts data in a SQL table from one datatype to another? You respond: Yes, it’s the CAST function.
Next, your interviewer explains that one of their clients is an online retailer that has a vast inventory. She has a list of items by name, color, and size. Then, she has another list of the price of each item by size, as a larger item sometimes costs more. The client needs one list of all items by name, color, size, and price.
She then asks: If you were to use the CONCAT function to complete this task, what would it enable you to do?
For your final question, your interviewer explains that her team often uses the TRIM function when writing SQL queries.
She asks: What is the TRIM function used for in SQL?
You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.
Before the meeting you review the About Us tab on their website and their business plan, linked below:
Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.
As the survey has too few responses and numerous duplicates that are skewing results, you decide to repeat the survey in order to create a new, improved dataset. What is your first step?
Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.
Clearly, one particular respondent, the superfan, is overrepresented. What does this situation describe?
The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.
To use the template for the survey feedback, click the link below and select “Use Template.”
Link to template: Kitty Survey Feedback
Or, if you don’t have a Google account, download the file directly from the attachment below.
If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.
You decide to use a spreadsheet tool that changes how cells appear when they meet a certain value — in this case, the word Yes. You are using VLOOKUP.
Which of the following tasks would a data analyst perform during the analyze phase of the data analysis process? Select all that apply.
A data analyst working on a dataset performs several calculations with the data. What phase of analysis is the analyst in?
A data analyst is sorting spreadsheet data. What tool should they use to make sure that the data across rows is kept together when they rearrange the data?
A data analyst sorts a spreadsheet range between cells A15 and G71. They sort in ascending order by the second column, Column B. What is the syntax they are using?
You write the SQL query below. Add a WHERE clause that will return only data about the genre with Id number 7.
Who is the composer listed in row 4 of your query result?
You write the SQL query below. Add an ORDER BY clause that will sort the invoices by order total in ascending order.
What total appears in row 4 of your query result?
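A sketch of combining a WHERE filter with an ascending ORDER BY, assuming a hypothetical invoice table with billing_city and total columns and made-up values. Note that text values in the WHERE clause are quoted; an unquoted word would be read as a column name.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, billing_city TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, "Paris", 8.91), (2, "Oslo", 1.98), (3, "Paris", 0.99), (4, "Paris", 5.94)],
)

# WHERE filters rows first; ORDER BY ... ASC then sorts the rest, smallest first.
rows = conn.execute(
    """
    SELECT invoice_id, total
    FROM invoice
    WHERE billing_city = 'Paris'
    ORDER BY total ASC
    """
).fetchall()
print(rows)  # [(3, 0.99), (4, 5.94), (1, 8.91)]
```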
You write the following SQL query, but it is incorrect. What is wrong with the query?
SELECT *
FROM Track
WHERE composer = Miles Davis
You write the SQL query below. However this query is incorrect. What is wrong with it?
SELECT *
FROM invoice
WHERE billing_city = "Paris"
ORDER total
You write the following SQL query, but it is incorrect. What is wrong with the query?
SELECT *
FROM Track
WHERE album = 3
You write the SQL query below. However this query is incorrect. What is wrong with it?
SELECT *
FROM invoice
WHERE billing_city = "Delhi"
ORDER BY order_total
You write the following SQL query, but it is incorrect. What is wrong with the query?
SELECT *
FROM Track
WHERE composer = 2
You are responsible for maintaining the integrity of a dataset. Multiple analysts are working with this spreadsheet. What spreadsheet tool can you use to ensure that accidental changes are not recorded in the data?
You are working with a SQL database with tables for flight routes in Canada. The table contains one column with the names of the departure airports. A different column in the same table contains the names of the arrival airports. What function can you use in your query to combine the arrival and departure airport names into a new column?
You are querying a database of ice cream flavors to determine which stores are selling the most mint chip. For your project, you only need the first 80 records. What clause should you add to the following SQL query?
SELECT flavors FROM ice_cream_table WHERE flavor = "mint_chip"
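A sketch of capping results with LIMIT, assuming a hypothetical ice_cream_table with store_id and flavor columns and generated sample rows. LIMIT goes at the end of the query.

```python
import sqlite3

# Hypothetical table: 300 generated rows, half of them mint_chip.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ice_cream_table (store_id INTEGER, flavor TEXT)")
conn.executemany(
    "INSERT INTO ice_cream_table VALUES (?, ?)",
    [(i, "mint_chip" if i % 2 else "vanilla") for i in range(1, 301)],
)

# LIMIT caps how many rows come back, even though 150 rows match.
rows = conn.execute(
    "SELECT store_id, flavor FROM ice_cream_table WHERE flavor = 'mint_chip' LIMIT 80"
).fetchall()
print(len(rows))  # 80
```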
SELECT museums
FROM museum_table
WHERE exhibit = "sculpture"
SELECT fast_food
FROM restaurant_table
WHERE location = "Texas"
When using VLOOKUP, there are some common limitations that data analysts should be aware of. One of these limitations is that VLOOKUP only returns the first match it finds, even if there are many possible matches within the column.
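A plain-Python imitation of an exact-match VLOOKUP, to illustrate the first-match limitation. The vlookup helper and the sample rows are hypothetical; the real spreadsheet function also supports approximate matching, which this sketch leaves out.

```python
# Hypothetical lookup range: the first column is the search key.
table = [
    ("Indonesia", "first match"),
    ("Brazil", "other row"),
    ("Indonesia", "second match"),  # never returned: the scan stops earlier
]


def vlookup(key, rows, col_index):
    """Scan the first column top to bottom and return the value in column
    `col_index` (1-based) of the first row whose key matches."""
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    return None  # a spreadsheet would show #N/A here


print(vlookup("Indonesia", table, 2))  # first match
```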
To search for the growth in population in Indonesia, what is the correct VLOOKUP syntax?
To search for the population of Bangladesh, what is the correct VLOOKUP syntax?
FROM fiction_table
RIGHT JOIN
books_table
What does this function do?
To search for the population of Brazil, what is the correct VLOOKUP syntax?
To search for the population of Pakistan, what is the correct VLOOKUP syntax?
FROM music_table
LEFT JOIN
Entertainment_table
What does this function do?
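A sketch of a LEFT JOIN, assuming hypothetical music_table and entertainment_table tables that share an artist_id key; the fragments above leave out the ON condition, so that key is an assumption. A RIGHT JOIN is the mirror image and keeps every row of the right table instead; this sketch sticks to LEFT JOIN because SQLite only added RIGHT JOIN in version 3.39.0.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music_table (artist_id INTEGER, artist TEXT)")
conn.execute("CREATE TABLE entertainment_table (artist_id INTEGER, event TEXT)")
conn.executemany("INSERT INTO music_table VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO entertainment_table VALUES (?, ?)", [(1, "Gala"), (3, "Fair")])

# LEFT JOIN keeps every row of the left table (music_table); rows without a
# match in the right table get NULL, which Python shows as None.
rows = conn.execute(
    """
    SELECT m.artist, e.event
    FROM music_table AS m
    LEFT JOIN entertainment_table AS e
      ON m.artist_id = e.artist_id
    ORDER BY m.artist_id
    """
).fetchall()
print(rows)  # [('Ana', 'Gala'), ('Ben', None), ('Cy', 'Fair')]
```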
A data analyst uses the following formula to calculate a new column in a SQL query. What best describes the result of the formula?
(colA + colB) / colC = new_col
Link to template: Sample Transaction Table.
Or, if you don’t have a Google account, download the file directly from the attachment below.
The syntax of which of the following formulas would allow the analyst to count purchase sizes of two or more?
=SUMIF(A1:A25, "<10", C1:C25)
Which part of this formula is the criteria or condition?
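A plain-Python sketch of how SUMIF pairs two ranges: A1:A25 is the range that is checked, "<10" is the criterion, and C1:C25 holds the values that get added. The sumif helper and the cell values are hypothetical.

```python
# Hypothetical cell values for A1:A25 (checked) and C1:C25 (summed).
a_values = [4, 12, 7, 15, 9] + [20] * 20
c_values = [1.5, 3.0, 2.0, 4.0, 2.5] + [1.0] * 20


def sumif(criteria_range, predicate, sum_range):
    """Add each sum_range value whose paired criteria_range cell passes predicate."""
    return sum(s for c, s in zip(criteria_range, sum_range) if predicate(c))


# Equivalent of =SUMIF(A1:A25, "<10", C1:C25): rows with 4, 7, and 9 qualify.
total = sumif(a_values, lambda v: v < 10, c_values)
print(total)  # 6.0
```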
=SUMPRODUCT(A2:A10,B2:B10)
How does the SUMPRODUCT function calculate the cell ranges identified in the parentheses?
The analyst wants to figure out the value of all of the items in the spreadsheet. Which formula will calculate the total price of all of the items?
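A plain-Python sketch of what SUMPRODUCT computes: it multiplies the ranges cell by cell and then adds the products, which is why it can total quantity times price. The sumproduct helper and the quantities and prices are hypothetical.

```python
# Hypothetical quantities (A2:A10) and unit prices (B2:B10).
quantities = [2, 1, 4, 3, 1, 2, 5, 1, 2]
prices = [3.0, 10.0, 1.5, 2.0, 8.0, 4.0, 1.0, 6.0, 2.5]


def sumproduct(range1, range2):
    """Multiply the ranges element by element, then add up the products."""
    if len(range1) != len(range2):
        # Spreadsheets also reject ranges of different sizes.
        raise ValueError("ranges must be the same size")
    return sum(x * y for x, y in zip(range1, range2))


print(sumproduct(quantities, prices))  # 60.0
```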
Link to template: Movie Data Project.
Or, if you don’t have a Google account, download the file directly from the attachment below.
If you want to summarize the data using the AVERAGE function in the Values menu, which spreadsheet columns could you add data from? Select all that apply.
Add a statement to your SQL query that calculates the duration in seconds for each track and stores it in a new column as secs.
NOTE: The three dots (...) indicate where to add the statement.
What is the duration in seconds of the track with Id number 3408?
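A sketch of converting milliseconds to seconds in a new column, assuming a hypothetical track table with a milliseconds column and made-up durations. In SQLite, dividing an integer by an integer truncates, so the sketch divides by 1000.0; BigQuery’s / operator returns a FLOAT64 either way.

```python
import sqlite3

# Hypothetical table and durations, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE track (track_id INTEGER, milliseconds INTEGER)")
conn.executemany("INSERT INTO track VALUES (?, ?)", [(1, 210500), (2, 95250)])

# Dividing by 1000.0 keeps the fractional seconds.
secs = conn.execute(
    "SELECT track_id, milliseconds / 1000.0 AS secs FROM track ORDER BY track_id"
).fetchall()

# Integer / integer in SQLite drops the fraction.
truncated = conn.execute(
    "SELECT track_id, milliseconds / 1000 FROM track ORDER BY track_id"
).fetchall()
print(secs)       # [(1, 210.5), (2, 95.25)]
print(truncated)  # [(1, 210), (2, 95)]
```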
You write the SQL query below. Add a GROUP BY clause that will group the data by album Id number.
What is the total duration of the album with Id number 2?
Add a statement to your SQL query that calculates the average total and stores it in a new column as average_total.
NOTE: The three dots (...) indicate where to add the statement.
What is the average total for Wisconsin?
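A sketch of GROUP BY with SUM and AVG, assuming a hypothetical invoice table with billing_state and total columns and made-up values. GROUP BY collapses rows that share a value into one output row, and each aggregate is computed within its group.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (billing_state TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [("WI", 10.0), ("WI", 4.0), ("TX", 6.0), ("WI", 7.0), ("TX", 2.0)],
)

# One output row per billing_state, with SUM and AVG computed per group.
rows = conn.execute(
    """
    SELECT billing_state, SUM(total) AS sum_total, AVG(total) AS average_total
    FROM invoice
    GROUP BY billing_state
    ORDER BY billing_state
    """
).fetchall()
print(rows)  # [('TX', 8.0, 4.0), ('WI', 21.0, 7.0)]
```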
=SUMIF(D2:D10,">=50",E2:E10)
Which part of this formula indicates the range of values to be added?
SELECT
Yes_Responses,
No_Responses,
Total_Surveys,
Yes_Responses + No_Responses AS Responses_Per_Survey
FROM
Survey_1
(colA + colB) / colC = new_col
You notice that many cells in the city column, Column K, are missing a value. So, you use the zip codes to research the correct cities. Now, you want to add the cities to each donor’s row. However, you are concerned about making a mistake, such as a spelling typo.
What spreadsheet tool allows you to control what can and cannot be entered in your worksheet in order to avoid typos?
Your database contains people who live in many areas of Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.
What SQL query do you use to select all data from the Donation_Form_List organized by zip code?
You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, you use a WHERE clause. Which of the following SQL queries would return the needed information?
FROM Donation_Form_List
WHERE Board_Member != 'True' OR Trustee != 'True'
FROM Donation_Form_List
WHERE Board_Member != 'True' AND Trustee != 'True'
FROM Donation_Form_List
WHERE Board_Member = 'True' OR Trustee = 'True'
FROM Donation_Form_List
WHERE Board_Member = 'True' AND Trustee = 'True'
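A sketch contrasting OR and AND in a WHERE clause, assuming a Donation_Form_List table with text 'True'/'False' flags and made-up names. The names helper builds the query from fixed demo strings only; it is not meant for untrusted input.

```python
import sqlite3

# Hypothetical rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (name TEXT, Board_Member TEXT, Trustee TEXT)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?, ?)",
    [
        ("Avery", "True", "False"),
        ("Blake", "False", "True"),
        ("Casey", "True", "True"),
        ("Devon", "False", "False"),
    ],
)


def names(where_clause):
    """Return matching names, sorted, for a fixed demo WHERE condition."""
    sql = f"SELECT name FROM Donation_Form_List WHERE {where_clause} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]


# OR keeps anyone who served on either board; AND keeps only people on both.
print(names("Board_Member = 'True' OR Trustee = 'True'"))   # ['Avery', 'Blake', 'Casey']
print(names("Board_Member = 'True' AND Trustee = 'True'"))  # ['Casey']
```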
Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.
Which spreadsheet function do you use to count how many donations of $100 or greater appear in Column O (Contributions 2018)?
For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.
Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.
You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: Dynamic Dataset
Or, if you don’t have a Google account, download the file directly from the attachment below.
The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.
You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.
Which of the following procedures will enable you to sort your spreadsheet by city (Column K) in ascending order? Select all that apply.
You notice that many cells in the city column, Column K, are missing a value. So, you use the zip codes to research the correct cities. Now, you want to add the cities to each donor’s row. However, you are concerned about making a mistake, such as a spelling typo.
Fill in the blank: To add drop-down lists to your worksheet with predetermined options for each city name, you decide to use _____.
Now, you decide to address Tayen’s request to include a handwritten note in the direct-mail piece for anyone who gave at least $100 last year.
Which of the following spreadsheet tools will enable you to change how cells appear if they contain a value of $100 or more?
At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.
To move the 5-digit zip code in cell L2 into its own column, you use the function =LEFT(L2,5).
Next, you duplicate your dataset twice using the Sheet Menu. You rename the first sheet Donation Form List, and you remove the cities that are farther than 50 miles from Rock Springs. You rename the second sheet Postcard List, and you remove the cities that are within 50 miles of Rock Springs.
Then, you import these datasets into your company’s mailing list database. In a mailing list database, you create two tables: Donation_Form_List and Postcard_List. You decide to clean the Donation_Form_List first.
Your company’s mailing list software requires units to be on the same line as street addresses. However, they are currently in two separate columns (street_address and unit).
What portion of your SQL statement will instruct the database to combine these two columns into a new column called “address”?
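A minimal sketch of combining two columns into one, run in an in-memory SQLite database with hypothetical rows (the donor_id column and the addresses are made up). In BigQuery, which the course uses, this is usually written with CONCAT; note that BigQuery's CONCAT returns NULL if any argument is NULL. Only newer SQLite releases provide CONCAT, so the sketch uses SQLite's `||` operator. It also uses COALESCE so that a row with no unit still gets its street address:

```python
import sqlite3

# Hypothetical toy table; the real Donation_Form_List has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (donor_id INTEGER, street_address TEXT, unit TEXT)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?, ?)",
    [(1, "12 Elm St", "Apt 4"), (2, "890 Oak Ave", None)],
)

# ' ' || NULL yields NULL, so COALESCE swaps in '' when unit is missing.
rows = conn.execute("""
    SELECT donor_id,
           street_address || COALESCE(' ' || unit, '') AS address
    FROM Donation_Form_List
    ORDER BY donor_id
""").fetchall()
print(rows)  # [(1, '12 Elm St Apt 4'), (2, '890 Oak Ave')]
```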
Your database contains people who live across Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.
The zip codes are in a column called zip_code. What query do you use to select all data from the Donation_Form_List organized by zip code?
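A minimal sketch of sorting every column of a table by one column with ORDER BY. The rows are hypothetical. Zip codes are stored as TEXT here, a common choice that preserves any leading zeros; for equal-length digit strings, text order matches numeric order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (donor_id INTEGER, city TEXT, zip_code TEXT)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?, ?)",
    [(1, "Green River", "82935"), (2, "Rock Springs", "82901"), (3, "Superior", "82945")],
)

# SELECT * returns all columns; ORDER BY sorts ascending by default.
rows = conn.execute("SELECT * FROM Donation_Form_List ORDER BY zip_code").fetchall()
for row in rows:
    print(row)
```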
You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, you use the WHERE clause. The syntax is:
Your company’s direct-mail campaign was very successful, and Food Justice Rock Springs has continued partnering with Directly Dynamic. One thing you’ve been working on is assigning all donors identification numbers. This will enable you to clean and organize the lists more effectively.
Meanwhile, another team member has been creating a prospect list that contains data about people who have indicated interest in getting involved with Food Justice Rock Springs. These people are also assigned a unique ID. Now, you need to compare your donor list with the dataset in your database and collect certain data from both.
Which type of SQL JOIN will return records with matching values in both tables?
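A minimal sketch of an INNER JOIN on a shared ID, using two hypothetical tables (`donors` and `prospects`, with made-up names and interests). Only IDs that appear in both tables survive the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donors (donor_id INTEGER, name TEXT);
    CREATE TABLE prospects (donor_id INTEGER, interest TEXT);
    INSERT INTO donors VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cruz');
    INSERT INTO prospects VALUES (2, 'volunteering'), (3, 'gardening'), (4, 'pantry');
""")

# INNER JOIN keeps only rows whose donor_id exists in both tables (2 and 3 here).
rows = conn.execute("""
    SELECT d.donor_id, d.name, p.interest
    FROM donors AS d
    INNER JOIN prospects AS p ON d.donor_id = p.donor_id
    ORDER BY d.donor_id
""").fetchall()
print(rows)  # [(2, 'Ben', 'volunteering'), (3, 'Cruz', 'gardening')]
```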
Your next task is to identify the average contribution given by donors over the past two years. Tayen will use this information to set a donation minimum for inviting donors to an upcoming event.
You have performed the calculations for 2019, so now you move on to 2020. To return average contributions in 2020 (contributions_2020), you use the AVG function. You use the following section of a SQL query to find this average and return it in a column aliased as AvgLineTotal:
AVG(contributions_2020) AS AvgLineTotal
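A minimal sketch of the fragment above inside a complete query. The table name and the contribution values are hypothetical. `AS AvgLineTotal` names the result column. Also note that AVG skips NULL values rather than treating them as zero:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (donor_id INTEGER, contributions_2020 REAL)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?)",
    [(1, 50.0), (2, 150.0), (3, None), (4, 100.0)],
)

# AVG ignores the NULL for donor 3: (50 + 150 + 100) / 3.
(avg,) = conn.execute(
    "SELECT AVG(contributions_2020) AS AvgLineTotal FROM Donation_Form_List"
).fetchone()
print(avg)  # 100.0
```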
Now that you have provided her with the average donation amount, Tayen decides to invite 50 people to the grand opening of a new community garden. You return to your New Donor List spreadsheet to determine how much each donor gave in the past two years. You will use that information to identify the 50 top donors and invite them to the event.
What is the correct syntax to add the contribution amounts in cells O2 and P2?
Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.
The correct syntax to count how many donations of $100 or greater appear in Column O is =SUMIF(O2:O210,">=100").
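The statement above depends on the difference between SUMIF and COUNTIF. A minimal Python sketch of how the two spreadsheet functions treat the same criterion follows. The helper names and the Column O values are hypothetical, and the sketch assumes every cell is numeric (real spreadsheets also skip blank and text cells):

```python
def sumif(values, threshold):
    """Mimics =SUMIF(range, ">=threshold"): adds up the matching values."""
    return sum(v for v in values if v >= threshold)


def countif(values, threshold):
    """Mimics =COUNTIF(range, ">=threshold"): counts the matching cells."""
    return sum(1 for v in values if v >= threshold)


column_o = [25, 100, 250, 75, 100]  # hypothetical 2018 donations
print(sumif(column_o, 100))    # 450 (a dollar total)
print(countif(column_o, 100))  # 3 (a number of donations)
```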
The community garden grand opening was a success. In addition to the 55 donors Food Justice Rock Springs invited, 20 other prospects attended the event. Now, Tayen wants to know more about the donations that came in from new prospects compared to the original donors.
This SQL query can be used to identify the percentage of contributions from prospects compared to total donors:
Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.
To retrieve the number of donors in each city, sorted high to low, you use the following query:
Shuffle Q/A
For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.
Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.
You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: Dynamic Dataset
Or, if you don’t have a Google account, download the file directly from the attachment below.
The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.
You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.
Which of the following functions will enable you to sort your spreadsheet by city (Column K) in ascending order?
At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.
What function do you use to move the 5-digit zip code in cell L2 into its own column?
Next, you duplicate your dataset twice using the Sheet Menu. You rename the first sheet Donation Form List, and you remove the cities that are farther than 50 miles from Rock Springs. You rename the second sheet Postcard List, and you remove the cities that are within 50 miles of Rock Springs.
Then, you import these datasets into your company’s mailing list database. In a mailing list database, you create two tables: Donation_Form_List and Postcard_List. You decide to clean the Donation_Form_List first.
Your company’s mailing list software requires units to be on the same line as street addresses. However, they are currently in two separate columns (street_address and unit).
You use a SQL function to instruct the database to combine the two columns into a new column called “address.” The syntax is: JOIN(street_address, " to ", unit) as address.
You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, what is the correct query?
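A minimal sketch of filtering rows with WHERE and OR. It assumes a hypothetical `board` column that holds the board name (the real dataset's column name and values may differ). Rows where `board` is NULL or holds another value are excluded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donation_Form_List (donor_id INTEGER, board TEXT)")
conn.executemany(
    "INSERT INTO Donation_Form_List VALUES (?, ?)",
    [(1, "Board of Directors"), (2, None), (3, "Board of Trustees"), (4, "Advisory Council")],
)

# OR keeps a row if either condition is true; NULL = 'x' is never true.
rows = conn.execute("""
    SELECT *
    FROM Donation_Form_List
    WHERE board = 'Board of Directors'
       OR board = 'Board of Trustees'
    ORDER BY donor_id
""").fetchall()
print(rows)
```

An equivalent filter is `WHERE board IN ('Board of Directors', 'Board of Trustees')`.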
Your company’s direct-mail campaign was very successful, and Food Justice Rock Springs has continued partnering with Directly Dynamic. One thing you’ve been working on is assigning all donors identification numbers. This will enable you to clean and organize the lists more effectively.
Meanwhile, another team member has been creating a prospect list that contains data about people who have indicated interest in getting involved with Food Justice Rock Springs. These people are also assigned a unique ID. Now, you need to compare your donor list with the dataset in your database and collect certain data from both.
Which type of SQL JOIN will return all records from the left table and only the matching records from the right table?
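A minimal sketch of a LEFT JOIN, reusing two hypothetical tables (`donors` on the left, `prospects` on the right). Every donor is returned. Where no prospect row matches, the right-hand columns come back as NULL (None in Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donors (donor_id INTEGER, name TEXT);
    CREATE TABLE prospects (donor_id INTEGER, interest TEXT);
    INSERT INTO donors VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cruz');
    INSERT INTO prospects VALUES (2, 'volunteering'), (3, 'gardening'), (4, 'pantry');
""")

# LEFT JOIN keeps all donors; prospect 4 has no donor, so it is dropped.
rows = conn.execute("""
    SELECT d.donor_id, d.name, p.interest
    FROM donors AS d
    LEFT JOIN prospects AS p ON d.donor_id = p.donor_id
    ORDER BY d.donor_id
""").fetchall()
print(rows)  # [(1, 'Ana', None), (2, 'Ben', 'volunteering'), (3, 'Cruz', 'gardening')]
```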
Your next task is to identify the average contribution given by donors over the past two years. Tayen will use this information to set a donation minimum for inviting donors to an upcoming event.
You start with 2019. To return average contributions in 2019 (contributions_2019), you use the AVG function. What portion of your SQL statement will instruct the database to find this average and return it in a column aliased as AvgLineTotal?
Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.
What is the correct syntax to count how many donations of $100 or greater appear in Column O?
The community garden grand opening was a success. In addition to the 55 donors Food Justice Rock Springs invited, 20 other prospects attended the event. Now, Tayen wants to know more about the donations that came in from new prospects compared to the original donors.
Which SQL query can be used to calculate the percentage of contributions from prospects?
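A minimal sketch of one way to compute the share of contributions that came from prospects. It assumes a hypothetical `event_contributions` table with a `source` column ('donor' or 'prospect') and an `amount` column. The real data may be split across separate donor and prospect tables instead. Multiplying by 100.0 keeps the division in floating point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_contributions (person_id INTEGER, source TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO event_contributions VALUES (?, ?, ?)",
    [(1, "donor", 200.0), (2, "donor", 100.0), (3, "prospect", 60.0), (4, "prospect", 40.0)],
)

# Prospect total divided by the overall total, expressed as a percentage.
(pct,) = conn.execute("""
    SELECT 100.0 * SUM(CASE WHEN source = 'prospect' THEN amount ELSE 0 END)
           / SUM(amount) AS pct_from_prospects
    FROM event_contributions
""").fetchone()
print(pct)  # 25.0
```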
Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.
What clause do you add to the following query to sort the donors in each city from high to low?
For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.
Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.
You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”
Link to template: Dynamic Dataset
Or, if you don’t have a Google account, download the file directly from the attachment below.
The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.
You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.
Which of the following tools will enable you to sort your spreadsheet by city (Column K) in ascending order?
Now, you decide to address Tayen’s request to include a handwritten note in the direct-mail piece for anyone who gave at least $100 last year.
Which of the following procedures will enable you to change how cells in your spreadsheet appear if they contain a value of $100 or more?
Your database contains people who live in many areas of Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.
The zip codes are in a column called zip_code. To select all data from the Donation_Form_List organized by zip code, you use the ORDER BY clause. The syntax is:
You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.
To retrieve only those records that include people who have served on the board of trustees or on the board of directors, what clause do you include in your query?
Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.
Which SQL query will retrieve the number of donors in each city, sorted high to low?
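A minimal sketch of counting donors per city, sorting from high to low, and keeping the top three. The `new_donors` table and its rows are hypothetical. The secondary sort on `city` is a design choice that makes ties (here, Casper and Superior) come out in a predictable order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_donors (donor_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO new_donors VALUES (?, ?)",
    [(1, "Rock Springs"), (2, "Rock Springs"), (3, "Green River"), (4, "Rock Springs"),
     (5, "Superior"), (6, "Green River"), (7, "Casper")],
)

# GROUP BY makes one row per city; ORDER BY ... DESC puts the largest counts first.
rows = conn.execute("""
    SELECT city, COUNT(*) AS donor_count
    FROM new_donors
    GROUP BY city
    ORDER BY donor_count DESC, city
    LIMIT 3
""").fetchall()
print(rows)  # [('Rock Springs', 3), ('Green River', 2), ('Casper', 1)]
```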
At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.
What function will enable you to move the 2-character state abbreviation in cell L2 into its own column?
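A minimal Python sketch of how the spreadsheet LEFT and RIGHT functions pull text from either end of a cell. It assumes a hypothetical value in cell L2 that has the state abbreviation, a space, and then the 5-digit zip code. If the real cell uses a different layout, the character counts change:

```python
def left(text, n):
    """Mimics the spreadsheet LEFT(text, n): the first n characters."""
    return text[:n]


def right(text, n):
    """Mimics the spreadsheet RIGHT(text, n): the last n characters."""
    # text[-0:] would return the whole string, so n == 0 is handled explicitly.
    return text[-n:] if n > 0 else ""


cell_l2 = "WY 82901"  # hypothetical: state, a space, then the zip code
state = left(cell_l2, 2)
zip_code = right(cell_l2, 5)
print(state, zip_code)  # WY 82901
```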
A data analyst notices that two variables in their data seem to rise and fall at the same time. They recognize that these variables are related somehow. What is this an example of?
A data analyst adds labels to their line graph to make it easier to read, even though they already have a legend on their visualizations. How does labeling the data make it more accessible?
You are going to give a presentation to a broad audience. How can you make sure your visualizations are accessible to all members of the audience? Select two that apply.
Shuffle Q/A
What type of visualization is this?
A data analyst is using the Color tool in Tableau to apply a color scheme to a data visualization. They want the visualization to be accessible for people with color vision deficiencies, so they use a color scheme with lots of contrast. What does it mean to have contrast?
You are working with the World Happiness data in Tableau. What tool do you use to change your point of view of Greece?
Shuffle Q/A
You are preparing to communicate to an audience about an analysis project. You consider the roles that your audience members play and their stake in the project. What aspect of data storytelling does this scenario describe?
Shuffle Q/A
You are presenting your theory about the correlation between recent sales increases and a current pop culture trend. When is the best time to establish your presentation’s hypothesis for the audience?
Shuffle Q/A
Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. The characters are the people affected by your story. This includes your stakeholders, Gaea’s customers, and Gaea’s potential future customers. For the setting, you describe the current situation, potential tasks, and background information about the analysis project.
As you begin to work on the plot for the data narrative, which of the following ideas would you include? Select all that apply.
After creating data visualizations about the current state of the electric vehicle market, you turn to projections. You want to communicate to stakeholders about the importance of longer vehicle battery range to consumers.
Your team analyzes data from a consumer survey that investigated the importance of longer battery range when choosing whether to purchase an electric car. The current average battery range is about 210 miles. By 2025, that distance is expected to grow to 450 miles per charge.
You create the following pie chart:
After reviewing your pie chart, you realize that it could be improved. How do you make this chart more effective?
As a final step in the data-sharing process, you think about how to respond during the Q&A session. What strategies will you employ when answering questions? Select all that apply.
You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.
Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three to five years.
You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.
Fill in the blank: A big part of engagement is knowing how to eliminate less important details. So, you use spotlighting to _____ the data in order to identify the most important insights.
After you identify the most important insights, it’s time to create your primary message. Your team’s analysis has revealed three key insights:
Electric vehicle sales demand is expected to grow by more than 400% by 2025.
The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.
Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.
Based on these insights, you create your primary message. Which of the following reflect the expectations of a primary message?
Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. During the narrative, you want to communicate to your stakeholders about the challenges associated with the current lack of vehicle charging stations and why it's important for Gaea to increase its cars’ battery range by 2025.
Information about charging stations and the need to increase battery range will be part of the setting of your data story.
Now, it’s time to consider which tools to use to create data visualizations that will clearly communicate the results of your analysis. You and your team decide to make both spreadsheet charts and Tableau data visualizations. In addition, you want to provide your stakeholders with a tool that will achieve the following goals:
Organize multiple datasets about electric vehicle battery ranges into a central location
Enable tracking and analysis of electric vehicle data
Simplify data visualizations about the number of available charging stations using maps of the different geographies
What tool do you create for your stakeholders?
You want to add a label to represent the scale (total count by year) of electric vehicle sales. Where on the graph do you label these values?
You use Tableau to create the following draft of a visualization, which organizes the charging station data geographically:
After reviewing your draft, you realize that it could be improved.
Fill in the blank: To improve your draft, you select more varied hues and make the color intensity stronger. In addition, you choose darker _____ in order to reflect more light.
Now, you want to highlight what your team’s analysis discovered about the number of charging stations available compared to the number of cars purchased. Your data has confirmed that the lack of charging stations causes the effect of fewer car sales. To communicate this effectively, you will need to convey causation to the stakeholders.
You explain that causation is the measure of the degree to which two variables move in relationship to each other. In the case of Gaea’s business, charging station numbers and car sales move in the same direction.
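The definition in the statement above (the degree to which two variables move in relationship to each other) describes correlation, which is commonly quantified with Pearson's correlation coefficient r. A minimal sketch follows. The station and sales figures are hypothetical. A value of r near 1 shows that two series rise together, but on its own it does not show that one causes the other:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how strongly two series move together (-1 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


stations = [10, 20, 30, 40, 50]     # hypothetical charging-station counts
sales = [110, 190, 320, 400, 480]   # hypothetical electric car sales
r = pearson_r(stations, sales)
print(round(r, 3))
```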
Once you finish creating data visualizations about the current state of the electric vehicle market, you turn to projections for the future. You want to communicate to stakeholders about the importance of longer vehicle battery range to consumers.
Your team’s data includes feedback from a consumer survey that investigated the importance of longer battery range when choosing whether to purchase an electric car. The current average battery range is about 210 miles. By 2025, that distance is expected to grow to 450 miles per charge.
You create the following pie chart:
Fill in the blank: After reviewing your pie chart, you realize that it could be improved. You resize the _____ so they visually show the different values.
It’s time to build your Tableau dashboard for stakeholders. You consider what type of layout to use.
You decide that you want to be able to adjust the width of the views and the data visualizations about electric vehicle sales, charging stations, and battery range. Which type of layout will enable you to do that?
You have created your narrative and visuals, so now it’s time to build a professional and appealing slideshow. You choose a theme that matches the tone of your presentation. Then, you create a title slide with a title, subtitle, and the date.
Next, you create the following slide that compares electric vehicle sales in 2015 and 2020:
After reviewing your slide, you realize that it could be improved. What steps do you take to make the two text boxes beneath the header more effective? Select all that apply.
You then create the following slide to demonstrate the challenges associated with battery range and charging stations:
After reviewing your slide, you realize that the visual elements could be improved. A good solution would be for you to choose one data visualization to share on this slide, then create another slide for the second data visualization.
You complete your slideshow and share it with your team. Once it is approved by your supervisor, you begin preparing to give your presentation. You consider maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will explain the data visualizations.
One of the strategies you practice is the five-second rule. What are some key aspects of this rule? Select all that apply.
Next, you prepare for the question-and-answer session that will follow your presentation. To predict what questions they may ask, you do a colleague test of your presentation. You should choose a colleague who has deep expertise in the electric vehicle industry.
Now that you have some idea of the questions the stakeholders will ask, you and a team member consider different objections that might arise.
Your team member asks you how you will respond if someone from Gaea questions your data-cleaning process. How do you prepare for this objection? Select all that apply.
The big day has arrived, and you have just finished giving your presentation to the Gaea team. It’s now time for the question-and-answer session, and a stakeholder asks you a very detailed question about one specific electric vehicle charging station initiative.
You listen to the whole question, then repeat it. For what reasons is this important? Select all that apply.
Shuffle Q/A
You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.
Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three to five years.
You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.
You use spotlighting to help you identify the most important insights. Which of the following activities are involved with spotlighting? Select all that apply.
Once you have identified the most important insights, it’s time to create your primary message. Your team’s analysis has revealed three key insights:
Electric vehicle sales demand is expected to grow by more than 400% by 2025.
The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.
Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.
Based on these insights, you create your primary message. What are the expectations of a primary message? Select all that apply.
Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. During the narrative, you want to communicate to your stakeholders about the challenges associated with the current lack of vehicle charging stations and why it's important for Gaea to increase its cars’ battery range by 2025.
In which part of your data narrative would you include information about charging stations, the need to increase battery range, and why it’s important for Gaea to increase its cars’ battery range?
It’s time to build your Tableau dashboard for stakeholders. You consider what type of layout to use.
Which of the following describe the differences between vertical and horizontal layouts? Select all that apply.
You have created your narrative and visuals, so now it’s time to build a professional and appealing slideshow. You choose a theme that matches the tone of your presentation. Then, you create a title slide with a title, subtitle, and the date.
Next, you create the following slide to communicate information about electric vehicle sales in 2015 compared to 2020:
Alt-text: Slideshow with bar chart of electric vehicle sales from 2015 and 2020. 2020 had higher sales. There are also multiple sentences at the bottom of the slide and another piece of descriptive text near the chart.
To improve the slide, you remove the text box at the bottom. For what reasons will this make your slide more effective? Select all that apply.
You complete your slideshow and share it with your team. Once it is approved by your supervisor, you begin preparing to give your presentation. You consider maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will speak.
What strategies can help you speak effectively? Select all that apply.
Next, you prepare for the question-and-answer session that will follow your presentation. What methods help you consider any limitations of your data? Select all that apply.
The big day has arrived, and you finish your presentation to the Gaea team. In the question-and-answer session, a stakeholder asks you a very detailed question about a car battery range project that's still in development.
What strategies do you use in order to respond effectively? Select all that apply.
Your team’s analysis has revealed three key insights:
Electric vehicle sales demand is expected to grow by more than 400% by 2025.
The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.
Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.
Fill in the blank: Based on these insights, you create a clear and direct _____, which will guide your data story.
Now, it’s time to consider which tools to use to create data visualizations that will clearly communicate the results of your analysis. You and your team decide to make both spreadsheet charts and Tableau data visualizations. In addition, you agree to build a dashboard to share live, incoming data with your stakeholders. This will help them achieve the following goals:
Organize multiple datasets about electric vehicle battery ranges into a central location
Enable tracking and analysis of electric vehicle data
Simplify data visualizations about the number of available charging stations using maps of the different geographies
Another key benefit of dashboards is that they enable you to maintain control of your data narrative.
You use Tableau to create the following draft of a visualization, which organizes the charging station data geographically:
After reviewing your draft, you realize that it could be improved. What steps do you take to make your map more effective? Select all that apply.
Now, you want to highlight what your team’s analysis discovered about the number of charging stations available compared to the number of cars purchased. Your data has confirmed that the lack of charging stations causes the effect of fewer car sales. To communicate this effectively, you will need to convey causation to the stakeholders.
How do you explain causation?
You then create the following slide to demonstrate the challenges associated with battery range and charging stations:
After reviewing your slide, you realize that the visual elements could be improved. You do this by first choosing one data visualization to share on this slide, then creating another slide for the second data visualization.
Fill in the blank: In addition, you make sure to use _____ font sizes and colors for all of your data visualization titles.
Now that you have some idea of the questions the stakeholders will ask, you and a team member consider different objections that might arise. Your team member asks you how you will respond if someone from Gaea has an objection that you haven’t prepared for.
You say that you will respond professionally using the information you currently have available in order to move quickly past the objection.
You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.
Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three to five years.
You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.
A big part of audience engagement is knowing how to eliminate less important details. What practice do you use to scan quickly through the data in order to identify the most important insights?
Now that you have finished planning the data story with your team, it’s time to create data visualizations. First, you consider electric vehicle sales worldwide in 2015 compared to 2020. You use a spreadsheet to create the following bar graph to compare the two values:
You add information on the x-axis to represent a scale of values for the total electric vehicle sales and on the y-axis to represent the time periods (2015 and 2020).
You then create the following slide to demonstrate the challenges associated with battery range and charging stations:
After reviewing your slide, you realize that the visual elements could be improved. Which of the following options would help you make the visual elements on this slide more effective? Select all that apply.
You complete your slideshow and share it with your team. Once it is approved by your supervisor, you prepare to give your presentation. You consider presentation best practices: maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will present your data visualizations.
What strategies can help you explain the data visualizations effectively? Select all that apply.
What are the benefits of using a programming language for data analysis? Select all that apply.
What process does a data analyst use to instruct a computer to perform sets of actions?
A team of data analysts is working on a complex analysis. The team needs to quickly process lots of data. They also need to easily reproduce and share every step of their analysis. What should they use for the analysis?
What is a type of application that brings together all the tools a data analyst may want to use in a single place?
Which of the following statements about RStudio’s integrated development environment are correct? Select all that apply.
A data analyst writes the code summary(penguins) in order to display a summary of the penguins dataset. Where in RStudio can the analyst execute the code? Select all that apply.
A data analyst uses words and symbols to give instructions to a computer. What are the words and symbols known as?
Fill in the blank: A data analyst wants to quickly create visualizations and then share them with a teammate. They can use _____ for the analysis.
A data analyst inputs the following code in RStudio: print(100 / 10) What type of operators does the analyst use in the code?
Which of the following is a best practice when naming variables in R?
A data analyst is assigning a value to a variable in their company’s sales dataset for 2020. Which variable name uses the correct syntax?
change_1 <- 70
Which of the following types of operators does the analyst use in the code?
mtcars %>%
filter(carb > 1) %>%
group_by(cyl) %>%
What is this code chunk an example of?
penguins %>%
filter(flipper_length_mm == 200) %>%
group_by(species) %>%
summarize(mean = mean(body_mass_g)) %>%
A data scientist is trying to print a data frame, but printing it to the console produces too many rows and columns to be readable. What could they use instead of a data frame to make printing more readable?
A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?
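As a minimal sketch, assuming the palmerpenguins package is installed, base R's colnames() returns one name per column as a character vector, which is a compact way to list the columns of a wide data frame. The explicit palmerpenguins::penguins is an assumption added because recent R releases also ship a base-R penguins dataset with different column names.

```r
# Use the palmerpenguins version of the data explicitly
penguins <- palmerpenguins::penguins

# colnames() returns a character vector with one entry per column
cols <- colnames(penguins)
print(cols)
```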
You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable flipper_length_mm. At this point, the following code has already been written into your script:
penguins %>%
drop_na() %>%
group_by(species, sex) %>%
Add the code chunk that lets you find the minimum value for the variable flipper_length_mm.
(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)
What species and sex have the lowest minimum flipper length in mm?
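A sketch of the full pipeline, assuming dplyr, tidyr, and palmerpenguins are installed: the single line the prompt asks for is the summarize() call, which computes one minimum per species/sex group. The .groups = "drop" argument is an optional addition here; it only returns an ungrouped result and silences dplyr's regrouping message.

```r
library(dplyr)
library(tidyr)

penguins <- palmerpenguins::penguins

min_flipper <- penguins %>%
  drop_na() %>%                   # remove rows with any missing value
  group_by(species, sex) %>%      # one group per species/sex combination
  summarize(min = min(flipper_length_mm), .groups = "drop")

print(min_flipper)
```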
A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?
What are the names of the columns in the ToothGrowth dataset?
sales %>%
The data frame contains a column named q1_sales. What code chunk does the analyst add to change the name of the column from q1_sales to quarter1_sales ?
penguins %>%
The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?
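A minimal sketch, assuming dplyr and palmerpenguins are installed: filter() keeps only the rows for which the condition is TRUE, and == tests equality, so comparing species to the string "Gentoo" keeps just that species.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

gentoo <- penguins %>%
  filter(species == "Gentoo")   # keep only rows whose species is Gentoo

print(gentoo)
```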
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the maximum value for the variable flipper_length_mm.
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.
What is the minimum bill depth in mm for the Chinstrap species?
What is the maximum flipper length in mm for the Gentoo species?
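Both prompts above follow the same drop_na() then group_by(species) pattern and differ only in the summary function. A sketch, assuming dplyr, tidyr, and palmerpenguins are installed; the result names flipper_max and bill_min are illustrative.

```r
library(dplyr)
library(tidyr)

penguins <- palmerpenguins::penguins

# Maximum flipper length per species
flipper_max <- penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarize(max = max(flipper_length_mm))

# Minimum bill depth per species
bill_min <- penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarize(min = min(bill_depth_mm))

print(flipper_max)
print(bill_min)
```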
Which function will return the average value of the y column?
What is the average value of the len column?
penguins %>% arrange(body_mass_g)
head(penguins)
What can the data analyst do to fix their code?
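The question above does not say what goes wrong. Assuming the goal is to preview penguins sorted by body mass, one likely fix is to save the output of arrange(): it returns a sorted copy and leaves penguins itself unchanged, so head(penguins) still shows the original order. The name penguins2 is hypothetical; the sketch assumes dplyr and palmerpenguins are installed.

```r
library(dplyr)

penguins <- palmerpenguins::penguins

# Store the sorted result in a new data frame, then preview that one
penguins2 <- penguins %>% arrange(body_mass_g)
head(penguins2)
```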
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the mean value for the variable body_mass_g.
What is the mean body mass in g for the Adelie species?
penguins %>%
drop_na() %>%
group_by(species) %>%
Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.
What is the minimum bill depth in mm for the Chinstrap species?
How many variables does the ToothGrowth dataset contain?
penguins %>%
filter(species <- "Adelie")
How can the analyst change the second line of code to correct the error?
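In R, <- assigns a value, while == compares two values. filter() needs a logical comparison, so the condition uses ==. A minimal sketch, assuming dplyr and palmerpenguins are installed:

```r
library(dplyr)

penguins <- palmerpenguins::penguins

adelie <- penguins %>%
  filter(species == "Adelie")   # == compares; <- would attempt an assignment

print(adelie)
```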
penguins %>%
drop_na() %>%
group_by(species) %>%
summarize(min = min(year), max = max(year))
When you run the code in the code box, how many different groups are returned by this code chunk?
How many different data types are used across the columns?
bias(actual_temp, predicted_temp)
What will this code chunk calculate?
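The bias() function is not defined in this passage; it is commonly taken from the SimDesign package, but that is an assumption here. As a hand-rolled base-R sketch of the idea, average bias can be computed as the mean difference between actual and predicted values. The temperatures below are hypothetical, and a library function's sign convention may differ from this version.

```r
# Hypothetical example temperatures (not from the course data)
actual_temp    <- c(70, 72, 68)
predicted_temp <- c(68, 70, 69)

# Average bias: mean of (actual - predicted); positive means the
# actual values tend to be higher than the predictions
avg_bias <- mean(actual_temp - predicted_temp)
print(avg_bias)
```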
A data analyst creates a scatterplot with many data points. The analyst wants to make some points on the plot more transparent than others. What aesthetic should the analyst use?
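The alpha aesthetic controls transparency in ggplot2. A sketch, assuming ggplot2 and palmerpenguins are installed; mapping alpha to bill_depth_mm is an illustrative choice, not part of the question.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

# Points with larger bill depth are drawn more opaque
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           alpha = bill_depth_mm))
```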
You are working with the diamonds dataset. You create a bar chart with the following code:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = color, fill = cut)) +
You want to use the facet_wrap() function to display subsets of your data. Add the code chunk that lets you facet your plot based on the variable cut.
How many subplots does your visualization show?
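facet_wrap() takes a formula naming the faceting variable and draws one panel per level of that variable. A sketch using the diamonds dataset bundled with ggplot2; swapping ~cut for ~clarity facets by clarity instead.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~cut)   # one panel per level of cut
```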
ggplot(data = diamonds)
What symbol should the analyst put at the end of the line of code to add a layer to the plot?
ggplot(data = penguins) + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
Which of the following represents a function in the code chunk? Select all that apply.
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
You want to highlight the different penguin species on your plot. Add a code chunk to the second line of code to map the aesthetic shape to the variable species.
NOTE: the three dots (...) indicate where to add the code chunk.
Which penguin species does your visualization display?
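Mapping shape to a categorical variable gives each category its own point symbol and adds a legend. A sketch, assuming ggplot2 and palmerpenguins are installed:

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           shape = species))   # one symbol per species
```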
ggplot(data = penguins) +
geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
What does the geom_jitter() function do to the points in the plot?
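geom_jitter() is a shortcut for geom_point() with a jitter position: it adds a small amount of random noise to each point's position so overlapping points become visible. A sketch, assuming ggplot2 and palmerpenguins are installed; the width and height values are illustrative and control how far points can be shifted.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

p <- ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g),
              width = 0.5, height = 0.5)   # maximum random shift per axis
```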
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = color, fill = cut)) +
You want to use the facet_wrap() function to display subsets of your data. Add the code chunk that lets you facet your plot based on the variable clarity.
How many subplots does your visualization show?
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
What code chunk do you add to the third line to save your plot as a png file with “penguins” as the file name?
ggplot(data = buildings) +
geom_bar(mapping = aes(x = construction_year, color = height))
Which of the following represents an aesthetic attribute in the code chunk?
ggplot(data = buildings) +
geom_bar(mapping = aes(x = construction_year, color = height))
Which of the following represents a variable in the code chunk?
ggplot(data = penguins) %>%
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
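The snippet above chains ggplot() and geom_point() with %>%, which passes the plot object into geom_point() as an argument instead of adding a layer, so it fails with an error. In ggplot2, layers are added with +, and the + belongs at the end of the line. A sketch, assuming ggplot2 and palmerpenguins are installed:

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

# + adds the geom_point() layer to the plot; it ends the first line
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```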
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
You want to highlight differences in bill depth in your plot. Add a code chunk to the second line of code to map the aesthetic size to the variable bill_depth_mm.
NOTE: the three dots (...) indicate where to add the code chunk.
Which approximate range of bill depths does your visualization display?
ggplot(data = penguins)
+ geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
You want to highlight the different penguin species in your plot. Add a code chunk to the second line of code to map the aesthetic shape to the variable species.
NOTE: the three dots (...) indicate where to add the code chunk.
Which species tends to have the longest flipper length and highest body mass?
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
You want to highlight the different years of data collection on your plot. Add a code chunk to the second line of code to map the aesthetic size to the variable year.
NOTE: the three dots (...) indicate where to add the code chunk.
What years does your visualization display?
ggplot(data = buildings) +
geom_bar(mapping = aes(x = construction_year, color = height))
Which of the following represents a function in the code chunk?
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
You want to highlight the different islands on your plot. Add a code chunk to the second line of code to map the aesthetic alpha to the variable island.
NOTE: the three dots (...) indicate where to add the code chunk.
What islands does your visualization display?
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
What code chunk do you add to the third line to save your plot as a pdf file with “penguins” as the file name?
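ggsave() chooses the graphics device from the file extension, so "penguins.pdf" produces a PDF and "penguins.png" a PNG. By default it saves the last plot displayed; this sketch passes the plot explicitly and writes to tempdir() only so the example does not clutter the working directory. It assumes ggplot2 and palmerpenguins are installed; the width and height (in inches) are illustrative.

```r
library(ggplot2)

penguins <- palmerpenguins::penguins

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# In the course setting this would simply be ggsave("penguins.pdf")
out_file <- file.path(tempdir(), "penguins.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```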
A data analyst wants to create a shareable report of their analysis with documentation of their process and notes explaining their code to stakeholders. What tool can they use to generate this?
A data analyst wants to add a bulleted list to their R Markdown document. What symbol can they type to create this formatting?
After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Bean.Type. You decide to use the select() function to create a new data frame with only these three variables.
Assume the first part of your code is:
trimmed_flavors_df <- flavors_df %>%
Add the code chunk that lets you select the three variables.
What bean type appears in row 6 of your tibble?
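select() keeps only the named columns, in the order given. Because the course dataset is not reproduced here, this sketch builds a small hypothetical stand-in for flavors_df with made-up rows; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df (made-up values)
flavors_df <- tibble(
  Company.Location = c("France", "U.S.A.", "Peru"),
  Rating = c(3.75, 2.75, 3.5),
  Cocoa.Percent = c(70, 63, 80),
  Bean.Type = c("Criollo", "Trinitario", "Forastero")
)

trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Bean.Type)   # keep only these three columns

print(trimmed_flavors_df)
```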
As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.
Your current client is Chocolate and Tea, an up-and-coming chain of cafes.
The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.
Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.
They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.
Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.
You create a short document about the benefits of using R for the project and share the document with your team. You write that the benefits include R’s ability to quickly process lots of data and easily reproduce and share an analysis. What is another benefit of using R for the project?
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load all the necessary libraries and packages. You upload a .csv file containing the data to RStudio and save it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is flavors_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
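read_csv() from the readr package reads a .csv file into a tibble. So that the sketch runs without the course file, it first writes a tiny hypothetical CSV to tempdir(); in the course setting the call would simply be read_csv("flavors_of_cacao.csv"). It assumes readr is installed.

```r
library(readr)

# Write a small hypothetical CSV so the example is self-contained
csv_path <- file.path(tempdir(), "flavors_of_cacao.csv")
writeLines(c("Rating,Cocoa.Percent", "3.75,70", "2.75,63"), csv_path)

# Import the file and store it as a data frame (tibble)
flavors_df <- read_csv(csv_path)
print(flavors_df)
```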
Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.
Assume the name of your data frame is flavors_df. What code chunk lets you review the column names in the data frame?
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Maker (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
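rename() takes new_name = old_name pairs. The sketch below uses a small hypothetical flavors_df with the long column name from the question; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data frame with the original long column name
flavors_df <- tibble(Company...Maker.if.known. = c("A", "B"),
                     Rating = c(3.5, 3.0))

flavors_df <- flavors_df %>%
  rename(Maker = Company...Maker.if.known.)   # new_name = old_name

print(names(flavors_df))
```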
Assume the first part of your code is:
trimmed_flavors_df <- flavors_df %>%
Add the code chunk that lets you select the three variables.
Assume the first part of your code is:
trimmed_flavors_df %>%
You want to use the summarize() and max() functions to find the maximum rating for your data. Add the code chunk that lets you find the maximum value for the variable Rating.
What is the maximum rating?
After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.5 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar's cocoa percent is greater than or equal to 70%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.
Assume the first part of your code is:
best_trimmed_flavors_df <- trimmed_flavors_df %>%
You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points.
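filter() combines comma-separated conditions with AND, so a row is kept only when both are TRUE. This sketch assumes Cocoa.Percent is stored as a number (70 rather than "70%") and uses made-up rows; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical rows; Cocoa.Percent assumed numeric
trimmed_flavors_df <- tibble(
  Rating = c(3.75, 3.25, 4.0, 3.5),
  Cocoa.Percent = c(72, 75, 65, 70),
  Bean.Type = c("Criollo", "Forastero", "Trinitario", "Criollo")
)

best_trimmed_flavors_df <- trimmed_flavors_df %>%
  filter(Cocoa.Percent >= 70, Rating >= 3.5)   # both conditions must hold

print(best_trimmed_flavors_df)
```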
Assume your first line of code is:
ggplot(data = best_trimmed_flavors_df) +
You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.
Assume that you are working with the following code:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company.Location))
Add a code chunk to the second line of code to map the aesthetic fill to the variable Rating.
NOTE: the three dots (...) indicate where to add the code chunk.
According to your bar chart, which two company locations produce the highest rated chocolate bars?
A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.
Assume your teammate shares the following code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Cocoa.Percent)) +
What code chunk do you add to the third line to create wrap around facets of the variable Cocoa.Percent?
Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.
Assume the first part of your code chunk is:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to add the title Recommended Bars to your plot?
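labs() sets plot labels such as the title and axis names, and it is added as another layer with +. A sketch with a small hypothetical trimmed_flavors_df; it assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical stand-in for trimmed_flavors_df
trimmed_flavors_df <- data.frame(Cocoa.Percent = c(70, 75, 80),
                                 Rating = c(3.5, 3.75, 3.0))

p <- ggplot(data = trimmed_flavors_df) +
  geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
  labs(title = "Recommended Bars")   # adds the plot title
```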
Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.
Assume your first two lines of code are:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to save your plot as a jpeg file with chocolate as the file name?
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
You decide to create an R Markdown notebook to document your work. What are your reasons for choosing an R Markdown notebook? Select all that apply.
Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and save it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.
Assume the name of your data frame is flavors_df. What code chunk lets you review the structure of the data frame?
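To review structure rather than just column names, dplyr's glimpse() prints one line per column with its type and first values; base R's str() gives similar output. A sketch with a small hypothetical flavors_df; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df
flavors_df <- tibble(Rating = c(3.75, 2.75),
                     Cocoa.Percent = c(70, 63),
                     Bean.Type = c("Criollo", "Forastero"))

glimpse(flavors_df)   # column names, types, and first values
str(flavors_df)       # base-R alternative
```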
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Brand (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
Assume the first part of your code is:
trimmed_flavors_df <- flavors_df %>%
Add the code chunk that lets you select the three variables.
What company location appears in row 1 of your tibble?
Assume the first part of your code is:
best_trimmed_flavors_df <- trimmed_flavors_df %>%
You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the new data frame for chocolate bars that contain at least 80% cocoa and have a rating of at least 3.75 points.
How many rows does your tibble include?
A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.
Assume your teammate shares the following code chunk:
ggplot(data = best_trimmed_flavors_df) +
geom_bar(mapping = aes(x = Company)) +
What code chunk do you add to the third line to create wrap around facets of the variable Company?
Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.
Assume the first part of your code chunk is:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to add the title Suggested Chocolate to your plot?
Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.
Assume your first two lines of code are:
ggplot(data = trimmed_flavors_df) +
geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +
What code chunk do you add to the third line to save your plot as a pdf file with “chocolate” as the file name?
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
Fill in the blank: You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. You decide to create _____ to document your work.
Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and save it in your project folder as flavors_of_cacao.csv.
You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is chocolate_df and the .csv file is in the working directory. What code chunk lets you create the data frame?
Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Company (without a period at the end).
Assume the first part of your code chunk is:
flavors_df %>%
What code chunk do you add to change the column name?
As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.
You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. What do you use to document your work?
Assume the first part of your code is:
trimmed_flavors_df %>%
You want to use the summarize() and mean() functions to find the mean rating for your data. Add the code chunk that lets you find the mean value for the variable Rating.
What is the mean rating?
Assume your first line of code is:
ggplot(data = best_trimmed_flavors_df) +
You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Company on the x-axis.
How many bars does your bar chart display?
Did you complete a case study?
We hope you were excited about the opportunity to complete an optional case study in this course. It's a great way to showcase your new data analytics skills to potential employers.
Please let us know whether or not you completed a case study; you’ll be able to proceed with the course either way!