TDM 10200: Project 2 — 2024
Motivation: Pandas will enable us to work with data in Python. If you were enrolled in The Data Mine in the fall semester, you will recognize some similarities to data frames from R. If you are new to The Data Mine, you will likely find that Pandas makes it easy to work with data. Matplotlib is a widely used Python library for creating visualizations.
Context: This is our second project, and we will continue to introduce some basic data types and basic operations using pandas and matplotlib.
Scope: tuples, lists, pandas, matplotlib
Dataset(s)
You will use the following dataset(s) for the questions:
- /anvil/projects/tdm/data/craigslist/vehicles.csv
Readings and Resources
- Make sure to read about and use the template found here, and the important information about project submissions here.
- Please review the following Examples Book pages before you start the project, and be sure to try some of these examples! These will help you be prepared for the project questions below.
Questions
Question 1 (2 pts)
- Create a list called mydata that contains 6 tuples. Each tuple should have a student’s first name, age, and major. (You may make up the students' information.)
- Use a DataFrame constructor to convert mydata into a DataFrame named studentDF.
- Use "iloc[]" to extract and display the second student’s information in the DataFrame.
You may get more information about "iloc[]" here.
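As a rough illustration of the two steps above, here is a minimal sketch that builds a DataFrame from a list of tuples and then pulls out one row by position. The book records, the column names, and the variable names books and bookDF are made up for this example and are not part of the project.

```python
import pandas as pd

# Hypothetical records: each tuple is (title, year, genre)
books = [
    ("Dune", 1965, "Science Fiction"),
    ("Emma", 1815, "Romance"),
    ("Beloved", 1987, "Historical Fiction"),
]

# The DataFrame constructor accepts a list of tuples;
# the columns= argument gives a name to each position in the tuples
bookDF = pd.DataFrame(books, columns=["title", "year", "genre"])

# iloc[] selects by integer position and counts from 0,
# so position 1 is the second row
second_row = bookDF.iloc[1]
print(second_row)
```

Because iloc[] is zero-indexed, asking for position 1 (not 2) is what returns the second record.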
Question 2 (2 pts)
For Question 2, you need to use 3 cores in your Jupyter Lab session. If you started your Jupyter Lab session with only 1 core, just close your Jupyter Lab session and start a new session that uses 3 cores. Otherwise, your kernel will crash when you load the data. We added a video about starting an Anvil session with more cores.
- Read in the dataset /anvil/projects/tdm/data/craigslist/vehicles.csv into a pandas DataFrame called myDF. (Optional: If you want to, you can use the first column id as the DataFrame’s index, but this is not required.)
- Display the first and last five rows of the myDF DataFrame.
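The pattern for reading a CSV file and looking at its first and last rows can be sketched on a tiny in-memory table. The CSV text, its values, and the name demoDF below are invented for illustration. For the project you would pass the file path to pd.read_csv instead of the io.StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk (7 made-up rows)
csv_text = """id,price,state
101,4500,in
102,12000,tx
103,5999,in
104,7500,oh
105,3000,tx
106,8000,in
107,2500,oh
"""

# index_col="id" uses the id column as the row index (this step is optional)
demoDF = pd.read_csv(io.StringIO(csv_text), index_col="id")

# head() and tail() each return five rows by default
first_rows = demoDF.head()
last_rows = demoDF.tail()
print(first_rows)
print(last_rows)
```

Leaving out index_col keeps the default 0, 1, 2, ... index, which works just as well for the later questions.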
Question 3 (2 pts)
- Display how many rows and columns there are in the entire DataFrame myDF.
- Display a list of all the column names in the DataFrame myDF.
You can revisit the functions given in Project 1, Question 5, to help with both parts of this question.
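One common way to inspect the size and column names of a DataFrame is through its shape and columns attributes. The sketch below uses a small made-up table named demoDF. These attributes may or may not be the same tools that Project 1, Question 5 refers to.

```python
import pandas as pd

# Small made-up table, only for illustration
demoDF = pd.DataFrame({
    "price": [4500, 12000, 5999],
    "state": ["in", "tx", "in"],
    "region": ["indianapolis", "dallas", "fort wayne"],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = demoDF.shape
print(n_rows, n_cols)

# columns holds the column labels; tolist() turns them into a plain Python list
column_names = demoDF.columns.tolist()
print(column_names)
```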
Question 4 (2 pts)
Use the data from myDF to answer the following questions:
- How many vehicles have a price that is strictly larger than $6000?
- How many vehicles are from Indiana? How many are from Texas?
- Display all of the regions listed in the data frame. You can use the unique() method on the region column of myDF. How many different regions appear altogether (counting each region just once)?
We added a video about counting the number of entries per state. (The video uses a different data set, breweries instead of vehicles, but the method for counting items per state is the same, so it should help guide your work on Question 4.)
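The counting questions above all follow a similar pattern: build a True/False condition on a column and count the True values, or ask pandas for the distinct values in a column. Here is a minimal sketch on a made-up table named demoDF. The lowercase two-letter state codes are an assumption for this example, so check how states are actually written in your data before filtering.

```python
import pandas as pd

# Made-up table, only for illustration
demoDF = pd.DataFrame({
    "price": [4500, 6000, 12000, 7500, 5999, 8000],
    "state": ["in", "tx", "in", "oh", "tx", "in"],
    "region": ["indianapolis", "dallas", "fort wayne",
               "columbus", "dallas", "indianapolis"],
})

# A comparison gives a column of True/False values; sum() counts the True ones.
# The > operator is strict, so a price of exactly 6000 is not counted.
n_expensive = (demoDF["price"] > 6000).sum()

# Count the rows that match one particular state code
n_indiana = (demoDF["state"] == "in").sum()

# value_counts() tallies every state in one step
per_state = demoDF["state"].value_counts()

# unique() lists each distinct value once; nunique() counts them
regions = demoDF["region"].unique()
n_regions = demoDF["region"].nunique()

print(n_expensive, n_indiana, n_regions)
print(per_state)
print(regions)
```

Using len(regions) gives the same count as nunique() as long as the column has no missing values, since unique() includes NaN while nunique() skips it by default.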
Question 5 (2 pts)
- Plot a bar chart that illustrates the number of vehicles in each state whose price is strictly lower than $6000. The bar chart should have one bar per state, showing the number of such vehicles in that state.
We added a two-part video about making such a bar chart. See the part 1 video and the part 2 video. Note: The example videos are about the number of reviews per user (instead of the number of vehicles per state), but the method is the same, and these videos should help to guide your work on Question 5.
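A filter-then-count-then-plot workflow like the one described above might look roughly like the following sketch. It uses a made-up table named demoDF with invented prices and state codes. The figure size and axis labels are arbitrary choices, and this is only one of several ways to draw a bar chart with matplotlib.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up table, only for illustration
demoDF = pd.DataFrame({
    "price": [4500, 6000, 12000, 3000, 5999, 2500],
    "state": ["in", "tx", "in", "oh", "tx", "in"],
})

# Keep only rows with price strictly below 6000 (exactly 6000 is excluded)
cheap = demoDF[demoDF["price"] < 6000]

# Count the remaining rows per state; sort_index() orders the bars alphabetically
counts = cheap["state"].value_counts().sort_index()

# One bar per state, with bar height equal to the count for that state
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("State")
ax.set_ylabel("Number of rows with price < 6000")
ax.set_title("Count per state (made-up data)")
plt.show()
```

With many states on the x-axis, a wider figsize or rotated tick labels can keep the labels readable.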
Project 02 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project02.ipynb
- Python file with code and comments for the assignment
  - firstname-lastname-project02.py
- Submit files through Gradescope
Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, we recommend downloading your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.