Project Introduction
Your independent data analysis project will be done in two phases, each assessed by other students using peer assessment. You can also share your projects via the forums for additional feedback. The project counts as 15% of your grade for a certificate with distinction.
Before the deadline, use the discussion forum to ask specific questions about the assignment. Please don’t post your proposal.
After the deadline, we encourage you to post your project submission to the forum to show other students interesting data sets and questions, and to get additional feedback on your project (in addition to the three peer assessments you will formally receive, which are used to calculate your score for the project).
Both phases of the project are open so that you can see what is expected. As we learn how you work, we may make small modifications to the project requirements as the course progresses, so please check again before the due dates. You can resubmit your work up until the due date.
Follow these steps when working on your project:
- If you haven't yet done so, download the latest version of R (//cran.r-project.org/) and the latest version of RStudio (//www.rstudio.com/products/rstudio/download/).
- Click the "Go To Assignment" links at the bottom of this page for both the project proposal and the project, and review the rubric to see how you will be graded. This information is immensely useful for figuring out the exact expectations of both phases of the project.
- Choose a dataset (either one of those listed below or another dataset of your choice).
- Submit Phase 1: Project Proposal before Monday, September 22 (see below for details).
- Evaluate at least 3 proposals before Monday, October 6.
- Review your evaluations.
- Before starting Phase 2: Data Analysis Project, download the RMarkdown template using the following code: download.file(url = "//bit.ly/dasi_project_template", destfile = "dasi_project_template.Rmd") If the shortened link doesn't work for you, try the following URL instead: //d396qusza40orc.cloudfront.net/statistics/project/dasi_project_template.Rmd. More information on the RMarkdown syntax: //rmarkdown.rstudio.com/. Watch a video on using RMarkdown to complete your project: //class.coursera.org/statistics-002/lecture/179
- Submit Phase 2: Data Analysis Project before Monday, October 20 (see below for details). Note that the final product must be an HTML file that uses the "cerulean" theme (the template you download above already sets this). The HTML file must be fully self-contained (it includes all code, write-up, and figures). Also note that there is a seven-page limit on your write-up. You can check whether you meet this limit by using a print preview.
- Evaluate at least 3 projects before Monday, November 3 by opening the submitted HTML files in a browser of your choice (Chrome or Firefox recommended).
- Receive your project evaluations on Monday, November 3.
Phase 1: Project Proposal
Identify a research question similar to questions we’ve talked about in this course. Choose a dataset, and one or two variables from that dataset, with which to answer this question using a hypothesis test or confidence intervals (the dataset used is entirely up to you, it can be one of the datasets listed below under "Datasets for the Project", or another one of your own choosing). You should pick two variables of interest, and you will be exploring the relationship between them. These variables should be either numerical and categorical, or both categorical (but they cannot be both numerical, see below for examples). All analysis must be completed in R.
As part of the proposal stage you are also asked to complete a brief exploratory data analysis to determine if your data is appropriate for the project. You should submit a proposal with enough detail so that your peers can give you feedback before you start the full data analysis.
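As a rough illustration of what such a brief exploratory check might look like, the sketch below counts how many cases have both variables observed and how many usable cases fall in each category. The data frame `survey` and its columns `educ_years` and `mom_worked` are hypothetical toy data, not part of any course dataset.

```r
# Hypothetical toy data standing in for a real survey extract.
survey <- data.frame(
  educ_years = c(12, 16, NA, 14, 18, 12, 10, NA),
  mom_worked = factor(c("yes", "no", "yes", NA, "no", "yes", "no", "yes"))
)

# Flag rows where both variables of interest are observed.
complete <- complete.cases(survey[, c("educ_years", "mom_worked")])
n_complete <- sum(complete)

# Count the usable cases in each category.
group_sizes <- table(survey$mom_worked[complete])

print(n_complete)
print(group_sizes)
```

If very few complete cases remain, or one category is nearly empty, that may be a sign that a different pair of variables would suit the project better.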
You can begin working on the proposal immediately, following the link at the bottom of this page. Please save your work as you go along. When you're ready to submit your work for evaluation, remember to click the "Submit" button. If the deadline passes and you haven't clicked "Submit" yet, then your saved work will not be evaluated. Note: You can re-submit your work for evaluation as many times as you want before the submission deadline September 22. Only your last submission will be seen and evaluated by your classmates.
After the submission deadline, you will have two weeks to provide feedback to others on their project proposals. Please assess at least 3 proposals before October 6th. This peer assessment will help you prepare for your project and provide you with experience with a variety of data sets and research questions.
Phase 2: Data Analysis Project
Once you receive feedback on your proposal, you will then continue onto the data analysis project. This peer assessment will count towards your grade for a certificate with distinction. You will answer the research question you developed in the proposal phase using methods you’ve learned in this class, and summarize your findings into a report. More details on what should be included are in the assessment itself, linked below. All analysis must be completed in R.
This project is due on October 20th, and then you will have two weeks to provide feedback to other students on their projects. You will only receive full credit for your project if you evaluate 3 other projects before the November 3rd deadline.
Datasets for the project
We are providing two datasets that you can use for your project. Both of them come from large-scale US surveys, and they have been modified slightly to make them easier to use as part of this course. Even though there are only two datasets, each contains many variables, so you should be able to find some combination of variables that is of interest to you.
(1) General Social Survey (GSS): A sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The codebook below lists all variables, the values they take, and the survey questions associated with them. There are a total of 57,061 cases and 114 variables in this dataset. Note that this is a cumulative data file for surveys conducted between 1972 - 2012 and that not all respondents answered all questions in all years.
Codebook: Review the codebook to view a list of all variables, the values they take, and the original survey questions associated with the variables.
Use the following code to load the GSS dataset into R: load(url("//bit.ly/dasi_gss_data"))
For access from China try using the following URL in the code above: //d396qusza40orc.cloudfront.net/statistics/project/gss.Rdata
The name of the dataset that you load is gss. For example, you can see a list of the variable names using the following command: names(gss) Note that this dataset includes data from many years. In your analysis it might make sense to first subset the data for a particular year (or years) and analyze only data pertaining to those years. This might be especially useful if you're using a variable from a survey question that was only asked in certain years.
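Subsetting by year could look something like the sketch below. It uses a small hypothetical data frame, `gss_toy`, in place of the real `gss` object so that it runs without a download. The column names `year` and `degree` are assumptions for illustration, so check the codebook for the actual variable names.

```r
# Hypothetical stand-in for the gss data frame (column names are assumed).
gss_toy <- data.frame(
  year   = c(1972, 1990, 2012, 2012, 2010, 2012),
  degree = factor(c("High school", "Bachelor", "Bachelor", NA, "Graduate", "High school"))
)

# Keep a single survey year.
gss_2012 <- subset(gss_toy, year == 2012)

# Keep several years and drop rows where the variable of interest is missing.
gss_recent <- subset(gss_toy, year %in% c(2010, 2012) & !is.na(degree))

print(nrow(gss_2012))
print(nrow(gss_recent))
```

The same `subset()` calls should carry over to the real dataset once `gss_toy` is replaced by `gss` and the column names are adjusted.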
(2) American National Elections Study (ANES): A survey of voters in the United States, conducted before and after every presidential election. The codebook below lists all variables, the values they take, and the survey questions associated with them. There are a total of 5,914 cases and 205 variables in this dataset. Note that not all respondents answered all questions.
Codebook: Review the codebook to view a list of all variables, the values they take, and the original survey questions associated with the variables.
Use the following code to load the ANES dataset into R: load(url("//bit.ly/dasi_anes_data"))
For access from China try using the following URL in the code above: //d396qusza40orc.cloudfront.net/statistics/project/anes.RData
The name of the dataset that you load is anes. For example, you can see a list of the variable names using the following command: names(anes)
Examples of variables and research questions appropriate for the project
Provided below are some examples of types of research questions and data that are appropriate for the project:
- One numerical and one categorical: Is there a relationship between whether the mother worked during the first 5 years of the child's life and the highest level of education the child attains? [Data: Number of years of education of child; Mom’s working status - yes, no]
- Two categorical: Do racial minority groups in North Carolina have less access to health care coverage? [Data: Ethnicity - various levels; Health coverage - yes, no]
Note that you can work with one numerical and one categorical variable, or two categorical variables (but not two numerical variables, as we'll learn methods associated with those later in the course).
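To make the two allowed variable pairings concrete, here is a minimal sketch of a first look at each one in R. The data frame `d` and all of its columns are hypothetical toy data loosely modeled on the examples above. It shows only descriptive summaries, not the inference methods the project requires.

```r
# Hypothetical toy data loosely modeled on the two example questions.
d <- data.frame(
  educ_years = c(12, 16, 14, 18, 13, 11),
  mom_worked = factor(c("yes", "no", "yes", "no", "yes", "no")),
  ethnicity  = factor(c("A", "B", "A", "B", "A", "B")),
  coverage   = factor(c("yes", "yes", "no", "yes", "no", "no"))
)

# One numerical and one categorical: compare the numerical variable across groups.
group_means <- tapply(d$educ_years, d$mom_worked, mean)

# Two categorical: cross-tabulate, then convert counts to row proportions.
counts <- table(d$ethnicity, d$coverage)
row_props <- prop.table(counts, margin = 1)

print(group_means)
print(counts)
print(row_props)
```

With `margin = 1`, each row of `row_props` sums to 1, so the table reads as the distribution of coverage within each ethnicity group.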
While it may not be clear to you yet which techniques you would use to do statistical inference to answer these questions, you will have learned all the tools you need for the proposal by the end of Week 1 and all the tools you need for the actual project by the end of Week 6.
Questions and answers about the project
What type of variables should I use for my project?
You need to pick two variables and evaluate the relationship between them. These can be a numerical and a categorical variable or two categorical variables. Do not use two numerical variables.
How will I find a dataset for my project?
You can either find your own dataset on the web, collect your own data, or use one of the datasets we provide specifically to be used in the project (listed above). While finding or collecting your own dataset is an eye-opening experience that we think is immensely valuable, it can be very time consuming. There are a variety of data resources online, some of which we'll point you to throughout the course and some of which you might discover yourself. However, the internet is a bottomless well of information, and finding a dataset that interests you, that is appropriate for this project, and that can be analyzed using tools from this course can be challenging. Therefore, choosing to work with one of the datasets we provide to get a head start on the project quickly can also be a good idea. You are the best judge of how much time you can devote to the project, so it is your choice and responsibility to decide which route you want to take.
Can I use a dataset from the labs?
No. You can work with just about any dataset you like (as long as it meets the conditions required to apply the inferential methods we learn in this class), but you cannot re-use a dataset from the labs. One of the main objectives of this project is to work with a novel (to you) dataset. You can, however, choose to work with one of the datasets we provide specifically for the project (listed above).
Do I have to use R for my project?
Yes. While there are other statistical packages and/or programming languages that may be perfectly appropriate for your project, since one of the goals of this course is to learn R, all analysis must be completed in R. Projects completed using other statistical packages and/or programming languages will receive a penalty.
Do I have to use RMarkdown for writing my project?
Yes. While there are other options for word processors (Word, LibreOffice, etc.), the goal of this project is to create a fully reproducible data analysis document. RMarkdown is an easy-to-use environment well suited to this. Projects completed in other formats will receive a penalty.
What format should my project be submitted in?
You should submit your project as an HTML file. This is the file that RStudio writes out when you hit "Knit HTML" in your markdown document. Simply upload the HTML file and your peer reviewer will be able to view your project in their browser.
Where can I find a list of R commands that might be useful for the project?
See this document for a list of R commands you encountered in labs as well as a few others that you might find useful for the project. Note that this is not an exhaustive list.
Can I use DataCamp for my project?
No, for the project you will need to use R/RStudio.
Who am I writing for?
Write as if you are explaining your results to whoever would be interested in your research question, whether this is another scholar in your field or peers sharing your interest in the topic. This audience may not have taken a statistics course. You must be statistically accurate and use correct statistical terminology, but must also explain your conclusions in a way that anyone can understand.
Who will see my work?
Other students who have submitted similar work will be given your work to evaluate. In addition, you will be able to share your work via the forum with the rest of the students to benefit everyone in class. You will benefit by receiving additional ideas about your project.
How will this project be graded?
The projects will be evaluated via peer assessment. Three other students will carefully read your project proposal and provide feedback on it. This part will not count towards your final grade, but it will prepare you for the data analysis project, show you other data sets and other ways of approaching questions, and give you practice with peer assessment. For the data analysis project, you will receive a score that is the median of your peer reviewers' scores. You will receive the full score if you perform all 3 of your peer assessments; if you do not complete 3 peer assessments, your final score will be decremented by 20%. The project score counts as 15% towards your final grade for a certificate with distinction.
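The scoring rule can be sketched as a small R function. The function name `final_project_score` and its arguments are hypothetical. The sketch also assumes that "decremented by 20%" means the median score is multiplied by 0.8, which the text does not spell out, so the platform's actual calculation may differ.

```r
# Hypothetical helper illustrating the scoring rule described above.
# Assumption: the 20% decrement is applied multiplicatively to the median score.
final_project_score <- function(peer_scores, assessments_completed) {
  score <- median(peer_scores)
  if (assessments_completed < 3) {
    score <- score * 0.8
  }
  score
}

# Median of the three peer scores, with all assessments completed.
full_credit <- final_project_score(c(70, 80, 90), assessments_completed = 3)

# Same peer scores, but only two assessments completed.
reduced <- final_project_score(c(70, 80, 90), assessments_completed = 2)

print(full_credit)
print(reduced)
```

Using the median rather than the mean means a single unusually harsh or generous reviewer does not move the score.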
What do I have to do when?
There are two phases of the project: a project proposal phase and a data analysis project phase. Each phase contains a submission period and an evaluation period where you will provide feedback on others' projects. Although you can access the data analysis project right away, you are more likely to be successful if you wait until you get feedback on your project proposal before submitting a completed data analysis project. See the Due Dates page.
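What should I do if I suspect plagiarism in a project I am assessing?
- Complete the peer assessment of the project, following the peer assessment and feedback guidelines.
- Report the plagiarism to Coursera.
How do I avoid plagiarism in my own work?
Plagiarism is using someone else's work without citing it. For guidance on citing sources, see the OWL. In your own project, you must give proper citations for all sources, even when you are the original author and the material was published elsewhere.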