Stata Lab 3: Managing Data
In this lab, several Stata commands are introduced that will allow you to execute some very useful data manipulations, including collapsing data, reshaping data, merging datasets, and appending datasets. The lab also introduces some Stata date functions before giving you more practice with regression analysis and producing graphs.
Suppose you have a cross-section dataset with a single line of data for each of n distinct observations. For example, you might have a dataset with 3,417 observations where each observation is a different person, a different school, or a different zip code. Suppose too that there is a classification variable in the dataset that groups individual observations in some way. For example, race may equal 1 if Asian, 2 if black, 3 if white, and 4 if other; schooltype may equal ‘elementary’ for an elementary school, ‘middle’ for a middle school, and ‘high’ for a high school; or urbanicity might equal 1 if a zip code is in an urban center and 0 if it is not.
A regression analysis would likely estimate a model using all 3,417 observations, with the classification variable included in the model as a series of 0/1 dummy variables. Before executing the regression, however, the researcher may want to calculate statistics for subgroups of observations. For example, one might want to know the average income by race, the average days of school by school type, or the average commute time by urbanicity. Producing such simple descriptive statistics is easy in Stata with the bysort command.
To more fully understand what is being discussed, consider wisconsin98data.dta, which is a cross-section dataset. It contains data on each of the 373 public school districts in Wisconsin from the late 1990s. Using wisconsin98data.dta, what are the minimum and maximum 1996, 1997, and 1998 salaries paid to teachers with B.A. degrees in the various MSA regions of Wisconsin? To do this, first save wisconsin98data.dta to the Stata folder of your U drive.
The bysort command is very useful for deriving by-group summary statistics such as the ones determined above. In other situations, however, one does not want to calculate statistics but rather wants to transform the data so that the unit of observation actually changes. It would be rare to want to do this for the above examples (race, schooltype, and urbanicity), as the categories are very limited. Wisconsin, however, has 72 counties. As each county levies taxes, one might want to conduct an empirical analysis of some behavior in which counties (not school districts) are the unit of observation.
To carry out such an analysis, the researcher needs to collapse the dataset with 373 district-level observations to a dataset with 72 county-level observations. The collapse command in Stata performs this manipulation of the data for the researcher. To learn more about the collapse command, use Stata’s on-line help by typing help collapse in the Command window. The help entry that appears on screen describes not only the syntax for the collapse command but also the complete set of group-level statistics that can be calculated within it.
There are two somewhat tricky issues to understand when using the collapse command. First, as soon as a collapse command has been executed, the data fundamentally change. Many times you will need to save the original data, collapse it, and then either drop the collapsed dataset from memory and call up the original data again, or merge the collapsed data back to the original data. Thus, you need to be aware of what data are in Stata’s memory at all times. Second, the new dataset needs variable names. Stata has a default for this: each collapsed variable keeps the name of the variable it was computed from, which works as long as only one statistic is created from each variable. If two or more statistics are created from the same variable, however, the researcher needs to specify new variable names.
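The bysort pattern for by-group summary statistics discussed in this lab can be sketched as follows. The variable names in wisconsin98data.dta are not given here, so this minimal sketch uses Stata's bundled auto dataset as a stand-in, loaded with sysuse auto. In it, foreign plays the role of the classification variable. For the lab question you would substitute the MSA variable and the relevant salary variables.

```stata
* Stand-in data: Stata's bundled auto dataset (74 cars), not wisconsin98data.dta
sysuse auto, clear

* foreign is the classification variable here (0 = Domestic, 1 = Foreign).
* bysort sorts the data by foreign and runs summarize once per group;
* the Min and Max columns give each group's minimum and maximum.
bysort foreign: summarize price weight
```

Note that bysort leaves the number of observations unchanged. It only sorts the data and reports results group by group.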
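The collapse command's variable-naming issue can be illustrated with a minimal sketch, again using the auto dataset as a stand-in for the lab data. Two statistics are computed from the same variable (price), so each result is given its own name with the newvar=oldvar form. The names mean_price, max_price, and n_cars are illustrative choices, not names used in the lab.

```stata
* Stand-in data: Stata's bundled auto dataset (74 cars)
sysuse auto, clear

* Collapse to one observation per value of foreign.
* Two statistics come from price, so each gets its own name (newvar=oldvar);
* (count) is the number of nonmissing values of price in each group.
collapse (mean) mean_price=price (max) max_price=price (count) n_cars=price, by(foreign)

* Memory now holds only the collapsed data: 2 observations, one per group
list
```

After this command the original 74 observations are gone from memory. This is the first of the two tricky issues described above.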
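The lab describes two workflows around collapse: returning to the original data afterward, or merging the collapsed data back onto it. Both can be sketched with preserve/restore and merge. This sketch uses preserve and restore in place of explicitly saving and reloading the original file. A temporary file (tempfile) holds the group-level data. The auto dataset and the name groupmeans are stand-ins, not part of the lab.

```stata
* Stand-in data: Stata's bundled auto dataset (74 cars)
sysuse auto, clear

* Workflow 1: keep a copy of the data, collapse, then return to the original
preserve
collapse (mean) mean_price=price, by(foreign)
list
restore
* The original 74 observations are back in memory at this point

* Workflow 2: attach each group's mean back onto the original observations
tempfile groupmeans
preserve
collapse (mean) mean_price=price, by(foreign)
save `groupmeans'
restore
* m:1 because many cars share the single row holding their group's mean
merge m:1 foreign using `groupmeans'
```

In the county example, the same pattern would attach each county-level statistic to every school district in that county.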