ISI - International Statistical Institute

Free Statistical Tools on the WEB

 

A short version of this article first appeared in the International Statistical Association newsletter, Vol. 26, Number 1 (76), 2002, and is at http://isi.cbs.nl/NLet/NLet021-04.htm and http://isi.cbs.nl/FreeTools.htm

There is a great deal of information about statistics available for free on the WEB.  Information includes data or data sets, general statistical textbooks, email lists, software, and many sites about special topics, such as epidemiology, forecasting, data presentation, data editing, multiple imputation, and propensity score analysis.  This article is a brief review of some useful sites covering these topics.

To start with, World Statistics Day  http://unstats.un.org/unsd/wsd/Default.aspx  was recently celebrated, on October 20, 2010. According to the UN, the goal of this day was to "pay tribute to statisticians’ outstanding work in producing and disseminating the necessary data to respond to the every day new challenges and to measure progress in people’s lives." (World Statistics press release, http://unstats.un.org/unsd/wsd/docs/WSD_18Oct2010.pdf .)  This was billed as the first World Statistics Day, so perhaps there will be more.

When looking for statistical information, there are several sites that serve as general link collections. One is the Intute statistics page  http://www.intute.ac.uk/statistics/  which has sub-pages on demography, international and national indicators, and statistical theory. The Intute site has a variety of categories, such as data, educational material, government sites, mailing lists and societies. Two other general sites are Betty Jung's statsites  http://www.bettycjung.net/Statsites.htm  and statsci   http://www.statsci.org/index.html  

The best place to start for learning about statistics is HyperStatistics Online, at http://davidmlane.com/hyperstat/.  It is both a nice statistics book and a comprehensive list of other on line statistics books.  Most of these are basic to intermediate.  One book, the Statsoft text,  http://www.statsoft.com/textbook/   has the basics as well as fairly advanced topics.  Another, Statistics at square one  http://resources.bmj.com/bmj/readers/statistics-at-square-one/statistics-at-square-one   is a fairly introductory book, but from 1997.  Another approach is Robert Niles' site Statistics Every Writer Should Know   http://www.robertniles.com/stats/   with plain English explanations of many basic statistical concepts.

People can also take free on line training classes on statistics, for example, from the North Carolina Center for Public Health Preparedness Training Web Site, http://nccphp.sph.unc.edu/training/index.php, or the University of Minnesota's Midwest Center for Life-Long Learning in Public Health   http://www.sph.umn.edu/ce/mclph/   These classes offer a certificate at the end of the training.  StatTrek   http://stattrek.com/   also has a couple of on line tutorials.  Another project, from Claremont Graduate University, is the Web Interface for Statistical Education   http://wise.cgu.edu/   also with some on line tutorials and links to resources. An open course from Carnegie Mellon  http://oli.web.cmu.edu/openlearning/forstudents/freecourses  basically presents material used in the course taught at the University.

Also, the American Statistical Association has an on line journal, the Journal of Statistics Education, at http://www.amstat.org/publications/jse/  which has free articles about teaching statistics.  The Consortium for the Advancement of Undergraduate Statistics Education at  http://www.causeweb.org/   is also about teaching statistics and has a great many links to texts, notes, journals, data sets, etc., particularly in the resources section.

Since statistics is difficult to learn and it is not always clear to the general public how statistics may be useful, there is one project aimed at educating the public: the International Statistical Literacy Project  http://www.stat.auckland.ac.nz/~iase/islp/home   The mission of this project "is to support, create and participate in statistical literacy activities and promotion around the world."  A similar project is Statistical Literacy   http://www.statlit.org/   which is basically a central resource for events, with links to presentations and other information. A related project is stats.org   http://www.stats.org/   from George Mason University. This project describes basic statistical terms, but its main focus seems to be discussing news stories and how to understand the statistics in them.

Two government websites also try to help the public understand statistics.  The Australian Bureau of Statistics  http://www.abs.gov.au/websitedbs/a3121120.nsf/home/Understanding%20statistics  has an on line class and a page defining statistical terms.  The US National Atlas has a page on Understanding Descriptive Statistics   http://www.nationalatlas.gov/articles/mapping/a_statistics.html  

There are a number of statistical associations.  An international association is the International Statistical Institute  http://isi-web.org/ .  Some other associations are the American Statistical Association   http://www.amstat.org/   the International Chinese Statistical Association   http://www.icsa.org/   and the International Indian Statistical Association  http://www.intindstat.org/ . Statsci has a list of associations  http://www.statsci.org/soc.html   as does the International Statistical Institute  http://isi-web.org/statsoc/nsslist .

There is also a great deal of free software on the net.  The best place to find free statistical software is the Free Statistical Software site at http://statpages.org/javasta2.html.  This site lists general purpose software, as well as software devoted to specific purposes, such as curve fitting, epidemiology, surveys, and programming.  There are also brief descriptions of each package. We also list software packages on our page   http://gsociology.icaap.org/methods/soft.html  along with a list of other sites that list free statistical packages.  One great site about learning how to use statistical software is the Statistical Computing site, at http://www.ats.ucla.edu/stat/default.htm.  They have a large number of links, how to's and other material, mostly for commercial packages.  One review of free statistical software is here   http://en.citizendium.org/wiki/Free_statistical_software   which briefly describes the history, quality, functions and limits of a number of free packages.

There are a number of email lists.  Allstat, at   https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=allstat   is a general list, although many of the postings appear to be about jobs or training courses.  Another list, stat-l, at   http://lists.mcgill.ca/archives/stat-l.html  focuses more on statistical questions. Another useful list, not on Allstat, is Epidemio, at  http://www.listes.umontreal.ca/wws/info/epidemio-l   This list is about epidemiology. Another form of discussion group is the forum. TalkStats   http://www.talkstats.com/   is one forum, with discussions ranging from basic to advanced and from homework to theory.  A smaller forum is from Statistics.com   http://www.statistics.com/resources/discussionboards/   with only two general categories, statistical methods and homework.

There are a number of comprehensive places to look for data.  One starting point for social, political and economic data is the Global Social Change Research Project  http://gsociology.icaap.org/,  which has both links to a very large number of other data link sites, and a page of data sets compiled or created from other data sets. Many of the data sets listed on this project site are public domain.  All of the data are free to use.  This UN site   http://data.un.org/   has data on nearly every topic, from the UN and its various associates.  The World Bank also has a data page   http://data.worldbank.org/  Most of the data on the World Bank site and all of the data on the UN site may be used freely.  This UN site  http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp  and this BLS site  http://www.bls.gov/bls/other.htm  link to national statistical centers of most countries of the world.

There are a number of statistical journals on the web with free content. Many of these are listed at the Directory of Open Access Journals   http://www.doaj.org/doaj?func=subject&cpid=59   page on statistics.  Some of the journals listed here include the Latin American Journal of Probability and Mathematical Statistics  http://alea.impa.br/english/index_v7.htm ,  the Electronic Journal of Applied Statistical Analysis  http://siba-ese.unisalento.it/index.php/ejasa/index ,   and the Journal of Official Statistics  http://www.jos.nu/ 

There are resources about dozens of specific topics on the web.  Some of these topics include epidemiology, graphical analysis and presentation, missing data, forecasting, gathering data and meta-analysis.

Epidemiology: The two best places to start for epidemiology are EpiMonitor,   http://www.epimonitor.net/index.htm, which has a very comprehensive list of links, and the WWW Virtual Library: Epidemiology  http://www.epibiostat.ucsf.edu/epidem/epidem.html  another gateway.  Another very good place to start is epidemiolog, at http://www.epidemiolog.net/.  This site also has a fairly comprehensive listing of epidemiology sites, as well as an on-line textbook. First time visitors should start at  http://www.epidemiolog.net/evolving/ .  Another free on-line textbook is Epidemiology for the Uninitiated, at   http://resources.bmj.com/bmj/readers/epidemiology-for-the-uninitiated/epidemiology-for-the-uninitiated-fourth-edition   (from 1997).
     A very good place to find world epidemiological data, reports, issues and information is from WHO   http://www.who.int/topics/epidemiology/en/   which includes for example the 10 leading causes of death, and the  Weekly Epidemiological Record.
    There are also two interesting sites for learning epidemiology. One is the Epidemiology Supercourse, http://www.pitt.edu/~super1/, which is a set of on line lectures from various epidemiology courses.  These lectures can be downloaded and used, whole or in part, in your own lectures.  The North Carolina Center for Public Health Preparedness Training Website  http://nccphp.sph.unc.edu/training/   has free on line training for biostatistics, epidemiology, and other topics. You can get a certificate for each class you complete. Each class takes 1/2 to 1 hour.

Presenting Results: After analyzing data, it is very helpful to know how best to present the results.  Very good sites are:  Informative Presentation of Tables, Graphs and Statistics, at  http://www.reading.ac.uk/ssc/publications/guides/toptgs.html ,  Washington Statistical Society Methodology Seminars, Data Presentation: A Guide To Good Graphics   http://www.scs.gmu.edu/~wss/methods/zawitzg.html  and Presenting Data   http://lilt.ilstu.edu/gmklass/pos138/datadisplay/ .  Also, BTS’s Guide to Good Statistical Practice has a useful section on presenting results, at   http://www.bts.gov/publications/guide_to_good_statistical_practice_in_the_transportation_field/index.html .  For some interesting good and bad examples, see the Gallery of Data Visualization, at  http://www.math.yorku.ca/SCS/Gallery/   More recently, there are sites showing moving charts, like Gapminder   http://www.gapminder.org/   or mapping international data, like Show   http://show.mappingworlds.com/world/  

Missing Data:  Two sites with overviews of missing data are the University of Texas Statistical Services FAQ page, #25, at   http://www.utexas.edu/its-archive/rc/answers/general/gen25.html   and Professor von Hippel's FAQ page   http://www.sociology.ohio-state.edu/people/ptv/  where he talks about whether data are missing at random or not, and how to deal with the missing data. Also see the first couple of paragraphs of Dr. Howell's page  http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html .  One way to deal with missing data is multiple imputation, described at the Multiple Imputation FAQ page, at http://www.stat.psu.edu/~jls/mifaq.html.  Multiple imputation fills in missing data by using other variables to predict the missing values.  This method is also described at Joseph Schafer’s site, in a 1999 article, "Multiple imputation: a primer", at   http://www.stat.psu.edu/~jls/index.html.   One software program for estimating missing data is AMELIA, at http://gking.harvard.edu/stats.shtml
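
To make the idea of multiple imputation a little more concrete, here is a minimal sketch in Python. It is not the method used by AMELIA or by any of the sites above, and it is simplified: it predicts a missing y from one other variable x with a straight-line fit, adds random noise to each prediction, and repeats this several times. It does not account for uncertainty in the fitted line itself, which a full multiple imputation would. The function names (fit_line, impute_once, pooled_mean) and the small data set are made up for illustration. The several filled-in data sets are combined with the usual pooling rule: total variance = average within-imputation variance + (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_once(xs, ys, rng):
    """Fill each missing y (None) with a prediction from x plus random noise."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]


def pooled_mean(xs, ys, m=20, seed=0):
    """Estimate the mean of y from m imputed data sets; returns (mean, std_error)."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(m):
        filled = impute_once(xs, ys, rng)
        means.append(statistics.fmean(filled))
        variances.append(statistics.variance(filled) / len(filled))
    within = statistics.fmean(variances)   # average sampling variance
    between = statistics.variance(means)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return statistics.fmean(means), total ** 0.5


# Made-up example data: two values of y are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, None, 8.2, None, 12.1]
estimate, std_error = pooled_mean(xs, ys)
print(estimate, std_error)
```

The point of imputing several times rather than once is that the spread between the imputed data sets shows how much uncertainty the missing values add, and that spread is carried into the final standard error.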

Forecasting: Two faculty members have lectures about forecasting on the web.  These are Bob Nau's class notes on forecasting at http://www.duke.edu/~rnau/411out00.html, and Hossein Arsham's Time Series Analysis and Forecasting Techniques, at  http://home.ubalt.edu/ntsbarsh/Business-stat/stat-data/Forecast.htm   Another forecasting site is the Federal Forecasters Consortium, at   http://www1.va.gov/vhareorg/ffc.htm   Conference proceedings can be downloaded from this site.
 

Methods of gathering data:  There are a number of sites on gathering data.  Two places to start are Resources for Methods in Evaluation and Social Research, at   http://gsociology.icaap.org/methods/  and The World Wide Evaluation Information Gateway   http://www.policy-evaluation.org/    These sites link to other sites about methods, both quantitative and qualitative.  Some sites are about specific tools in data gathering.  The Statnotes site has a section on survey methods, at   http://faculty.chass.ncsu.edu/garson/PA765/survey.htm    Tom O'Connor's lecture notes, at  http://www.drtomoconnor.com/3760/default.htm  cover various issues such as measurement, validity and reliability, and scales and indexes.

Meta-analysis:  There are several introductions to meta-analysis.  One of the Epi Supercourses is about meta-analysis, How to conduct a Meta-Analysis  http://www.pitt.edu/~super1/lecture/lec1171/index.htm .  Another is an on line book, Meta-Analysis: Methods of Accumulating Results Across Research Domains, by Larry C. Lyons, at   http://www.lyonsmorris.com/MetaA/index.htm   (this link sometimes doesn't work).  Another site is The Meta Analysis of Research Studies   http://echo.edres.org:8080/meta/   which has an overview and links to documents and resources.

Other topics include propensity score analysis  http://www.epa.gov/caddis/da_advanced_5.html .  Propensity score analysis is a method of dealing with self selection bias.  Also, the Federal Committee on Statistical Methodology, at http://www.fcsm.gov/reports/ , has some interesting papers, especially RL2. Record Linkage Techniques - 1997: Proceedings of an International Workshop and Exposition.  (This is RL2, not RL1.)   Another interesting special topic site is the Centre for Multilevel Modelling at http://www.cmm.bristol.ac.uk/  One site about data mining is kdnuggets at http://www.kdnuggets.com/ (a newsletter and general links site).

Gene Shackman*
Research Methods Website Manager
http://gsociology.icaap.org/methods

 

* Neither Dr. Shackman nor ISI endorses any of the sites listed here, and neither assumes responsibility for the content of the websites listed in this article. This article is presented solely for educational and reference purposes.

Last updated and verified 11/10/2010.


 

