ISI - International Statistical Institute

Free Statistical Tools on the WEB

 

A short version of this article first appeared in the International Statistical Association newsletter, Vol 26, Number 1 (76),  2002, and is at   http://isi.cbs.nl/NLet/NLet021-04.htm   and    http://isi.cbs.nl/FreeTools.htm 

There is a great deal of information about statistics available for free on the web.  Information includes data, general statistical textbooks, email lists, software, and many sites about special topics, such as epidemiology, forecasting, data presentation, data editing, multiple imputation, and propensity score analysis.  This article is a brief review of some useful sites covering these topics.

One place to start is to look at sites that are general links to other statistical sites. General sites are Betty Jung's statsites  http://www.bettycjung.net/Statsites.htm, statsci   http://www.statsci.org/index.html,  the World Wide Web Virtual Library: Statistics  http://www.stat.ufl.edu/vlib/statistics.html, and Dr. Hossein Arsham's list  http://home.ubalt.edu/ntsbarsh/Business-stat/R.htm.   These sites can be used to find other sites.

At present, a number of statistical organizations are making concerted efforts to promote statistics by increasing public awareness of how statistics affects everyday life. A current example is the International Year of Statistics  http://statistics2013.org/  which celebrates the contributions of the statistical sciences. Similar projects are the American Statistical Association's (ASA) Statistical Significance series, http://www.amstat.org/policy/statsig.cfm, which is a set of pamphlets each dedicated to showing how statistics informs some particular area, such as energy, health care and the environment; and the International Statistical Literacy Project  http://www.stat.auckland.ac.nz/~iase/islp/home  with a mission of "support, create and participate in statistical literacy activities and promotion around the world."  A related project is stats.org   http://www.stats.org/   from George Mason University. This project describes basic statistical terms, but the main focus seems to be discussing news stories and how to understand the statistics in those news stories.

Two government websites also try to help the public understand statistics.  The Australian Bureau of Statistics  http://www.abs.gov.au/websitedbs/a3121120.nsf/home/Understanding%20statistics  has an on line class and a page defining statistical terms.  The US National Atlas has a page on Understanding Descriptive Statistics   http://www.nationalatlas.gov/articles/mapping/a_statistics.html  

For students or those who want to learn about statistics, the best place to start is the various on-line statistics books. One is HyperStatistics Online, at http://davidmlane.com/hyperstat/.  This is a nice statistics book, and it also has a comprehensive list of other on line statistics books.  Most of these are basic to intermediate.  One book, the Statsoft text,  http://www.statsoft.com/textbook/   has the basics as well as fairly advanced topics.  Another approach is Robert Niles' site Statistics Every Writer Should Know   http://www.robertniles.com/stats/   with plain English explanations for many basic statistical concepts.  Another list of online statistics books is here  http://gsociology.icaap.org/methods/stat.htm 

People can also take free on line training classes on statistics, for example, from the North Carolina Center for Public Health Preparedness Training Web Site, http://cphp.sph.unc.edu/training/index.php, or University of Minnesota's Midwest Center for Life-Long Learning in Public Health   http://www.sph.umn.edu/ce/mclph/   These classes offer a certificate at the end of the training.  StatTrek   http://stattrek.com/   also has a couple of on line tutorials.  Another project, from Claremont Graduate University, is the Web Interface for Statistical Education   http://wise.cgu.edu/   also with some on line tutorials and links to resources. An open course from Carnegie Mellon http://oli.cmu.edu/learn-with-oli/see-our-free-open-courses/  basically presents the material used in the course taught at the University.

On the other side of learning statistics, a couple of sites are about teaching statistics. The American Statistical Association has an on line journal, the Journal of Statistics Education, at http://www.amstat.org/publications/jse/  which has free articles about teaching statistics.  The Consortium for the Advancement of Undergraduate Statistics Education at  http://www.causeweb.org/   is also about teaching statistics and has a great many links to texts, notes, journals, data sets, etc., in particular in the resources section.

Other statistical sites of interest on the web are those of statistical associations.  An international association is the International Statistical Institute  http://isi-web.org/  . Some other associations are the American Statistical Association   http://www.amstat.org/   the International Chinese Statistical Association   http://www.icsa.org/   and the International Indian Statistical Association  http://www.intindstat.org/ . Statsci has a list of associations  http://www.statsci.org/soc.html  as does the International Statistical Institute http://www.isi-web.org/statistical-societies .

There is also a great deal of free software on the net.  The best place to find free statistical software is the Free Statistical Software site at http://statpages.org/javasta2.html.  This site lists general purpose software, as well as software devoted to specific purposes, such as curve fitting, epidemiology, surveys, and programming.  There are also brief descriptions of each package. We also list software packages on our page   http://gsociology.icaap.org/methods/soft.html  along with a list of other sites that list free statistical packages.  One great site about learning how to use statistical software is the Statistical Computing site, at http://www.ats.ucla.edu/stat/default.htm.  They have a large number of links, how-to's and other material, mostly for commercial packages.  One review of free statistical software is here   http://en.citizendium.org/wiki/Free_statistical_software   which briefly describes the history, quality, functions and limits of a number of free packages.

There are a number of email lists.  Allstat, at   https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=allstat   is a general list, although many of the postings appear to be about jobs or training courses.  Another list, stat-l, at   http://lists.mcgill.ca/archives/stat-l.html  focuses more on statistical questions. Another useful list, not on Allstat, is Epidemio, at  http://www.listes.umontreal.ca/wws/info/epidemio-l   This list is about epidemiology. Another form of discussion group is the forum. TalkStats   http://www.talkstats.com/   is one forum, with discussions ranging from basic to advanced and from homework to theory.  A smaller forum is from Statistics.com   http://www2.statistics.com/resources/discussionboards/   with only two general categories, statistical methods and homework.

There are a number of comprehensive places to look for data.  One starting point for social, political and economic data is the Global Social Change Research Project  http://gsociology.icaap.org/,  which has both links to a very large number of other data link sites, and a page of data sets compiled or created from other data sets. Many of the data sets listed on this project site are public domain.  All of the data are free to use.  This UN site   http://data.un.org/   has data on nearly every topic, from the UN and its various associates. Recently, the UN also posted a note saying that all of their data are free to use, copy, duplicate, etc., provided the UN is cited  http://data.un.org/Host.aspx?Content=UNdataUse   The World Bank also has a data page   http://data.worldbank.org/  The World Bank also says most of their data can be used freely  http://go.worldbank.org/OJC02YMLA0.  This UN site  http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp  and this BLS site  http://www.bls.gov/bls/other.htm  link to national statistical centers of most countries of the world.

There are many statistical journals on the web with free content. Many of these are listed at the Directory of Open Access Journals (DOAJ)  http://www.doaj.org/doaj?func=subject&cpid=59   page on statistics.  Some of the journals listed here include the Latin American Journal of Probability and Mathematical Statistics  http://alea.impa.br/english/index_v7.htm ,  the Electronic Journal of Applied Statistical Analysis  http://siba-ese.unisalento.it/index.php/ejasa/index ,   and the Journal of Official Statistics  http://www.jos.nu/   Another journal with free online content, not listed in DOAJ, is InterStat  http://interstat.statjournals.net/  

 

There are resources about dozens of specific topics on the web.  Some of these topics include epidemiology, graphical analysis and presentation, missing data, forecasting, gathering data and meta-analysis.

Bayesian inference is a topic that should interest all statisticians. Basically, the traditional approach (called "frequentist") is to use previously accumulated data to design a study and propose a hypothesis, and then to use a test to draw a conclusion. In contrast, Bayesian inference is a formal process for learning from evidence as it accumulates. "The Bayesian approach uses Bayes’ Theorem to formally combine prior information with current information on a quantity of interest. The Bayesian idea is to consider the prior information and the test results as part of a continual data stream, in which inferences are being updated each time new data become available." (Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials   http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071072.htm)  There is much more to Bayesian inference; for example, it makes the examination of underlying assumptions much more explicit. More resources include The International Society for Bayesian Analysis (ISBA)   http://bayesian.org/,    Bayesian perspectives for epidemiological research: I. Foundations and basic methods   http://ije.oxfordjournals.org/content/35/3/765.full   by Sander Greenland, and Bayesian Statistics in Oncology   http://onlinelibrary.wiley.com/doi/10.1002/cncr.24628/full,  by Adamina, Tomlinson and Guller. There is also an interesting video, Bayesian statistics made (as) simple (as possible)   http://www.youtube.com/watch?v=bobeo5kFz1g   from Allen Downey, Professor of Computer Science at the Franklin W. Olin College of Engineering.
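
To make the idea of updating "each time new data become available" a little more concrete, here is a minimal sketch in Python. It is not taken from any of the resources above. It assumes the simplest textbook case, a success rate with a Beta prior and binomial data, and the batch counts are made-up numbers used only for illustration.

```python
from fractions import Fraction


def update_beta(prior_a, prior_b, successes, failures):
    """Return Beta posterior parameters after observing binomial data.

    With a Beta(a, b) prior on a success rate, the posterior is
    Beta(a + successes, b + failures).
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept as an exact fraction."""
    return Fraction(a, a + b)


# Hypothetical example: start from a uniform Beta(1, 1) prior,
# then see two batches of (successes, failures).
a, b = 1, 1
batches = [(7, 3), (12, 8)]
for successes, failures in batches:
    # The posterior from the previous batch is the prior for the next one.
    a, b = update_beta(a, b, successes, failures)
    print(f"posterior is Beta({a}, {b}), mean {float(beta_mean(a, b)):.3f}")
```

In this sketch the posterior mean is 8/12 after the first batch and 20/32 after the second. Updating batch by batch gives the same posterior as updating once with all of the data, which is the "continual data stream" idea in the FDA quote.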

Epidemiology: The two best places to start for epidemiology are EpiMonitor,   http://www.epimonitor.net/index.htm, which has a very comprehensive list of links, and the WWW Virtual Library: Epidemiology  http://www.epibiostat.ucsf.edu/epidem/epidem.html  another gateway.  Another very good place to start is epidemiolog, at http://www.epidemiolog.net/.  This site also has a fairly comprehensive listing of epidemiology sites, as well as an on-line textbook. First time visitors should start at  http://www.epidemiolog.net/evolving/   Epiville  http://epiville.ccnmtl.columbia.edu/  is another on line learning site, with learning modules for people to learn epidemiology. Another free on-line textbook is Epidemiology for the Uninitiated, at  http://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated   
     A very good place to find world epidemiological data, reports, issues and information is WHO   http://www.who.int/topics/epidemiology/en/   which includes, for example, the 10 leading causes of death and the Weekly Epidemiological Record.

Presenting Results: After analyzing data, it is very helpful to know how to best present the results.  Very good sites are: Improving Data Visualization  http://www.improving-visualisation.org/,  a presentation from the Washington Statistical Society Methodology Seminars,  Data Presentation: A Guide To Good Graphics   http://www.scs.gmu.edu/~wss/methods/zawitzg.html,  BTS’s Guide to Good Statistical Practice  section on presenting results, at   http://www.bts.gov/publications/guide_to_good_statistical_practice_in_the_transportation_field/index.html  and Stat Canada's Statistics book, chapter on presenting graphs  http://www.statcan.gc.ca/edu/power-pouvoir/ch9/5214821-eng.htm  .   For some interesting good and bad examples, see the Gallery of Data Visualization, at   http://www.datavis.ca/gallery/index.php   More recently, there are sites showing moving charts, like Gapminder   http://www.gapminder.org/   or mapping international data, like Show   http://show.mappingworlds.com/world/  

Missing Data:  One site that gives an overview of missing data is the FAQ page of this missing data project  http://www.missingdata.org.uk/  Also see Dr. Howell's page  http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html  A recent journal issue devoted to missing data is the June 2011 issue of Journal of Official Statistics  http://www.jos.nu/Contents/issue.asp?vol=27&no=2   One method frequently used is multiple imputation, which fills in missing data by using other variables to predict the missing values.  One software program for estimating missing data is AMELIA, at   http://gking.harvard.edu/software/   The methodology center at Penn State has a podcast about missing data analysis  http://methodology.psu.edu/multimedia/podcast  
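
As a rough illustration of the multiple imputation idea (using other variables to predict the missing values, several times over), here is a simplified sketch in Python. It is not how AMELIA or any other package is implemented. It assumes a single predictor x, a straight-line relationship, and values missing only in y; the function names and the small data set are made up for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_once(rows, rng):
    """Return one completed copy of rows; each row is (x, y), y may be None."""
    observed = [(x, y) for x, y in rows if y is not None]
    # Bootstrap the observed rows so every imputation uses a slightly
    # different fitted line (needs at least two distinct x values).
    while True:
        sample = [rng.choice(observed) for _ in observed]
        if len({x for x, _ in sample}) > 1:
            break
    intercept, slope = fit_line([x for x, _ in sample], [y for _, y in sample])
    residuals = [y - (intercept + slope * x) for x, y in sample]
    completed = []
    for x, y in rows:
        if y is None:
            # Predicted value plus a randomly drawn residual as noise.
            y = intercept + slope * x + rng.choice(residuals)
        completed.append((x, y))
    return completed


def pooled_mean_y(rows, m=20, seed=0):
    """Average the estimate of mean(y) over m imputed data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(m):
        completed = impute_once(rows, rng)
        estimates.append(statistics.fmean([y for _, y in completed]))
    return statistics.fmean(estimates)


# Hypothetical data: y is missing for x = 3 and x = 6.
rows = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2),
        (5, 9.8), (6, None), (7, 14.1), (8, 16.0)]
print(pooled_mean_y(rows))
```

The point of repeating the imputation is that the spread across the completed data sets reflects the uncertainty about the missing values. A full analysis would also combine the variances across imputations, which this sketch leaves out.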

Forecasting: One faculty member's lecture about forecasting is Hossein Arsham's Time Series Analysis and Forecasting Techniques, at  http://home.ubalt.edu/ntsbarsh/Business-stat/stat-data/Forecast.htm   Another forecasting site is the Federal Forecasters Consortium, at  http://www.va.gov/HEALTHPOLICYPLANNING/Federal_Forecasters_Consortium_Page2.asp   Conference proceedings can be downloaded from this site.

Methods of gathering data:  There are a number of sites on gathering data.  Two places to start are Resources for Methods in Evaluation and Social Research, at   http://gsociology.icaap.org/methods/  and The World Wide Evaluation Information Gateway   http://www.policy-evaluation.org/    These sites link to other sites about methods, quantitative and qualitative.  Some sites are about specific tools in data gathering.  Tom O'Connor's lecture notes, at  http://www.drtomoconnor.com/3760/default.htm  cover various issues such as measurement, validity and reliability, and scales and indexes.

Meta-analysis:  Study Design 101 has a brief overview  http://www.gwumc.edu/library/tutorials/studydesign101/metaanalyses.html  A somewhat stricter standard is described by Cleophas and Zwinderman  http://circ.ahajournals.org/content/115/22/2870.full   This article by Kattan reviews the strengths and weaknesses of meta-analysis  http://www.ccjm.org/content/75/6/431.full

Propensity score analysis:  Propensity score analysis is a method of dealing with self selection bias or other selection bias; see  http://www.epa.gov/caddis/da_advanced_5.html   The methodology center at Penn State has a podcast about propensity score analysis  http://methodology.psu.edu/multimedia/podcast   Other papers include one by Nicholas and Gulliford  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553527/  and another by Austin  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/   Paul Rosenbaum, the father of propensity score analysis, has a couple of overviews,  http://www-stat.wharton.upenn.edu/~rosenbap/index.html,  one about propensity scores, and another about observational studies.
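
To show how a propensity score can correct for selection bias, here is a small sketch in Python using inverse probability weighting, which is only one of several ways propensity scores are used (matching and stratification are others). It is not taken from the papers above. It assumes a single categorical covariate, estimates the propensity score as the share of treated people within each group, and requires that every group contains both treated and untreated people. The data are made up so that the true treatment effect is 5.

```python
from collections import defaultdict


def propensity_by_group(rows):
    """Estimate P(treated | group) as the treated share within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [treated count, total]
    for group, treated, _ in rows:
        counts[group][0] += treated
        counts[group][1] += 1
    return {g: t / n for g, (t, n) in counts.items()}


def naive_effect(rows):
    """Plain difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Inverse-probability-weighted difference in mean outcomes."""
    e = propensity_by_group(rows)
    n = len(rows)
    treated_part = sum(y / e[g] for g, t, y in rows if t == 1) / n
    control_part = sum(y / (1 - e[g]) for g, t, y in rows if t == 0) / n
    return treated_part - control_part


# Hypothetical data: (group, treated, outcome). Older people choose the
# treatment more often and have lower outcomes to begin with, so the
# naive comparison hides the treatment effect of 5.
rows = (
    [("old", 1, 15)] * 3 + [("old", 0, 10)] * 1
    + [("young", 1, 25)] * 1 + [("young", 0, 20)] * 3
)
print(naive_effect(rows))  # 0.0: self selection masks the effect
print(ipw_effect(rows))    # about 5 once each person is reweighted
```

With real data the propensity score is usually estimated from many covariates with a model such as logistic regression, and the weighting only removes bias from covariates that were actually measured.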

Data mining:  Statsoft has an entry on data mining  http://www.statsoft.com/textbook/data-mining-techniques/  Professors Anand Rajaraman and Jeffrey D. Ullman have a book, Mining of Massive Datasets,  http://infolab.stanford.edu/~ullman/  which includes a chapter on data mining. This YouTube video  http://www.youtube.com/watch?v=R-sGvh6tI04  from NJIT School of Management professor Stephan P Kudyba introduces data mining.

I don't necessarily endorse any of the sites listed here, and do not assume responsibility for content of the web sites listed in this article. This article is solely presented for educational purposes.

 

last updated 12/16/12
last verified  12/16/12

Gene Shackman, Ph.D.
Free Resources for Program Evaluation and Social Research Methods
http://gsociology.icaap.org/methods

 
