Some Notes About the Data

A relatively large amount of minute-by-minute stock and option data was used in the preparation of this book. This information, in its unprocessed form, was purchased from Tick Data of Great Falls, Virginia. Many specific criteria went into the decision to choose this particular data source.

First, and most significant, was accuracy. Because slight discrepancies can cause significant errors in implied volatility calculations, the data must be both accurate and complete. Assembling such datasets is not a trivial exercise, because options trade on several different exchanges, often at low liquidity levels. The data vendor must therefore precisely align the timestamps of individual trades before merging them into a single time series. In addition, the large number of strike price and expiration date combinations adds a level of complexity that becomes apparent when a new series is introduced or a stock splits. The situation is further complicated by the enormous number of symbols used and reused by the options market. When creating data files of option prices, it is therefore crucial that old and new data, or data from different equities, not be mistakenly commingled despite the presence of overlapping symbols. It is not unusual for a single stock in a given year to have more than 1,200 strike price/expiration date combinations. Multiplying by the number of stocks and years yields a very large number of distinct series.
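The merging step described above can be sketched roughly as follows. This is only an illustration, not the vendor's actual procedure: the `Trade` record, its field names, and the `merge_trades` helper are hypothetical, and the sketch assumes each exchange's feed is already sorted and stamped against a common clock, which is the hard part in practice.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Trade(NamedTuple):
    """Hypothetical trade record; field names are assumptions."""
    timestamp: datetime
    exchange: str
    price: float
    size: int


def merge_trades(*feeds: Iterable[Trade]) -> List[Trade]:
    """Merge per-exchange feeds into one time-ordered sequence.

    Each feed must already be sorted by timestamp. Trades with equal
    timestamps keep the order in which their feeds were passed in.
    """
    return list(heapq.merge(*feeds, key=lambda trade: trade.timestamp))


# Illustrative values only, not real market data.
feed_a = [
    Trade(datetime(2010, 1, 4, 9, 30, 0), "A", 1.25, 10),
    Trade(datetime(2010, 1, 4, 9, 32, 0), "A", 1.30, 5),
]
feed_b = [
    Trade(datetime(2010, 1, 4, 9, 31, 0), "B", 1.27, 20),
]
merged = merge_trades(feed_a, feed_b)
```

Because `heapq.merge` consumes its inputs lazily, the same approach extends to feeds too large to hold in memory, provided the final `list(...)` call is replaced with streaming output.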

File format is another important criterion. Individual files should contain text delimited by commas, spaces, or some other readily identifiable marker so that the information can be imported into a database or spreadsheet. Filenames should follow a consistent set of conventions that make it simple to identify a particular series. For example, trade data for the Apple Computer $170 strike price call expiring on 2010/01/16 and having the symbol WAA_AN might be stored in a file designated WAA_C_20100116_170.00_AN. This file can easily be found using Excel’s import feature by searching for the concatenated expiration date and strike price (20100116_170). The search will yield just two files, one containing call data and the other containing put data (the put file would be designated WAA_P_20100116_170.00_MN). In this way, the simple file-retrieval functions found in Microsoft Office products can be used to retrieve an individual option series from tens of thousands of files, and the collection effectively becomes a database.
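The naming convention above also lends itself to programmatic lookup. The following is a minimal sketch, assuming the five underscore-separated fields shown in the example filenames; the `OptionSeries` type and the helper names are hypothetical and were chosen only for illustration.

```python
from datetime import date
from typing import Iterable, List, NamedTuple


class OptionSeries(NamedTuple):
    root: str         # option root symbol, e.g. "WAA"
    right: str        # "C" for a call, "P" for a put
    expiration: date  # expiration date
    strike: float     # strike price in dollars
    suffix: str       # trailing code from the option symbol, e.g. "AN"


def build_filename(series: OptionSeries) -> str:
    """Compose a name such as WAA_C_20100116_170.00_AN."""
    return (f"{series.root}_{series.right}_{series.expiration:%Y%m%d}"
            f"_{series.strike:.2f}_{series.suffix}")


def parse_filename(name: str) -> OptionSeries:
    """Split a filename (without extension) back into its fields."""
    root, right, expiry, strike, suffix = name.split("_")
    expiration = date(int(expiry[:4]), int(expiry[4:6]), int(expiry[6:8]))
    return OptionSeries(root, right, expiration, float(strike), suffix)


def find_series(names: Iterable[str], expiration: date,
                strike: float) -> List[str]:
    """Return every filename matching an expiration/strike pair."""
    key = f"_{expiration:%Y%m%d}_{strike:.2f}_"
    return [name for name in names if key in name]


names = [
    "WAA_C_20100116_170.00_AN",
    "WAA_P_20100116_170.00_MN",
    "WAA_C_20100116_175.00_AO",  # hypothetical neighboring series
]
matches = find_series(names, date(2010, 1, 16), 170.0)
```

Here `matches` holds the call file and the put file for the 170 strike, mirroring the two-file result of the spreadsheet search described above.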

Tick Data files were named as described above, and the information was provided as simple comma-delimited text. The data was clean in the sense that series designations were consistent and anomalies that made no sense had been removed. In this context, the term anomalies refers to trades that were made in error, such as an option purchased for $125 rather than $1.25. Furthermore, the time series used in the book were spot-checked by calculating implied volatilities across multiple strike prices contained in different files. No inconsistencies were found in any of the Tick Data information. Readers who decide to purchase their own data are encouraged to apply the same level of scrutiny before selecting a vendor.
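A reader wishing to run a similar spot check might start from something like the sketch below. It is not the procedure used for the book, which the text does not specify. It assumes a European-style Black-Scholes call with no dividends and solves for volatility by bisection; the function names, the search bounds, and the input values are all illustrative assumptions.

```python
import math
from typing import Optional


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, t: float, r: float,
            sigma: float) -> float:
    """Black-Scholes price of a European call (no dividends assumed)."""
    d1 = ((math.log(spot / strike) + (r + 0.5 * sigma ** 2) * t)
          / (sigma * math.sqrt(t)))
    d2 = d1 - sigma * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-r * t) * norm_cdf(d2)


def implied_vol(price: float, spot: float, strike: float, t: float,
                r: float, lo: float = 1e-4, hi: float = 5.0,
                tol: float = 1e-8) -> Optional[float]:
    """Solve for volatility by bisection.

    Returns None when the price lies outside the range the model can
    produce between the two bounds, which is itself a sign that the
    trade record deserves a closer look.
    """
    if not bs_call(spot, strike, t, r, lo) <= price <= bs_call(spot, strike, t, r, hi):
        return None
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The call price rises with volatility, so move the bracket
        # toward the side that still contains the target price.
        if bs_call(spot, strike, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with made-up inputs: price a call at 30% volatility,
# then recover that volatility from the price.
model_price = bs_call(spot=170.0, strike=170.0, t=0.25, r=0.02, sigma=0.30)
recovered = implied_vol(model_price, spot=170.0, strike=170.0, t=0.25, r=0.02)
```

Running the solver over neighboring strikes and comparing the results is one way to spot a misprinted trade, since a price recorded at one hundred times its true value will typically return `None` or a volatility far out of line with adjacent strikes.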
