## Lies, Damn Lies and Medians

When I first got involved in property investing, I wrote a little program that scoured the real estate websites for details of properties that were in areas interesting to me. One of the stats that it tried to calculate was the average rental yield (the annual rent divided by the purchase price) for different areas. Unfortunately, the values I was getting back were utter rubbish.

After looking closer at the data, I realised that I’d fallen for a classic beginner’s mistake: I had tried to compare median values.

The median is an “average” measure of a set of values where there are just as many values smaller than it as there are larger than it. If there are five houses worth \$100k, \$120k, \$125k, \$190k and \$250k, then the median house value is \$125k (the middle one).
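As a quick sanity check, here is a minimal sketch using Python's standard `statistics` module on the five house values above. The extra \$2,000k "mansion" is a made-up value, added only to show how little the median moves compared with the mean.

```python
from statistics import mean, median

# The five house values from the example above, in thousands of dollars.
values = [100, 120, 125, 190, 250]
print(median(values))  # 125 (the middle value)

# Hypothetical outlier: one very expensive property comes onto the market.
with_mansion = values + [2000]

# With an even count, statistics.median averages the two middle values.
print(median(with_mansion))  # 157.5
print(mean(values), mean(with_mansion))  # the mean jumps far more than the median
```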

Medians are widely used by real estate agents because they are easy to calculate, aren’t skewed by the effect of a really expensive or really cheap property coming onto the market, and provide a simple message to buyers. They state how affordable an area is: if you can afford the median, then you can afford at least half of the homes for sale in an area.

However, my mistake was comparing two median values in an area: 1) the median rent and 2) the median sales price. The set of properties available for rent was composed of completely different dwellings to the set of properties available for sale. For example, the median rent might have come from a 2 bedroom unit, while the median sale might have come from a 3 bedroom unit. As a result, the yields being calculated were much too low.
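The effect is easy to reproduce. The sketch below uses entirely made-up listings, and assumes rents are quoted weekly and annualised over 52 weeks; the function name `annual_yield` is my own. It compares the yield from two mismatched medians with the yield from a like-for-like subset (2 bedroom properties only).

```python
from statistics import median

# Hypothetical listings for one area: (bedrooms, amount).
# Rentals carry a weekly rent; sales carry a purchase price.
rentals = [(1, 300), (2, 380), (2, 400), (2, 420), (3, 520)]
sales = [(2, 400_000), (3, 520_000), (3, 550_000), (3, 580_000), (4, 750_000)]

def annual_yield(weekly_rent, price):
    """Gross yield, assuming 52 weeks of rent per year."""
    return weekly_rent * 52 / price

# The mistake: median rent and median price come from different kinds of dwelling.
mixed_yield = annual_yield(
    median(rent for _, rent in rentals),
    median(price for _, price in sales),
)

# Like-for-like: restrict both sets to 2 bedroom properties first.
like_yield = annual_yield(
    median(rent for beds, rent in rentals if beds == 2),
    median(price for beds, price in sales if beds == 2),
)

print(f"mismatched medians: {mixed_yield:.1%}")  # 3.8%
print(f"2 bedroom only:     {like_yield:.1%}")   # 5.2%
```

With these numbers the median rent comes from a 2 bedroom property and the median price from a 3 bedroom one, so the mismatched figure understates the yield.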

My mistake in comparing medians is repeated in the media every week, whenever property growth is calculated by comparing the median from one period with the median from another. To be fair, it’s not entirely their fault, as they get their data from real estate agents.

Statisticians are aware of the problems with using the median for calculating growth rates and have come up with three improvements. Christopher Joye has written a detailed overview, but I’ll provide my potted summary.

### Stratified Median

The Australian Bureau of Statistics (ABS) and Australian Property Monitors (APM) both use an approach of grouping properties into related sets (stratifying the data) prior to computing medians. If the groupings are done properly, then any skewing in one group will not affect another group too much, so data from different periods should be more comparable. However, it doesn’t eliminate the problem that properties sold in different periods might not be comparable in the first place.
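To illustrate the idea (not the actual ABS or APM methodology, which I haven't seen in detail), here is a small sketch with invented sales. The strata names, the prices, and the choice to average the per-stratum growth rates with equal weights are all assumptions of mine.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sales: (stratum, price in $k). Prices within each stratum are
# flat, but period 2 happens to include more houses and fewer units.
period_1 = [("unit", 300), ("unit", 310), ("unit", 320), ("house", 600), ("house", 620)]
period_2 = [("unit", 300), ("unit", 320), ("house", 600), ("house", 610), ("house", 620)]

def stratum_medians(sales):
    """Median price within each stratum."""
    groups = defaultdict(list)
    for stratum, price in sales:
        groups[stratum].append(price)
    return {stratum: median(prices) for stratum, prices in groups.items()}

def stratified_growth(before, after):
    """Equal-weighted average of per-stratum median growth (an assumed weighting)."""
    m0, m1 = stratum_medians(before), stratum_medians(after)
    shared = m0.keys() & m1.keys()
    return sum(m1[s] / m0[s] - 1 for s in shared) / len(shared)

plain_growth = median(p for _, p in period_2) / median(p for _, p in period_1) - 1

print(f"plain median growth:      {plain_growth:.1%}")  # 87.5%
print(f"stratified median growth: {stratified_growth(period_1, period_2):.1%}")  # 0.0%
```

The plain median jumps purely because the mix of sales changed, while the stratified figure correctly reports no growth.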

### Repeat Sales

The main approach used by Residex is based on calculating the growth rate of properties sold in a given period based on how much they sold for last time. Comparing a property with itself clearly avoids most of the problems of comparing medians. However, a property might have been renovated (or even completely demolished and rebuilt) since its last sale, which adds a wrinkle to the calculation. Also, sales of new buildings cannot be included since there isn’t a prior sale to compare them with.
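A very rough sketch of the repeat sales idea follows. Real repeat sales indices are generally built with regression over many sale pairs, so this is a simplification rather than a description of what Residex does. The sale pairs are invented, and `annualised_growth` is a helper name of my own.

```python
from statistics import median

# Hypothetical repeat sales: (previous price, years since previous sale, current price).
repeat_sales = [
    (400_000, 2, 484_000),
    (500_000, 1, 525_000),
    (200_000, 2, 231_125),
]

def annualised_growth(prev_price, years, price):
    """Compound annual growth rate between two sales of the same property."""
    return (price / prev_price) ** (1 / years) - 1

# Each property is only ever compared with itself.
rates = [annualised_growth(*sale) for sale in repeat_sales]

# Summarise with the median of the per-property rates (one possible choice).
typical = median(rates)
print(f"typical annual growth: {typical:.1%}")
```

Note that a brand new property has no earlier sale, so it simply never appears in `repeat_sales`.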

### Hedonic Method

The hedonic method is less interesting than it sounds, but is the main approach used by RP Data. In this method, sale data is combined with data on the nature of each property, e.g. precise location, land size, number of bedrooms, number of bathrooms, etc. In this way, like can really be compared with like, and more accurate growth rates can be calculated for properties in different areas. However, this approach is only as good as its data, and we need to trust that the statisticians at RP Data have gotten the good stuff. Also, historical data for all of these additional details is hard to find, so it’s not possible to do comparisons as far back as with the other approaches.
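To give a flavour of the hedonic idea, here is a toy sketch that uses a single characteristic (land size) and a straight-line fit per period. A real hedonic model would use many characteristics and multiple regression, and I am not claiming this is how RP Data does it. The sales data, the 500 m² reference property, and the helper `fit_line` are all invented for illustration.

```python
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

# Hypothetical sales: (land size in square metres, price in $k).
# Period 2 prices are 10% higher for the same land size, but bigger blocks sold.
period_1 = [(400, 400), (500, 450), (600, 500)]
period_2 = [(600, 550), (700, 605), (800, 660)]

def predicted_price(sales, land_size):
    """Price the model implies for a property of the given land size."""
    intercept, slope = fit_line([s for s, _ in sales], [p for _, p in sales])
    return intercept + slope * land_size

# Price the same reference property in both periods, then compare.
reference_land = 500
hedonic_growth = (predicted_price(period_2, reference_land)
                  / predicted_price(period_1, reference_land) - 1)
plain_growth = median(p for _, p in period_2) / median(p for _, p in period_1) - 1

print(f"hedonic growth:      {hedonic_growth:.1%}")
print(f"plain median growth: {plain_growth:.1%}")
```

Because the model prices an identical property in both periods, the change in the size of blocks being sold does not leak into the growth figure the way it does with the plain median.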

In conclusion, it is clear that the three improved approaches all have their strengths and weaknesses, but all are superior to the plain median. I was never able to update my little property stats program to collect enough data to make proper comparisons, but at least I learned the pitfalls of comparing medians.