NEAREST NEIGHBOR BIAS IN THE SUBSTITUTION OF MISSING VALUES

Chris J. Cieszewski, Kim Iles

Abstract


We present a simplified illustration of the bias inherent in the general case of the Nearest Neighbor (NN) method used to substitute missing values. The presentation makes no assumptions about the geometry of the sampled subjects. The general examples illustrate that the bias arises mainly at the limits of the data range and not necessarily within its central part, although it can also arise there around any significant data gap. Because the NN data domain stretches across an arbitrary subject characteristic rather than across physical space, the bias can be reduced by ensuring that the considered attribute is well represented throughout its entire range, especially at its upper and lower limits.
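The edge effect described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' own example: the linear relationship `true_y`, the reference range 2..8, the target range 0..10, and all function and variable names are hypothetical choices made only to show the mechanism.

```python
from statistics import mean


def true_y(x):
    # Hypothetical relationship between the attribute x and the value y,
    # assumed linear purely for illustration.
    return 2.0 * x


def nn_impute(x_target, references):
    # 1-NN substitution: copy y from the reference unit whose x is closest.
    nearest_x, nearest_y = min(references, key=lambda ref: abs(ref[0] - x_target))
    return nearest_y


# Reference units with known y cover only x = 2..8 (hypothetical values).
references = [(x, true_y(x)) for x in range(2, 9)]

# Target units with missing y span the wider range x = 0..10.
targets = range(0, 11)

# Imputation error (imputed minus true value) for each target.
errors = {x: nn_impute(x, references) - true_y(x) for x in targets}

below = mean(errors[x] for x in targets if x < 2)
inside = mean(errors[x] for x in targets if 2 <= x <= 8)
above = mean(errors[x] for x in targets if x > 8)

print(f"mean error below the reference range:  {below:+.1f}")
print(f"mean error inside the reference range: {inside:+.1f}")
print(f"mean error above the reference range:  {above:+.1f}")
```

Under these assumptions, targets below the reference range are overestimated and targets above it are underestimated, because each one borrows its value from the nearest available extreme. Targets inside the range show no error. Extending the hypothetical reference set to cover 0..10 would bring the edge errors to zero as well.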

Keywords


Mapping; Nearest Neighbor Imputation; Nearest Neighbor Bias; Large-area Forest Inventories; Multi-source Data Fusion.

© 2008 Mathematical and Computational Forestry & Natural-Resource Sciences