Version 1 (modified by dennisw, 13 years ago)

Data validation

This project involves a fairly large amount of data, and it's not uncommon for some of it to be incomplete. This page lists possible scenarios together with possible solutions. For every scenario, let's assume you have measured a 'regular and complete' round.

Missing coordinate(s)

Let's say there is a missing coordinate like the following:

'''latitude, longitude'''
52.1000, 4.1000
52.2000, null
52.3000, 4.2000

A missing value like this can easily be estimated by taking the nearest known values before and after it and using their average as a replacement. Here the missing longitude becomes (4.1000 + 4.2000) / 2 = 4.1500. The estimate shouldn't be far off from the real value, unless you made a strange or unexpected turn at that exact point.
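The averaging of neighbouring values can be sketched in Python as follows. This is only a minimal illustration: the function name `fill_single_gaps` and the representation of rows as (latitude, longitude) tuples with `None` for a missing value are assumptions, not something this project prescribes. It only fills a value when both direct neighbours are known.

```python
from typing import List, Optional, Tuple

# A row is (latitude, longitude); None marks a missing value (assumed format).
Point = Tuple[Optional[float], Optional[float]]


def fill_single_gaps(rows: List[Point]) -> List[Point]:
    """Replace an isolated missing value by the mean of its two neighbours."""
    filled = [list(row) for row in rows]
    for i in range(1, len(rows) - 1):
        for axis in (0, 1):  # 0 = latitude, 1 = longitude
            before = rows[i - 1][axis]
            current = rows[i][axis]
            after = rows[i + 1][axis]
            # Only fill when the value is missing and both neighbours are known.
            if current is None and before is not None and after is not None:
                filled[i][axis] = (before + after) / 2
    return [tuple(row) for row in filled]


rows = [(52.1000, 4.1000), (52.2000, None), (52.3000, 4.2000)]
print(fill_single_gaps(rows))  # the middle longitude is filled with roughly 4.15
```

Longer runs of missing values are left untouched by this sketch, since a single average says little about them.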

A harder case like this might need some thinking:

'''latitude, longitude'''
52.1000, 4.1000
null, null (100 rows)
53.1000, 5.1000

Here 100 rows of coordinates are missing. First of all, the impact of these missing values depends on the speed at which you traveled and the rate at which you measured. If your first coordinate was at the NE point of Leiden and the last at the SW point, there is quite a large gap. (A side note: if this happens, you should take more measurements per unit of time.) For large gaps like these, it's hard to calculate an expected route. Even if an occasional valid value were measured among the missing ones (say, 10 missing rows for every valid one), it might be wise to ignore those values, since it's hard to reconstruct a correct route over long distances.
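One way to make "quite a large gap" concrete is to compute the distance between the last known point before the gap and the first known point after it. The sketch below uses the haversine (great-circle) formula for this. The function names and the 500 m threshold are hypothetical, chosen only for illustration; a sensible threshold would depend on how the measurements were taken.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gap_is_too_large(before, after, max_gap_m: float = 500.0) -> bool:
    """True if the known points around a gap are further apart than max_gap_m.

    max_gap_m is an assumed, illustrative threshold.
    """
    return haversine_m(before[0], before[1], after[0], after[1]) > max_gap_m


# The gap from the example above spans far more than 500 m.
print(gap_is_too_large((52.1000, 4.1000), (53.1000, 5.1000)))
```

Rows inside a gap flagged this way could then be ignored rather than guessed.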

Say we still have these 100 missing rows, but your first coordinate is at the start of a street, and the last at the end of that same street. This is more likely to occur when you measure at fair intervals.
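This page doesn't say how the same-street case should be handled. One possible approach, assuming the street is roughly straight, is to spread the missing points evenly between the two known endpoints, which extends the averaging used for a single missing value. The sketch below does this; the function name `interpolate_gap` is hypothetical, and treating latitude and longitude as flat coordinates is an approximation that is only reasonable over short distances such as one street.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def interpolate_gap(start: Coordinate, end: Coordinate,
                    missing: int) -> List[Coordinate]:
    """Return `missing` evenly spaced points strictly between start and end."""
    points = []
    for k in range(1, missing + 1):
        t = k / (missing + 1)  # fraction of the way from start to end
        lat = start[0] + t * (end[0] - start[0])
        lon = start[1] + t * (end[1] - start[1])
        points.append((lat, lon))
    return points


# Hypothetical street endpoints with 100 missing rows in between.
street = interpolate_gap((52.1600, 4.4900), (52.1610, 4.4930), 100)
print(len(street), street[0], street[-1])
```

With `missing=1` this reduces to the simple average of the two neighbouring values.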
