Changes between Initial Version and Version 1 of datavali


Timestamp: Jun 23, 2011, 11:56:01 AM
Author: dennisw

= Data validation =
This project involves a fairly large amount of data, and it is not uncommon for some of that data to be incomplete. Possible scenarios are listed here, together with their possible solutions. For every scenario, assume you have measured a 'regular and complete' round.

== Missing coordinate(s) ==
Let's say there is a missing coordinate like the following:
{{{
'''latitude, longitude'''
52.1000, 4.1000
52.2000, null
52.3000, 4.2000
}}}
A missing value like this can be estimated easily: take the last known value before the gap and the first known value after it, and use their average as the replacement. In the example above, the missing longitude becomes (4.1000 + 4.2000) / 2 = 4.1500. The estimate shouldn't be far off from the real value, unless you made a strange or unexpected turn at exactly that point.
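
The averaging approach described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `fill_single_gaps` is hypothetical, and it assumes one column (for example the longitudes) has been read into a list in which `None` stands for `null`. It only fills gaps that are exactly one row wide and leaves longer runs untouched.

```python
from typing import List, Optional


def fill_single_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each isolated None with the average of its two neighbours.

    Hypothetical helper: gaps longer than one row are left as None.
    """
    filled = list(values)  # work on a copy so the input stays unchanged
    for i in range(1, len(values) - 1):
        before, after = values[i - 1], values[i + 1]
        if values[i] is None and before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


# The longitude column from the example above, with null read as None.
longitudes = [4.1000, None, 4.2000]
print(fill_single_gaps(longitudes))
```

Reading the neighbours from the original list rather than from the filled copy means an estimated value is never used to estimate another one.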

A harder case like this might need some thinking:
{{{
'''latitude, longitude'''
52.1000, 4.1000
null, null (100 rows)
53.1000, 5.1000
}}}
Here 100 consecutive rows of coordinates are missing. First of all, the impact of these missing values depends on the speed at which you traveled while measuring. If your first coordinate was at the NE point of Leiden and the last was at the SW point, there is quite a large gap. (A side note: if this happens, you should increase the number of measurements per unit of time.)
For large gaps like these, it's hard to calculate an expected route. Even if an occasional valid value is measured among the missing ones (say, a ratio of 10 missing to 1 valid), it might be wise to just ignore these values, since it's hard to reconstruct a correct route over long distances.
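
One way to decide whether a gap is "large" is to look at the distance between the last known coordinate before the gap and the first known coordinate after it. The sketch below uses the haversine formula for that distance. The names `haversine_km` and `gap_is_too_large` are hypothetical, and the 0.5 km default threshold is only an assumption for illustration; a sensible value depends on how fast you travel and how often you measure.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def gap_is_too_large(before: Point, after: Point, max_gap_km: float = 0.5) -> bool:
    """Return True if the gap endpoints are further apart than the threshold.

    max_gap_km is an assumed, illustrative threshold, not a project setting.
    """
    return haversine_km(before, after) > max_gap_km


# Endpoints of the 100-row gap from the example above.
print(gap_is_too_large((52.1000, 4.1000), (53.1000, 5.1000)))
```

Gaps flagged this way could simply be skipped instead of being filled with a guessed route.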

Say we still have these 100 missing rows, but the first coordinate is at the start of a street and the last is at the end of that same street. This is more likely to occur when you measure at reasonable intervals.