Data consistency issue: changing history
Posted: Fri Feb 14, 2014 3:26 pm
Hi
I have been using tickstory for a few months now and like it a lot, which is why I was quite surprised to discover this.
Let me describe the issue using an example with a limited amount of data for easier checking/comparison: EURUSD for the late hours of 1-Jan-2012.
I first downloaded this data around August 2013, then re-downloaded it 2 days ago (12-Feb-2014) on a separate instance of tickstory (a separate PC that did not have any EURUSD data yet).
The 1-Jan-2012 tick data has changed at some point between Aug-2013 and now, which in my view goes against one of the main advantages of tickstory: consistent and reproducible data.
This can be confirmed by drilling down into the data hierarchy of the tickstory database. For example, the file 21h_ticks.bi5 used to be empty and now contains 12k of data.
Not only was hour 21 of EURUSD on 1-Jan-2012 added to the server sometime after September 2013, but the values for hour 22 (again, just an edge-case example) also changed by as much as 3-4 pips.
While I presume that the new data is of better quality, I am equally concerned that old backtests become worthless, since the revised data produces different results.
I understand that data corrections are sometimes unavoidable, but I would like to ask some specific questions and make a proposal:
a) Can someone from tickstory confirm that some historic data has been changed in recent months?
b) How often does data that is more than a month old get changed on the server? And is there any way to make users aware of such changes?
c) I am aware that a new download of existing data does NOT overwrite the local TS database. I would like to propose a new feature that lets users periodically compare their local data with the latest data on the server, ideally with a visualisation of the differences. Where there are differences, the user should be able to choose whether to download them and overwrite the local DB, or to keep the old values already in the local DB for consistency reasons (even though the new values on the server might be more accurate).
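To illustrate the kind of comparison proposed in c), here is a minimal sketch that diffs the same hourly file from two local databases (e.g. the old PC and the new PC). It assumes tickstory stores ticks in the Dukascopy-style .bi5 layout (LZMA-compressed, 20-byte big-endian records: millisecond offset, ask, bid, ask volume, bid volume) and that EURUSD prices are integer points of 0.00001, so 10 points = 1 pip. Both the layout and the function names (`decode_bi5`, `compare_hours`, `compare_files`) are my own assumptions, not a tickstory API.

```python
import lzma
import struct
from typing import Dict, List, NamedTuple

# ASSUMED record layout (Dukascopy-style .bi5), big-endian:
# uint32 ms offset in the hour, uint32 ask, uint32 bid,
# float32 ask volume, float32 bid volume -> 20 bytes per tick.
RECORD = struct.Struct(">IIIff")


class Tick(NamedTuple):
    ms: int         # milliseconds since the start of the hour
    ask: int        # price in integer points (assumed 0.00001 for EURUSD)
    bid: int
    ask_vol: float
    bid_vol: float


def decode_bi5(raw: bytes) -> List[Tick]:
    """Decode one hourly tick file; an empty file is treated as 'no ticks'."""
    if not raw:
        return []
    data = lzma.decompress(raw)  # FORMAT_AUTO accepts legacy .lzma streams
    return [Tick(*fields) for fields in RECORD.iter_unpack(data)]


def compare_hours(old: List[Tick], new: List[Tick],
                  points_per_pip: int = 10) -> Dict[str, float]:
    """Summarise how one hour of ticks differs between two downloads.

    Simplification: ticks are matched by millisecond offset only, so if
    several ticks share the same millisecond, only the last one is compared.
    """
    old_by_ms = {t.ms: t for t in old}
    new_by_ms = {t.ms: t for t in new}
    common = old_by_ms.keys() & new_by_ms.keys()
    # Largest bid or ask change (in points) among ticks present in both copies.
    max_diff_points = max(
        (max(abs(old_by_ms[ms].ask - new_by_ms[ms].ask),
             abs(old_by_ms[ms].bid - new_by_ms[ms].bid)) for ms in common),
        default=0,
    )
    return {
        "old_ticks": len(old),
        "new_ticks": len(new),
        "only_old": len(old_by_ms.keys() - new_by_ms.keys()),
        "only_new": len(new_by_ms.keys() - old_by_ms.keys()),
        "max_diff_pips": max_diff_points / points_per_pip,
    }


def compare_files(old_path: str, new_path: str) -> Dict[str, float]:
    """Compare the same hourly file taken from two local databases."""
    with open(old_path, "rb") as f_old, open(new_path, "rb") as f_new:
        return compare_hours(decode_bi5(f_old.read()), decode_bi5(f_new.read()))
```

Run against the old and new copies of 21h_ticks.bi5, a check like this would report every tick in the new file as "only_new"; for hour 22 it would report the largest bid/ask change in pips. A real feature would of course compare against the server rather than two local copies.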
I think that such a feature would make an already good tool even better.
Regards,
tickster