Wednesday, March 21, 2012

How to maintain the history of information in a table?

Hi!
I have a question regarding what is the best way to maintain history
information in a table. I'm quite new to this, so please excuse me if
this is a common or a silly question.
Assume there's a table for storing the tax information, called TaxInfo.
Its got a PK column, TaxID, and other tables have an FK to TaxID.
There's a "Percentage" column in this table, whose value might change
over a period of time. Of course, there is a Name column and a
Description column.
Now, assume there is some existing data in other tables which refer to
this Percentage value. If this value changes, all existing data in
other tables should continue to use the older Percentage value, whereas
any new and subsequent data in the other tables should use the newer
value.
I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
table. ValidFrom is the date on which the row is inserted. ValidUpto
will normally be NULL, but if there's a change in the Percentage value,
then the ValidUpto of that row becomes today's date, and there comes
another row with the new Percentage value and ValidUpto = NULL. So
whenever I want the active Percentage value, I look for the row with
ValidUpto = NULL. But, since the TaxID is a PK, there cannot be a
second row with the same TaxID, and I'm stuck. And there cannot be
another TaxID representing that tax, since it is the same tax but with
a different value. If I make TaxID a non-PK column, then how do I
create a relationship from another table to this table? I think this
problem should be so common that there should be some kind of a
readymade design pattern that solves this problem?
Generally, is this what is called historization? I've heard of this
term, but I did a Google search for historization and nothing came up!
What exactly is the technical term, if there's any, for this kind of a
feature? And if this is not historization, what else is called
historization?
- RameshRead this article
http://vyaskn.tripod.com/sql_archive_data.htm
"Ramesh" <dramesh@.rushmorent.com> wrote in message
news:1141046170.179430.102730@.i40g2000cwc.googlegroups.com...
> Hi!
> I have a question regarding what is the best way to maintain history
> information in a table. I'm quite new to this, so please excuse me if
> this is a common or a silly question.
> Assume there's a table for storing the tax information, called TaxInfo.
> Its got a PK column, TaxID, and other tables have an FK to TaxID.
> There's a "Percentage" column in this table, whose value might change
> over a period of time. Of course, there is a Name column and a
> Description column.
> Now, assume there is some existing data in other tables which refer to
> this Percentage value. If this value changes, all existing data in
> other tables should continue to use the older Percentage value, whereas
> any new and subsequent data in the other tables should use the newer
> value.
> I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
> table. ValidFrom is the date on which the row is inserted. ValidUpto
> will normally be NULL, but if there's a change in the Percentage value,
> then the ValidUpto of that row becomes today's date, and there comes
> another row with the new Percentage value and ValidUpto = NULL. So
> whenever I want the active Percentage value, I look for the row with
> ValidUpto = NULL. But, since the TaxID is a PK, there cannot be a
> second row with the same TaxID, and I'm stuck. And there cannot be
> another TaxID representing that tax, since it is the same tax but with
> a different value. If I make TaxID a non-PK column, then how do I
> create a relationship from another table to this table? I think this
> problem should be so common that there should be some kind of a
> readymade design pattern that solves this problem?
> Generally, is this what is called historization? I've heard of this
> term, but I did a Google search for historization and nothing came up!
> What exactly is the technical term, if there's any, for this kind of a
> feature? And if this is not historization, what else is called
> historization?
> - Ramesh
>|||Hi Uri,
Thanks for mentioning that article. It talks about archiving, but what
I need is not archiving: both the new and the old rows ought to be
there in the table, so I guess that article doesn't apply to my current
problem. Thanks a lot anyway.
- Ramesh|||Your idea of versioning the tax information using ValidFrom and ValidTo is
the right idea and a standard solution. In this specific case, perhaps you
should have a compound primary key consisting of TaxID + ValidFrom. This
concept is often referred to as "data versioning". It is one common
component of (but does not entirely define) data warehousing.
http://www.dmreview.com/article_sub...ticleID=1025568
http://www.dmreview.com/article_sub.cfm?articleID=7202
"Ramesh" <dramesh@.rushmorent.com> wrote in message
news:1141046170.179430.102730@.i40g2000cwc.googlegroups.com...
> Hi!
> I have a question regarding what is the best way to maintain history
> information in a table. I'm quite new to this, so please excuse me if
> this is a common or a silly question.
> Assume there's a table for storing the tax information, called TaxInfo.
> Its got a PK column, TaxID, and other tables have an FK to TaxID.
> There's a "Percentage" column in this table, whose value might change
> over a period of time. Of course, there is a Name column and a
> Description column.
> Now, assume there is some existing data in other tables which refer to
> this Percentage value. If this value changes, all existing data in
> other tables should continue to use the older Percentage value, whereas
> any new and subsequent data in the other tables should use the newer
> value.
> I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
> table. ValidFrom is the date on which the row is inserted. ValidUpto
> will normally be NULL, but if there's a change in the Percentage value,
> then the ValidUpto of that row becomes today's date, and there comes
> another row with the new Percentage value and ValidUpto = NULL. So
> whenever I want the active Percentage value, I look for the row with
> ValidUpto = NULL. But, since the TaxID is a PK, there cannot be a
> second row with the same TaxID, and I'm stuck. And there cannot be
> another TaxID representing that tax, since it is the same tax but with
> a different value. If I make TaxID a non-PK column, then how do I
> create a relationship from another table to this table? I think this
> problem should be so common that there should be some kind of a
> readymade design pattern that solves this problem?
> Generally, is this what is called historization? I've heard of this
> term, but I did a Google search for historization and nothing came up!
> What exactly is the technical term, if there's any, for this kind of a
> feature? And if this is not historization, what else is called
> historization?
> - Ramesh
>|||I know CELKO would say the way you are going about it is the best (at least
in the fact that you have two columns for the start and end dates).
Personally, I have only seen this done once, and the programmers that did it
were not very good, and there were all types of problems with their
implementation. One the other hand, I have seen it done with one date quite
well, where you use only a start date and always use a subquery to select
the maximum date <= [today] or <= [join date]. The sub query generally
preforms well and once you write the code once you can cut and paste it
everywhere else. You don't care about maintaining the end date, which is
basically dependent on the esistence of the begin date in the next row.
Keep in mind that this is just my opinion, but I have seen the exact tax
tables you mention handled in this way, along with employee data, department
lists, and lookup tables of all sorts. Unfortunately, I do not have the
benefit of having seen the begin/end date approach done correctly, so I
can't give a fair comparison of the two methods.
Hopefully someone else can chime in with their experiences and speak to the
pros and cons of each.
Joe, if you could address the issue personally, I think it would make for an
excellent discussion.
"Ramesh" <dramesh@.rushmorent.com> wrote in message
news:1141046170.179430.102730@.i40g2000cwc.googlegroups.com...
> Hi!
> I have a question regarding what is the best way to maintain history
> information in a table. I'm quite new to this, so please excuse me if
> this is a common or a silly question.
> Assume there's a table for storing the tax information, called TaxInfo.
> Its got a PK column, TaxID, and other tables have an FK to TaxID.
> There's a "Percentage" column in this table, whose value might change
> over a period of time. Of course, there is a Name column and a
> Description column.
> Now, assume there is some existing data in other tables which refer to
> this Percentage value. If this value changes, all existing data in
> other tables should continue to use the older Percentage value, whereas
> any new and subsequent data in the other tables should use the newer
> value.
> I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
> table. ValidFrom is the date on which the row is inserted. ValidUpto
> will normally be NULL, but if there's a change in the Percentage value,
> then the ValidUpto of that row becomes today's date, and there comes
> another row with the new Percentage value and ValidUpto = NULL. So
> whenever I want the active Percentage value, I look for the row with
> ValidUpto = NULL. But, since the TaxID is a PK, there cannot be a
> second row with the same TaxID, and I'm stuck. And there cannot be
> another TaxID representing that tax, since it is the same tax but with
> a different value. If I make TaxID a non-PK column, then how do I
> create a relationship from another table to this table? I think this
> problem should be so common that there should be some kind of a
> readymade design pattern that solves this problem?
> Generally, is this what is called historization? I've heard of this
> term, but I did a Google search for historization and nothing came up!
> What exactly is the technical term, if there's any, for this kind of a
> feature? And if this is not historization, what else is called
> historization?
> - Ramesh
>|||I think your problem is that you are trying to use TaxID for two purposes -
to identify a particular class of tax, and to identify a particular
time-restricted value of that class. Your taxinfo table could have a
separate column called 'TaxClass'. Now when you are searching for the tax
rate to be added to a new row in another table, you look through the TaxIDs
within the correct TaxClass and find the row with the null 'ValidTo' column.
You now use the TaxID of this row in your foreign key column.
When the tax rate changes for that class, you add a new row (with a new
TaxID) for that Tax Class, modify the previous 'current' row by writing the
'ValidTo' date, and subsequent entries will use the new value.
Sorry if I've misunderstood the problem in any way - I just skipped through
in an idle moment.
"Ramesh" <dramesh@.rushmorent.com> wrote in message
news:1141046170.179430.102730@.i40g2000cwc.googlegroups.com...
> Hi!
> I have a question regarding what is the best way to maintain history
> information in a table. I'm quite new to this, so please excuse me if
> this is a common or a silly question.
> Assume there's a table for storing the tax information, called TaxInfo.
> Its got a PK column, TaxID, and other tables have an FK to TaxID.
> There's a "Percentage" column in this table, whose value might change
> over a period of time. Of course, there is a Name column and a
> Description column.
> Now, assume there is some existing data in other tables which refer to
> this Percentage value. If this value changes, all existing data in
> other tables should continue to use the older Percentage value, whereas
> any new and subsequent data in the other tables should use the newer
> value.
> I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
> table. ValidFrom is the date on which the row is inserted. ValidUpto
> will normally be NULL, but if there's a change in the Percentage value,
> then the ValidUpto of that row becomes today's date, and there comes
> another row with the new Percentage value and ValidUpto = NULL. So
> whenever I want the active Percentage value, I look for the row with
> ValidUpto = NULL. But, since the TaxID is a PK, there cannot be a
> second row with the same TaxID, and I'm stuck. And there cannot be
> another TaxID representing that tax, since it is the same tax but with
> a different value. If I make TaxID a non-PK column, then how do I
> create a relationship from another table to this table? I think this
> problem should be so common that there should be some kind of a
> readymade design pattern that solves this problem?
> Generally, is this what is called historization? I've heard of this
> term, but I did a Google search for historization and nothing came up!
> What exactly is the technical term, if there's any, for this kind of a
> feature? And if this is not historization, what else is called
> historization?
> - Ramesh
>|||Xref: TK2MSFTNGP08.phx.gbl microsoft.public.sqlserver.programming:587521
On 27 Feb 2006 05:16:10 -0800, Ramesh wrote:
(snip)
>I'm thinking two date columns, ValidFrom and ValidUpto in the TaxInfo
>table. ValidFrom is the date on which the row is inserted. ValidUpto
>will normally be NULL, but if there's a change in the Percentage value,
>then the ValidUpto of that row becomes today's date, and there comes
>another row with the new Percentage value and ValidUpto = NULL. So
>whenever I want the active Percentage value, I look for the row with
>ValidUpto = NULL.
Hi Ramesh,
Good thinking. However, you might find it easier to handle things if you
explicitly store the maximum possible date value (9999-12-31) in the
ValidUpto column if it's still valid. You can then easily find a row
that is "current" for a given date by looking for a row with
ValidFrom <= @.TheDate AND ValidUpto < @.TheDate
With the NULL, you'd require all kinds of extra null handling.
Beware that you'll have to make sure to develop some foolproof triggers
to guarantee that you can never have anomalies like two percentages that
are valid on the same day, or a day with no valid percentage. (You can
prevent these issues by having just the ValidFrom date, as suggested by
Jim, but you'll find the pperformance of that version to be slower).

> But, since the TaxID is a PK, there cannot be a
>second row with the same TaxID, and I'm stuck. And there cannot be
>another TaxID representing that tax, since it is the same tax but with
>a different value.
Primary key is either (TaxID, ValidFrom) or (TaxID, ValidUpto). And the
combination that you don't choose as PK should be declared UNIQUE.

> If I make TaxID a non-PK column, then how do I
>create a relationship from another table to this table? I think this
>problem should be so common that there should be some kind of a
>readymade design pattern that solves this problem?
You can't have a strict relationship, since SQL Server won't understand
that the values (TaxID = 7; TaxDate = '20060218') should refer to the
row with key values (TaxID = 7; ValidFrom = '20060101'; ValidUpto =
'20061231').
You could consider adding a TaxPeriodID to this table and making that
the primary key, but I feel that this is a somewhat kludgy solution. I'd
leave it as is and accept that SQL Server won't understand the exact
relation between the tables. Downside is that you'll have to provide
your own referential integrity handling (in triggers, presumably).
Hugo Kornelis, SQL Server MVP|||I think I will go with a PK made up of both TaxID and ValidFrom
columns. Too bad SQL Server won't let me "see" the relationships on the
diagrams...
Thanks so very much to all of you for your inputs and suggestions.
- Ramesh

No comments:

Post a Comment