Friday, March 30, 2012

How to make your SQL Server database grow exponentially with linear data

We have a database that we use to collect thousands of rows of data a day.
In one table in particular we collect about 1000 pieces of data a day in the
form of very small blobs (64 KB or so). Our database has been growing
steadily since November 2003, from an initial size of 7 GB to about 25 GB on
10 April 2004. Not bad, considering the amount of data we collect. Then,
from 10 April to today (23 April), it grew to 65 GB in about two weeks.
What's going on? Did our input data increase exponentially? No; according
to our logs, data has been arriving at a steady, linear rate. Maybe there is
about 40 GB of unused space in the database? No; after rebuilding the
indexes and running DBCC UPDATEUSAGE, SQL Server reports no free space.
Not possible, you say?
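A minimal version of that space check, assuming it is run inside the
affected database, looks something like this:

-- Correct the allocation counters, then report database size and unused space.
DBCC UPDATEUSAGE (0)   -- 0 means the current database
EXEC sp_spaceused      -- database_size, unallocated space, data, index_size, unused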
Well, it turns out the problem is with the blob fields. Because we want to
minimize the time it takes to search through the table, we wrote a service
that takes two rows, concatenates their blobs, updates one of the rows with
the resulting blob, and deletes the other row. In other words, the service
doubles up the blobs; the resulting blobs are doubled up again, and so forth.
This in itself should not be a problem, because in theory the new row should
take up the same space as (or even slightly less than) the original two rows
combined. Wrong! What happens is that once a blob gets bigger than 64 KB,
SQL Server no longer stores the blob data with the row, but moves it to a
separate location. Now, when you double up a blob and try to update the row,
SQL Server realizes it cannot store the blob in the same space as the
previous one, so it allocates new space for it. So even though you aren't
adding any new data to the database, the database file actually grows
exponentially! Some smaller blobs will eventually reuse the old space of a
bigger blob, but they are much lower on the exponential curve than the bigger
blobs, so in total your database file grows near-exponentially with linear
data. This becomes quite a problem once your blob size reaches about 50 MB
or so, and it is the cause of our database growing so quickly in such a
short time.
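To make that concrete, here is roughly what the in-place double-up step
looks like in T-SQL; the table and column names below are purely
illustrative, not our real schema:

-- Illustrative schema: CollectedData (Id int primary key, Payload image)
DECLARE @SourceId int, @TargetId int
DECLARE @src binary(16), @dest binary(16)
SET @SourceId = 1   -- row whose blob is appended and then deleted
SET @TargetId = 2   -- row that keeps the combined blob
SELECT @src  = TEXTPTR(Payload) FROM CollectedData WHERE Id = @SourceId
SELECT @dest = TEXTPTR(Payload) FROM CollectedData WHERE Id = @TargetId
-- Append the source blob to the end of the target blob (in-place update).
UPDATETEXT CollectedData.Payload @dest NULL 0 CollectedData.Payload @src
-- The source row has been merged and is no longer needed.
DELETE FROM CollectedData WHERE Id = @SourceId

Every such update grows an existing blob in place, which is the pattern
described above.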
What makes the problem worse is that SQL Server does not seem to take this
deallocated blob space into account when it calculates the "unused" space in
the database. If we had deleted both rows in a double-up operation and
inserted a new row instead, we would probably have been able to shrink the
database periodically. But there does not seem to be anything we can do to
reclaim the deallocated blob space. Our aim is to stop the blobs from
growing once they reach a certain size and to archive them off. Has anybody
experienced the same problem, or are there any suggestions as to how we can
reclaim (or avoid) this "unused" space?
Thanks,
Pieter
> Has anybody experienced the same problem, or are there any suggestions as
> to how we can reclaim (or avoid) this "unused" space?
Hi Pieter,
Have you tried DBCC SHRINKDATABASE and DBCC SHRINKFILE?
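For reference, these would be invoked roughly as follows; the database and
file names are placeholders:

-- Try to release unused space from the whole database, leaving about 10% free.
DBCC SHRINKDATABASE (MyDatabase, 10)
-- Or shrink one specific data file down to a target size in MB.
DBCC SHRINKFILE (MyDatabase_Data, 20000)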
Hope this helps,
Eric Cárdenas
Senior support professional
This posting is provided "AS IS" with no warranties, and confers no rights.
|||We have, thanks, with no improvement. It turns out to be a bug in SQL
Server; see the article
http://support.microsoft.com/default...b;en-us;324432
When we copied all the rows to a new table, we recovered 43 GB of the 60 GB
database, which means there was only about 17 GB of actual data in it. We
now have a "recovery" plan where we store the blobs in a separate data file,
so that when things get out of control again we can simply move the blobs to
a new data file without having to take the database offline again.
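Roughly, that approach looks like the sketch below; every object and file
name is a placeholder, and the real table naturally has more columns and
indexes:

-- A dedicated file and filegroup to hold the blob data on its own.
ALTER DATABASE MyDatabase ADD FILEGROUP BlobData
ALTER DATABASE MyDatabase
    ADD FILE (NAME = MyDatabase_Blobs,
              FILENAME = 'D:\Data\MyDatabase_Blobs.ndf')
    TO FILEGROUP BlobData
-- Recreate the table with its text/image data placed on that filegroup.
CREATE TABLE CollectedData_New
(
    Id      int   NOT NULL PRIMARY KEY,
    Payload image NULL
) ON [PRIMARY] TEXTIMAGE_ON BlobData
-- Copying the rows rewrites the blobs compactly and frees the dead space.
INSERT INTO CollectedData_New (Id, Payload)
SELECT Id, Payload FROM CollectedData
-- Swap the tables.
DROP TABLE CollectedData
EXEC sp_rename 'CollectedData_New', 'CollectedData'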
Tnx,
Pieter
"Eric Crdenas [MSFT]" <ecardena@.online.microsoft.com> wrote in message
news:F39XstANEHA.2756@.cpmsftngxa10.phx.gbl...[vbcol=seagreen]
day.[vbcol=seagreen]
> the
> April
April[vbcol=seagreen]
> to
Did[vbcol=seagreen]
> have
> space
> updateusage)
to[vbcol=seagreen]
service[vbcol=seagreen]
with[vbcol=seagreen]
words[vbcol=seagreen]
> forth.
2[vbcol=seagreen]
64k,[vbcol=seagreen]
it[vbcol=seagreen]
> the
aren't[vbcol=seagreen]
> space
> the
> about
a[vbcol=seagreen]
"unused"
> operation
> to
> the
> --
> Hi Pieter,
> Have you tried DBCC ShrinkDB and ShrinkFile?
> Hope this helps,
> --
> Eric Crdenas
> Senior support professional
> This posting is provided "AS IS" with no warranties, and confers no
rights.
>
|||> It turns out to be a bug in SQL Server; see the article
> http://support.microsoft.com/default...b;en-us;324432
For the record, the bug ID is 357405 for SQL Server 2000. The bug is
scheduled to be fixed in Yukon.
Regards,
Eric Cárdenas
Senior support professional
This posting is provided "AS IS" with no warranties, and confers no rights.
|||Thank you, looking forward to it.
Regards,
Pieter
"Eric Crdenas [MSFT]" <ecardena@.online.microsoft.com> wrote in message
news:YYHs2biNEHA.3808@.cpmsftngxa10.phx.gbl...[vbcol=seagreen]
Server,[vbcol=seagreen]
60gb[vbcol=seagreen]
move
> --
> For the record, the bug ID is 357405 for SQL Server 2000. The bug is
> scheduled to be fixed in Yukon.
> Regards,
> --
> Eric Crdenas
> Senior support professional
> This posting is provided "AS IS" with no warranties, and confers no
rights.
>
|||Hi again,
I am very interested to know whether the bug is triggered by creating new
blobs and deleting old ones, or by updating and growing existing blobs.
For example, we have two scenarios. Let's say that in both we have blob A
and blob B as the sources, and blob C as the target.
Scenario 1:
Blob A is read
Blob B is read
Blob C is created (A+B)
Blobs A and B are deleted
Scenario 2:
Blob A is read
Blob B is read
Blob B is updated (A+B)
Blob A is deleted
In our case, scenario 2 is the one in use. Would scenario 1 shield us from
this problem?
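In T-SQL terms, and reusing the same illustrative CollectedData (Id, Payload)
table from earlier, scenario 1 would look roughly like this (scenario 2 is
the in-place UPDATETEXT shown earlier in the thread):

DECLARE @IdA int, @IdB int, @IdC int
DECLARE @dest binary(16), @src binary(16)
SET @IdA = 1
SET @IdB = 2
SET @IdC = 3   -- a brand-new row for the combined blob
-- Blob C is created, starting as a copy of blob A ...
INSERT INTO CollectedData (Id, Payload)
SELECT @IdC, Payload FROM CollectedData WHERE Id = @IdA
-- ... and blob B is appended onto it.
SELECT @dest = TEXTPTR(Payload) FROM CollectedData WHERE Id = @IdC
SELECT @src  = TEXTPTR(Payload) FROM CollectedData WHERE Id = @IdB
UPDATETEXT CollectedData.Payload @dest NULL 0 CollectedData.Payload @src
-- Both source rows are deleted.
DELETE FROM CollectedData WHERE Id IN (@IdA, @IdB)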
Regards,
Pieter
"Eric Crdenas [MSFT]" <ecardena@.online.microsoft.com> wrote in message
news:YYHs2biNEHA.3808@.cpmsftngxa10.phx.gbl...[vbcol=seagreen]
Server,[vbcol=seagreen]
60gb[vbcol=seagreen]
move
> --
> For the record, the bug ID is 357405 for SQL Server 2000. The bug is
> scheduled to be fixed in Yukon.
> Regards,
> --
> Eric Crdenas
> Senior support professional
> This posting is provided "AS IS" with no warranties, and confers no
rights.
>
