Friday, March 23, 2012

How to make a table act like a queue

Currently, I have a table that I am using to queue up items to be processed later. The initial state that they are entered into this table is 'NEW', meaning that they are NOT in the queue. When the status is switched to 'READY', they are ready to be processed by outside application(s). When application(s) wish to grab the next available item in the queue, they call my stored procedure which does some checking and updates the status to 'PROCESSING'. This stored procedure works, but I think there is a better way of doing this that would allow for much fewer deadlocks.

Anyone care to comment?

here's the stored procedure (I came back to clean up some of the logic to make it simpler to see)
----------
CREATE PROCEDURE sp_GetNextDeliveryRequest AS
BEGIN
DECLARE @.IDint,
@.Done bit

SET ROWCOUNT 1
SET @.Done = 0
SET @.ID= NULL

WHILE @.Done = 0
BEGIN

SELECT @.ID= queue_id
FROM tbl_queue
WHERE ((status='READY')
ORDER BY status ASC, entered_date ASC

IF @.ID IS NOT NULL
BEGIN
UPDATE tbl_queue
SET status = 'PROCESSING',
WHERE request_id = @.ID
AND ((status='READY')

IF @.@.ROWCOUNT != 0
SET @.Done = 1
END
ELSE
BREAK
END


IF @.Done != 1
SELECT -1 AS 'QueueID'
ELSE
SELECT @.ID AS 'QueueID'
END
----------SELECT @.DeliveryID = request_id
FROM tbl_delivery_request
WHERE ((status='READY') OR (status IN ('PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5))
AND delivered_date IS NULL
ORDER BY status ASC, entered_date ASC

Could return more than 1 id, so you'll get the last...

And yes, it sounds like a standard workflow function...

But it looks you're always taking the newest stuff. Why would things go in as New?|||Good point on returning more than 1. I did not realize it would take the last value as opposed to the first value. Thanks for that tip.

Things get entered as new because in our workflow process we needed to record how each delivery request would be handled (even though the item is not yet ready to be delivered). So, this table incorporates information for the delivery_request, as well as it's queue status. I'm in agreeance that this structure could be refactored for the better, but for now it is what it is.

So, is there a simpler (better) way of performing this function? Basically I'd like to get the next id without having another process get the same id.|||A couple of coding points:

Instead of using rowcount, use TOP 1 in your SELECT statement.

WHERE ((status='READY') OR (status IN ('PROCESSING', 'TRY AGAIN')))
is equivalent to the more concise:
WHERE status IN ('READY', 'PROCESSING', 'TRY AGAIN')

Since @.Done is a bit value, you can just set it like this:
SET @.Done = @.@.ROWCOUNT
...as the @.@.ROWCOUNT int value will be implicitly translated to a bit value.

...so your procedure could be rewritten like this:
CREATE PROCEDURE sp_GetNextDeliveryRequest AS
BEGIN
DECLARE @.DeliveryID int, @.Done bit

SET @.Done = 0
SET @.DeliveryID = NULL

WHILE @.Done = 0
BEGIN
SET @.DeliveryID =
(SELECT TOP 1 request_id
FROM tbl_delivery_request
WHERE status IN ('READY', 'PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5
AND delivered_date IS NULL
ORDER BY status ASC, entered_date ASC)

IF @.DeliveryID IS NOT NULL
BEGIN
UPDATE tbl_delivery_request
SET status = 'PROCESSING', process_start_date = getdate()
WHERE request_id = @.DeliveryID
AND status IN ('READY', 'PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5
AND delivered_date IS NULL

SET @.Done = @.@.ROWCOUNT
END
ELSE BREAK
END
END

But the problem I think you are going to run into is that this procedure runs continuously until it succesfully finds a request_id. If that takes a few seconds or more, and another application initiates the same query, that is where I suspect your deadlocks are coming from.

I'd recommend that you rewrite the procedure to return null if there is no available request_id, and that your application be programmed to wait five or ten seconds before resubmitting.

blindman|||blindman,

Thanks for the tips. I'm making the modifications right now to see what kind of positive impacts they have. As for combining the status check, I have to keep that separate (notice the parenthesis) because of the additional date check for queued items in 'PROCESSING' or 'TRY AGAIN' status. Everything else seems to make perfect sense.

The one thing that I am stuck with though is a tradeoff of processing in my application versus processing in the stored procedure. If I choose to try only once to get the next available request_id, then I may (or may not) find one and be able to update the row before the next application tries. So, my call would effectively return null, meaning that there are no requests to handle. Now, I can choose to sleep for my current sleep time (60 seconds) and then repoll only to have the same situation happen (another application updates the record before I am able to update, causing the stored procedure to return null). So, the application (or thread) could end up spending a lot of time sleeping when there are requests to process. This was the reason why I choose to implement the looping at the stored procedure level, but this has its drawbacks (as evident by the deadlocks). I just wasn't sure if there was some better way of handling my situation without causing so many deadlocks. Like performing the update/lookup in 1 statement.|||Well...yeah...

Use set based operations, and get rid of

WHERE request_id = @.DeliveryID

In the update...

No?|||Add a column to your table called ProcessID and set its type to UniqueIdentifier. Then use this logic in your procedure:
------------
declare @.ProcessID UniqueIdentifier
set @.ProcessID = NewID()

UPDATE tbl_queue SET ProcessID = @.ProcessID, status = 'PROCESSING'
from tbl_queue
inner join (Select Top 1 request_id from tblequeue where(Your Criteria...)) NextRequest
on tbl_queue.request_id = NextRequest.request_id

Set @.ID = (select request_id from tbl_queue where ProcessID = @.ProcessID)
------------

Because the record is located and marked all in one statement, I think it will fix your deadlocks.

blindman

Brett; set based operations? What's your idea? Can you give more detail?|||What's the difference between:

SET @.DeliveryID =
(SELECT TOP 1 request_id
FROM tbl_delivery_request
WHERE status IN ('READY', 'PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5
AND delivered_date IS NULL
ORDER BY status ASC, entered_date ASC)

IF @.DeliveryID IS NOT NULL
BEGIN
UPDATE tbl_delivery_request
SET status = 'PROCESSING', process_start_date = getdate()
WHERE request_id = @.DeliveryID
AND status IN ('READY', 'PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5
AND delivered_date IS NULL

And

UPDATE tbl_delivery_request
SET status = 'PROCESSING', process_start_date = getdate()
WHERE status IN ('READY', 'PROCESSING', 'TRY AGAIN')
AND DATEDIFF(minute, process_start_date, getdate()) > 5
AND delivered_date IS NULL

But I could be missing it...damn hangover...|||I think he needs to know and return the @.DeliverID value.

Go easy on the Margaritas. The weekends over Kaiser! :cool:

blindman|||Brett,
Blindman is correct. I needed to return the id of the next request to process. Blindman, thanks for the alternative. I've actually used that type of mechanism before (with some suttle differences). There is only one caveat to that, and that is to be careful that your spid is unique per call. That gets me in to some trouble here since I am pooling my connections to the database. So even though 2 threads will have 2 separate connections (and thus 2 separate spids), there could be more than 1 entry in the table with the spid=myspid.

Blindman, Brett, I really do appreciate the help. You're giving me a lot of ideas, and causing me to rethink why I've choosen this implementation to begin with! :)|||Your SPID is guaranteed to be unique if you use the NewID function.

100% Gar-run-teed, or yer munny back!

blindman|||I take that back. I belive I stepped right over that line. Well great solution. kudos.|||I still think its better development practice to return a null if there are no available request_ids, and then have the application resubmit five or ten seconds later.

I don't like the idea of procedures that could potentially run in endless loops.

blindman|||Do you really mean spid?

I'm soooo confused...|||It doesn't run an endless loop. It runs until a request is found and your process could update the record, or until no more requests are available in the queue (see the break statement). I do agree that it would be more advantageous to have what would be analagous to a synchronized block so that other processes requesting this stored procedure would simply wait until the current process exited the block.

That kind of logic would have to be performed at the application level, and for distributed computing issues (ability to run multiple applications to process this queue), does not fit my needs.|||No, I don't mean SPID. I mean GUID.

You should not need to care about whether your SPID is unique.

blindman

No comments:

Post a Comment