The Case of the Stubbornly Incomplete Booking
The other day, I was at work (as you do), and I felt like Sherlock. I was working through a bug in the system, and a puzzle piece just clicked and I felt good, and I felt compelled to put it out there, even if no one is reading this.
This is a bit of a complex issue, so strap yourself in. It’s made a little more complex by the fact that you, dear reader, do not have the shared context that my coworkers did (unless you are a coworker).
I work for a company called SalesMaster, and we offer a pretty sweet showroom app for car manufacturers and retailers (if I do say so myself). Users of our system can create a record representing an enquiry that can have various reasons for existing. One of these types of enquiries is the ExperienceBookingEnquiry, which is an enquiry that gets created for a manufacturer who offer experience days (customers can spend a day driving a similar vehicle off road in a dedicated centre). A booking can then be attached to this enquiry.
Experience bookings, as well as their enquiries, can be in a finite number of states that, taken together, represent a workflow. We use a gem called “workflow” to manage this workflow. This particular gem has a hook that allow the programmer to trigger events when entering states.
The following two figures represent the states that an experience booking and experience booking enquiry can be in and the states it can move to from there. An arrow pointing from state A to state B means that the record can move to state B if it’s in state B. The left-most state is the one that the record starts in (don’t ask me why the booking starts off at “confirmed”).
Figure A: experience booking enquiry
New ----> Booked ----> Completed
^----------/ \---> Declined
Figure B: experience booking
Confirmed ----> Completed
\---> Cancelled
Here are the trigger-like behaviours to note:
- An experience booking enquiry is moved to the “Booked” state when an experience booking is created on it (but not through triggers, we just do this one through an interactor).
- When the booking enters the “Completed” state, it tries to move its associated enquiry to the “Completed” state as well.
- When the booking enters the “Cancelled” state, it tries to move its associated enquiry to the “New” state (but only if the enquiry is in the “Booked” state, so it won’t try to move an enquiry from “Completed” to “New”).
Also of relevance is the fact that an enquiry can have multiple bookings made against it. Quicker readers that are willing to assume may see where I’m going with this already, and they’d probably be right, but they wouldn’t have the full story with the human and luck elements to it. If you’re interested in that, stick around.
I had been working on a bug that could be described as follows: some experience bookings were unable to be moved to the “Completed” state, which I figured out was because the enquiry would refuse to progress to “Completed” with it. It turned out that these enquiries were left in the “New” state, when they should have been in the “Booked” state (clearly, as they had an associated booking). Trying to figure out why these enquiries weren’t in the “Booked” state proved fruitless. Looking at the interactor:
if booking.save
progress_enquiry booking.enquiry
I thought, maybe the booking would succeed, but the enquiry would fail to move to the booked state. Of course, it was hard to diagnose why at this point, as I did not have the error logs from this occurrence (which never occurred). Eventually, I decided to wrap the thing in a transaction and just note on the PR that I hadn’t attacked the problem directly (which I should have done in the commit message and been clearer about).
We have a killer feature, which is that you can view bookings in a diary view that shows the vehicles they belong to and the time they span in a very efficient visual manner. You can also drag and drop bookings to change their time, and even their vehicle. When dragging between vehicles, the app actually creates a whole new booking from the template of the old, then cancels the old one. Here’s where luck comes into play: the sprint contained another bug to do with bookings, which was that very occasionally, some bookings, when moved to another vehicle (i.e. cloned and cancelled) would then have their VRMs (number plate) point to the new vehicle on the old, cancelled booking. If this bug weren’t on my mind, I would have never made the connection that I did.
I spent some time on that bug, then came back to the previous bug when a coworker, while QA testing, asked for some guidance on expected behaviour. Now, this was an unusual QA test. Because I had to simulate the exception being raised when the enquiry is moved to the “Booked” status, thereby leaving a booked enquiry in the “New” status, I could only do it through automated testing. That test is a less interesting story in itself. It would be impossible for a human to test without overriding the method to move the enquiry to “Booked” to raise an exception, then changing it back for production. I asked QA to test that I at least hadn’t broken anything.
What my colleague had needed answering was whether the enquiry was supposed to move back to the “New” state when the booking is cancelled. Oh! A little dramatic irony; all that stuff about workflow that you now know? I wasn’t thinking about any of that while originally working on the bug. We used workflow hooks to trigger events in some cases, but used an interactor for that one case I mentioned. It was also originally triggered through a workflow hook, but had probably been moved to the interactor to keep it self-contained and easy to reason about. Ironically, that probably misled me into believing that I was look at all I needed to, when there were other moving parts that were more pertinent.
It was expected behaviour, but I noticed the app wasn’t checking to see if the enquiry had other bookings on it before moving it to the “New” state. It was at this point I spotted the issue: a booking was being created on an enquiry, then another on the same one. One of them was then being cancelled, moving the enquiry to “New” and leaving the other booking high and dry to be unable to complete. After a colleague I had requested help from asked why users were doing this (as we happened to know this was not supposed to be the process for the manufacturer who reported the issue), a second later, I recalled the issue I had just arrived from working on and realised that users were dragging the booking on the diary view. Some investigation on the affected bookings, including inspection of the activity log, we confirmed this with reasonable certainty.
Maybe anticlimactic, but it certainly gave me a small rush at the time. Feel free to email me at habib@alaminium.me if you liked this story and want to hear more like it. I would welcome it.