Google Analytic Tracker


Dec 27, 2011

How the Developers Sell a Bug as a Feature

A while ago, one of my colleagues handed me the above picture. It was funny, so I put it up on my wall. I could see how this situation could happen, but I couldn't recall any good example from my own experience.

Usually a bug in software is fairly obvious. If there is a debate between the stakeholders and the developers, it is probably a design problem more than a software bug.

During the last sprint of the release, my team debated a side effect that was introduced by a UI architecture change (I am guilty of making that change). We joked about it and believed that we could sell this bug as a FEATURE, and we named this feature “Instance Preview”. Similar to the format preview feature in MS Word 2007, the user can see how the text formatting changes simply by hovering the mouse over the toolbar items.

Background Story

A couple of years ago, our development team came up with a new UI architecture design that manages how data is modified in the application. It allows users to cancel their changes at multiple sub-form (window) levels. The application was tightly coupled to a typed DataSet. The DataSet acts as an in-memory database for the application to manipulate data. Most of the UI controls are bound to the DataSet through data binding.

So we knew we needed some sort of versioning to keep track of data changes. The DataSet has a row version feature, but this feature wasn't well understood by the development team back then, so we implemented our own approach. We called it the FormTransaction. Each FormTransaction object contains a DataSet and has a reference to a parent FormTransaction. Whenever the user drills down into data details, a sub-window is opened and a FormTransaction object is created, which contains a subset of data from the main DataSet.

image

The Main DataSet is the application's main data repository. The forms are data-bound to their FormTransaction's DataSet. When a user makes a change in the UI, the data is immediately updated in that DataSet. If another form updated the same DataSet, the changes would also immediately show on the other bound UI. If the user wants to discard the change, they simply click the Cancel button on the form, and the sub-DataSet is disposed. If the user wants to save the change, the data is merged into the parent DataSet, until all the changes reach the Main DataSet.
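To make the idea concrete, here is a minimal sketch of what a FormTransaction could look like. This is not our production code; the names and the loadSubset callback are just illustrative, but it shows the copy, then merge-or-dispose pattern described above.

    using System;
    using System.Data;

    public class FormTransaction : IDisposable
    {
        private readonly FormTransaction parent;

        public DataSet Data { get; private set; }

        // Root transaction: wraps the application's Main DataSet.
        public FormTransaction(DataSet mainDataSet)
        {
            Data = mainDataSet;
        }

        // Child transaction: holds a copy of a subset of the parent's data for a sub-form to bind to.
        private FormTransaction(FormTransaction parent, Func<DataSet, DataSet> loadSubset)
        {
            this.parent = parent;
            Data = loadSubset(parent.Data);
        }

        public FormTransaction BeginChild(Func<DataSet, DataSet> loadSubset)
        {
            return new FormTransaction(this, loadSubset);
        }

        // "OK": merge the sub-form's changes back into the parent DataSet.
        public void Commit()
        {
            if (parent != null)
            {
                parent.Data.Merge(Data);
            }
        }

        // "Cancel": throw the copy away; the parent never sees the changes.
        public void Dispose()
        {
            if (parent != null && Data != null)
            {
                Data.Dispose();
                Data = null;
            }
        }
    }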

FormTransaction Advantages:

  • Cancelling changes is easy to implement: you simply close the form and dispose the FormTransaction object
  • You can control what data is loaded into the sub-DataSet, which also acts as a filter

FormTransaction Disadvantages:

  • DataSet constraint validation can only run when data is merged back into the Main DataSet. This makes debugging more challenging, because you won't know which form created the invalid data until the merge happens.
  • Increased memory usage: if you need to load a lot of data in the sub-form, you have to duplicate the rows.
  • It takes time to load data from the Main DataSet into the sub-DataSet, and it also takes time to merge data changes back into the parent DataSet.

Over the past year, we noticed that our application was getting slower because we need to handle more and more data. The DataSet definition has grown, and more data is needed to do certain calculations. The FormTransaction was slowing down the application.

In addition, there was another feature we needed to build that required a lot of data to be loaded into a sub-form. This UI architecture simply does not scale. Thus, we came up with a newer approach based on the opposite idea of the FormTransaction. We call it the RollbackTransaction.

image

RollbackTransaction Advantages

  • No memory duplication; all the forms work on the Main DataSet.
  • No need to load data or figure out what data you need in the sub-form, and no need to merge.
  • You have access to all the data in your sub-form all the time.
  • Changes made by the user are immediately validated.

RollbackTransaction Disadvantages

  • We need to implement data filtering on all the data-bound controls, since we no longer use a FormTransaction to filter data.
  • If the user makes a lot of changes, undoing (rolling back) the changes can be slow.

The implementation of the RollbackTransaction is pretty trivial. You basically back up the original values as they are being changed. The DataTable comes with change events such as ColumnChanging, RowChanging, RowDeleting, and TableNewRow. You can attach to these events and store each data change in an undo list (a stack works nicely, since the changes have to be undone in reverse order). When the user wants to undo the changes, replay the actions in an undo fashion (i.e. if a new row was added, delete it).
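Here is a rough sketch of that idea. It is a simplification, not our actual implementation (it ignores things like cascading deletes, and it only guards against re-entrancy with a simple flag), and the names are made up for the example, but it shows the event-hooking and reverse-replay mechanics:

    using System;
    using System.Collections.Generic;
    using System.Data;

    public class RollbackTransaction
    {
        private readonly Stack<Action> undoActions = new Stack<Action>();
        private bool rollingBack;

        // Hook the change events of a table so every modification records its own undo action.
        public void Track(DataTable table)
        {
            table.ColumnChanging += (sender, e) =>
            {
                if (rollingBack) return;
                DataRow row = e.Row;
                DataColumn column = e.Column;
                object oldValue = row[column];              // current value, about to be overwritten
                undoActions.Push(() => row[column] = oldValue);
            };

            table.TableNewRow += (sender, e) =>
            {
                if (rollingBack) return;
                DataRow row = e.Row;
                // A newly added row is undone by removing it again.
                undoActions.Push(() =>
                {
                    if (row.RowState != DataRowState.Detached)
                    {
                        row.Table.Rows.Remove(row);
                    }
                });
            };

            table.RowDeleting += (sender, e) =>
            {
                if (rollingBack) return;
                DataRow row = e.Row;
                // RejectChanges on a deleted (but not yet accepted) row brings it back.
                undoActions.Push(() => row.RejectChanges());
            };
        }

        // Replay the recorded changes in reverse order.
        public void Rollback()
        {
            rollingBack = true;
            try
            {
                while (undoActions.Count > 0)
                {
                    undoActions.Pop()();
                }
            }
            finally
            {
                rollingBack = false;
            }
        }
    }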

Side Effects

The RollbackTransaction was great; it sped up the application's performance. However, there was a minor side effect. Since all the forms that use RollbackTransaction are bound to the same DataSet, any change you make in the child form immediately shows up in the parent form. This side effect actually looks really cool, because you can see how the parent form behaves as you modify the child form. Especially if the parent form has a sorted grid: as you change the value of the sorted column in the child form, the grid re-sorts itself as you type.

image

Had We Broken the User Experience (UX)?

If you take a closer look at our application design (above), you will notice there is an OK button and a Cancel button. The major conflict introduced by this side effect is that the parent form is being updated as the user makes changes, without the user ever clicking the OK button. The good news is that these windows are modal, which means the user can't edit the data from multiple windows. Most of the time the user may not even see the parent form updating, because it is covered by the child form.

With many tasks on hand, our team left this minor side effect alone and continued with other development.

Trying to Fix the Side Effect

To fix the issue, our idea was to update the DataSet but delay the parent form's update, even though it is data-bound to the same DataSet. We knew that we couldn't just unbind the parent form, because the form would then simply show no data. It would look silly, and that is worse than the side effect.

There had to be a way to suspend the data binding. The grid was the first UI control for which we wanted to suspend data binding. We use the Infragistics UltraGrid. The UltraGrid has many features, and I was hopeful that it would have a method to suspend the data binding and later resume it, causing the grid to refresh itself.

We tried a number of solutions:

  1. UltraGrid.BeginUpdate() – This method looked very promising. In fact, it was the first solution that I implemented. It actually stops the UltraGrid from painting itself, and I got what I expected, until a QA came to me with the following issue:

    Schedule grid (screenshot)

    So, what's going on? Why would the QA's machine behave differently than my machine? Well, there is a big difference: I was running Windows 7 with the Aero theme, while the QA was running Windows XP with the Classic theme.

    The Aero theme most likely works because Windows keeps a screen capture of the window (this is needed to create the Aero glass-like effect on the title bar). When UltraGrid.BeginUpdate() is called, it stops the grid from painting, but the Aero theme somehow overrides the painting. In XP, or in Windows 7 with the Classic theme, that doesn't happen, and as a result you get the issue above.
  2. UltraDataSource.SuspendBindingNotifications() – This method also looked promising. Unfortunately we don't use UltraDataSource, since the grid binds directly to the DataSet.
  3. Create a data binding proxy – Using a proxy, we would have more control over what data goes into the UI. However, such an implementation can get complicated, and it is really the same concept as the FormTransaction.
  4. Override the CurrencyManager – Totally the wrong concept. The CurrencyManager can't control when the UI updates; its purpose is to synchronize record navigation across different controls.
  5. Use a picture box to mask the underlying changes – I saw this idea on a forum. I thought it was funny, so I mentioned it to the team, and we concluded that there should be a better solution. The limitation of using a picture box is that you can't interact with the grid at all. However, in our situation we use a modal window, so it might work.

Why Don’t We Just Sell the Bug as a Feature!

Failing to find a good solution to fix this side effect, or rather this bug, the development team (myself included) tried to convince ourselves that this bug is really an awesome feature. Existing customers may find it odd, but the functionality still works, and new customers probably wouldn't care. All we needed to do was sell this bug as a feature to the QA team and the other stakeholders: we call it “Instance Preview”.

Instance Preview – Similar to the MS Word preview feature when applying a format change, the user can immediately see the data change as they make modifications in the detail form, before the changes are committed.

I bet the QA team won't even know that it is a bug! We can market this as a new product feature. I think this is just evil genius: it saves the developers time fixing the problem, and we get a new feature out of it.

Solution

Of course, who are we kidding? No customer ever asked for such a feature, and adding a feature that has no demand and no proper requirement would create more maintenance problems down the road. So I contacted Infragistics hoping they could provide a better solution:

http://forums.infragistics.com/forums/p/60424/306715.aspx#306715

And the solution is… (drum roll)… a picture box!!!

Here is a code snippet:

    // Take a snapshot of the grid as it looks right now.
    Size s = ultraGrid1.Size;
    Bitmap memoryImage = new Bitmap(s.Width, s.Height);
    ultraGrid1.DrawToBitmap(memoryImage, new Rectangle(0, 0, s.Width, s.Height));

    // Cover the grid with a picture box showing that snapshot.
    PictureBox p = new PictureBox();
    p.Name = "CoverGrid";
    p.Image = memoryImage;
    p.Location = ultraGrid1.Location;
    p.Size = s;
    this.Controls.Add(p);
    p.BringToFront();

    // Stop the grid from repainting while the child form makes its changes.
    ultraGrid1.BeginUpdate();

The code turned out to be simpler than what I was expecting. Originally I was worried about how the PictureBox would behave when masked over the original window: would the picture box follow the grid if the user moves the window around? It does, because the PictureBox.Location (a Point) and PictureBox.Size (a Size) are set relative to the parent form, just like the grid's, so as long as the form layout doesn't change, the PictureBox stays exactly on top of the grid.

Ultimately, our team calls this technique “PictureBoxing”.

This technique works not only for the grid, but also for some other controls in our application.
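The original snippet doesn't show the teardown, but presumably the cover comes off when the child form closes. Assuming the UltraGrid BeginUpdate()/EndUpdate() pair and the "CoverGrid" name from the snippet above, it would look something like this:

    // Resume painting so the grid refreshes itself with the committed (or rolled-back) data.
    ultraGrid1.EndUpdate();

    // Remove and dispose the snapshot that was covering the grid.
    Control cover = this.Controls["CoverGrid"];
    if (cover != null)
    {
        this.Controls.Remove(cover);
        cover.Dispose();
    }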

Other Bug-Like Features and Feature-Like Bugs

  1. Incorrect decimal rounding due to an unclear specification
  2. Duplicate data being generated, and you tell your customer this increases reliability
  3. Clicking and holding on a Windows 7 title bar and shaking it minimizes the other windows
  4. Shaking your iOS device pops up “There is nothing to undo”
  5. A logon screen that asks if you want to save your password, yet you have to retype your password the next time you run the application. Reason: the saved password is only used during the session to reduce multiple logons, and the application never persists the password.
  6. A calculation appears to be incorrect because it was displayed in metric units while users expected the value in imperial units.

I am sure there are probably many similar situations in which a bug and a feature are merely a matter of perspective among different people.

Aug 30, 2011

Compiling C++ project using Visual Studio 2010 with .NET 4.0 from Visual Studio 2008 (MSB8009)

I know there are many posts on the internet with this solution. I'd like to blog about it so that the next time I have to update another C++ project file, I will remember that I can reference my own blog post. Please note that I am not familiar with C++, but hopefully this information will help others.

Problem: When you try to compile a C++ project that was created in VS 2008 using VS 2010, you get the following error:

C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Microsoft.CppBuild.targets(292,5): error MSB8009: .NET Framework 2.0/3.0/3.5 target the v90 platform toolset. Please make sure that Visual Studio 2008 is installed on the machine. [C:\dev\YourCPPProject.vcxproj]

It turns out that there are two places you have to change.

  1. Update the platform toolset from v90 to v100
  2. Update the target framework from v3.5 to v4.0

Updating the platform toolset can be done in Visual Studio 2010.

  1. Right click on your C++ project (vcxproj) in Solution Explorer
  2. Select Properties
  3. Under “Configuration Properties” node, select “General”.
  4. You should see a property called “Platform Toolset”; change it to v100.

image

Here is the part that I don't understand: why doesn't VS 2010 have or implement such a feature? How do you change the target framework? Guess what, you have to do it manually.

To update from .NET 3.5 to 4.0:

  1. Manually open the .vcxproj file in Notepad (I use Notepad++)
  2. Look for the XML tag <TargetFrameworkVersion>
  3. Update it to v4.0

By the way, you could also change the PlatformToolset manually.
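For reference, the two elements live in the project file roughly like this. This is just an illustrative excerpt of a VS 2010 .vcxproj, not a complete file; your property groups will contain more elements (and per-configuration conditions):

    <!-- Globals property group: the .NET framework targeted by the project -->
    <PropertyGroup Label="Globals">
      <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
    </PropertyGroup>

    <!-- Configuration property group: the platform toolset (v90 = VS 2008, v100 = VS 2010) -->
    <PropertyGroup Label="Configuration">
      <PlatformToolset>v100</PlatformToolset>
    </PropertyGroup>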

Once Visual Studio reloads the project, your C++ project should now compile using .NET 4.0 with the proper toolset.

Jul 11, 2011

Profiling Tools Can Be Deceiving

Last week in my sprint, I was assigned to improve the loading performance of one of our company's products. The first thing on my mind was the Visual Studio 2010 profiler. The Visual Studio 2010 profiler is so much easier to use and understand compared with the previous version.

First I measured the wall time of how long the application takes to save the data. To ensure I have consistent test results, I always generate a new database with no data in it. The save process took about 3:18. Based on experience, it is important to know the wall time before doing any optimization. I ran this test a number of times to ensure I got consistent results.
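The post doesn't show the measurement harness, but conceptually it's nothing more than a Stopwatch around the operation under test, repeated a few times (ResetDatabase and SaveAllData below are hypothetical helpers standing in for the real steps):

    using System;
    using System.Diagnostics;

    for (int run = 1; run <= 3; run++)
    {
        ResetDatabase();                          // hypothetical: recreate the empty database
        Stopwatch watch = Stopwatch.StartNew();
        SaveAllData();                            // hypothetical: the save operation being measured
        watch.Stop();
        Console.WriteLine("Run {0}: {1}", run, watch.Elapsed);
    }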

However, knowing the wall time doesn't help find the bottleneck. This is where the Visual Studio profiler becomes very useful.

The two basic profiling options I usually use are “Sampling” and “Instrumentation”. When I first started using a profiler (JetBrains dotTrace), I always used “Instrumentation” because I thought it was the most accurate way to get detailed results. Over time, I started using Sampling. Here are the pros and cons:

Instrumentation
Pro: Most accurate measurement; it goes through the code line by line to see how long each function takes.
Con: Extremely slow.

Sampling
Pro: Fast; it doesn't slow down your application's runtime much.
Con: You don't get detailed results; you only get the number of times each function was hit by the sampler.

To my understanding, sampling is generally as good as instrumentation. Sampling works by periodically checking where your program is currently executing. If your program runs through a line of code a lot, or if it waits at a line for too long, the sampler will hit it often. That means it is likely that you can optimize the code there, because your program spends a lot of time in those functions.

Anyways, this is something that I got:

image

Anyways, I ran through the data and noticed the following:

image

Based on the above information, these lines are taking too much time. After inspecting the code and seeing how this function is used, I realized it is unnecessary to create a new DataRow object for data rows that aren't deleted or detached. I could simply return the same DataRow.

The sole purpose of the above code is to allow the program to access a deleted data row and convert it into a data access object. Here is the code that uses it:

Code Snippet
    /// <summary>
    /// Converts a data row to a data access object.
    /// </summary>
    /// <param name="dataRow">Data row to convert.</param>
    /// <returns>Newly allocated data access object.</returns>
    public override sd_TripPointData ToDataAccess(DataModel.sch_TripPointRow dataRow)
    {
        dataRow = CreateCorrespondingDataRowVersion(dataRow);

        return new sd_TripPointData(
            dataRow.TripPointKey,
            dataRow.IsTripKeyNull() ? null : dataRow.TripKey,
            dataRow.TripPointID,
            dataRow.ScheduleKey,
            dataRow.IsPatternPointKeyNull() ? null : dataRow.PatternPointKey,
            dataRow.ArriveTime,
            dataRow.DepartTime,
            dataRow.IsHoldTimeNull() ? int.MaxValue : dataRow.HoldTime,
            dataRow.IsMobilityKeyNull() ? null : dataRow.MobilityKey);
    }

 

So I converted the code to just return the DataRow if the DataRow.RowState is not marked as deleted or detached.

Here is my modified version, which turned out to be slower than the original:

Code Snippet
    public static DataRow CreateCorrespondingDataRowVersion(this DataRow dataRow)
    {
        DataRow newTDataRowType = dataRow;

        // Use the original version if the row is deleted or detached and the original is available.
        if ((dataRow.RowState == DataRowState.Deleted || dataRow.RowState == DataRowState.Detached)
            && dataRow.HasVersion(DataRowVersion.Original))
        {
            // Generate a new data row with the corresponding version.
            // Remark: you cannot access dataRow.ItemArray.Length for a deleted row.
            newTDataRowType = dataRow.Table.NewRow();
            for (int i = 0; i < dataRow.Table.Columns.Count; i++)
            {
                newTDataRowType[i] = dataRow[i, DataRowVersion.Original];
            }
        }

        return newTDataRowType;
    }

Using the profiler, I no longer saw the function CreateCorrespondingDataRowVersion() as one of the top function calls.

However, just to be sure, I checked the wall time again after my code change. It was unbelievable. I ran it 3 times, and it took 3:28, 10 seconds slower than before. In disbelief, I reverted my change and tested it again. The wall time showed 3:18, which is the original time.

So where did the increased time come from?

My guess is that a newly created data row is detached from any data table, and accessing a detached data row is faster than accessing a row that is attached to a data table.

In conclusion, when doing performance testing, always measure the wall time before and after the optimization.

May 14, 2011

Never Upgrade a Database Schema When the Disk Drive Is Fragmented

My company's Data Warehouse development has finally been completed for this release. The architecture team is now at the client site deploying the Data Warehouse solution. We had estimated it would only take 4 hours to upgrade the live database, and we would be performing the first ETL offline. Unfortunately, we failed the upgrade last time, not because of an incorrect script or any technical error, but because of the following: instead of taking 4 hours to do the database upgrade, we had to bail out after 6 hours of waiting for the upgrade script to complete. Next time, check the hard drive fragmentation before doing any database upgrade!

Mar 17, 2011

I Need More Memory for Running ETL Testing

I have been working on a data warehouse project for almost five months now. We didn't use any existing ETL tool on the market; instead we developed our own.

After I studied the source code of Rhino ETL, I believed it was possible to write our own ETL process. I didn't use Rhino ETL; instead, my team developed our own framework to suit our needs. The key thing I learned from Rhino ETL is that I can yield results from a database connection. This allows me to reduce memory usage: instead of loading a list of data, I yield only the data that needs to be processed. Of course, I still need to load reference data during the transformation process.
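The streaming idea looks roughly like the sketch below. This is not the Rhino ETL API or our framework's code, just a minimal illustration; the connection string and SQL are placeholders.

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class SourceReader
    {
        // Streams rows to the caller one at a time instead of materializing a list.
        public static IEnumerable<IDataRecord> ReadRows(string connectionString, string sql)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(sql, connection))
            {
                connection.Open();
                using (IDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // The record is only valid until the next Read, so callers should
                        // copy what they need as they transform each row.
                        yield return reader;
                    }
                }
            }
        }
    }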

Anyways, our application has over 150 tables, and the data warehouse has about 100 tables so far. There will probably be more data warehouse tables added in the future. We are at the final phase of the first data warehouse project release. My team has been doing performance and manual testing while I am developing an integration test framework. We have unit tests that test the data transformation and other parts of the ETL process, but we don't have integration tests, which I believe are necessary if we want a robust product.

Long story short, I finally finished coding the basic integration testing framework. I am using NDbUnit as part of my framework.

Here are the basic attributes of the ETL process (a rough sketch of points 1 and 2 follows this list):
1. It uses Parallel LINQ (PLINQ) to process the transformation
2. It chunks the process by date range to reduce memory usage
3. It cleans up old data from the live system once the ETL is completed
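Here is an illustrative sketch of the chunk-by-date-range plus PLINQ idea; the delegates are made up for the example and are not our framework's API.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class EtlSketch
    {
        public static void Run(
            DateTime from, DateTime to, TimeSpan chunkSize,
            Func<DateTime, DateTime, IEnumerable<object>> extract,   // hypothetical extract step
            Func<object, object> transform,                          // hypothetical transform step
            Action<IEnumerable<object>> load)                        // hypothetical load step
        {
            for (DateTime start = from; start < to; start += chunkSize)
            {
                DateTime end = (start + chunkSize) < to ? start + chunkSize : to;

                // Only one date-range chunk is in flight at a time, which keeps memory bounded.
                var transformed = extract(start, end)
                    .AsParallel()          // PLINQ spreads the transformation across cores
                    .Select(transform);

                load(transformed);
            }
        }
    }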

Here are the basic attributes of the integration test:
1. It loads the entire live database into a DataSet before the ETL process
2. It loads the entire live database into a DataSet after the ETL process
3. It loads the entire data warehouse database into a DataSet after the ETL process

The data can be either an XML file, or it can be an existing database.

Other than the fact that the Visual Studio DataSet designer almost died because of the number of tables I have, the testing was fine until I used a customer database for testing, and this is what happened:

Reason Why I need a better machine

Good thing I was running 64-bit Windows 7, which allows my application to allocate more memory than the other developers' 32-bit OS. However, the test ended up taking 1 hour to run, and it failed to perform the cleanup due to a “SQL connection was disposed” error.

The problem was that I was too optimistic about loading all the data into memory for comparison. To resolve this issue, I now have to explicitly load and unload data when doing the comparison.

In conclusion, memory and performance issues don't just exist in your application, but also in your tests.

Feb 13, 2011

How to evolve to Sprint Planning in Scrum (Agile Development)

Approximately 2 years ago, my development team began to switch from the traditional waterfall model to agile development. In our case, we chose Scrum as our development model. Switching to agile isn't an overnight process. It took a good year to get everyone comfortable with iterative development.

Since there are already many resources about agile development, I am not going to write in detail about what agile development is. Instead, I would like to write about how our development planning process evolved from the waterfall model to the agile sprint planning model.

The following software planning processes are listed in chronological order.

1. Team Lead Prepares the Specifications
In this model, developers (including myself) were simply given a feature specification document; in return, developers provided rough estimates of how long it might take to complete the projects. Of course, the estimated time had to be somewhat reasonable. The planning meeting usually involved just the team lead and the assigned developer(s). In the planning, the general strategy for how to solve the problem was discussed. The project time estimate could range from 2 weeks to a month. The actual time to accomplish a task could vary from a couple of days off the estimate to more than a month off.

2. Large Tasks Are Broken Down
Instead of getting a high-level specification overview, developers were given smaller tasks that needed to be accomplished. Often developers might not see the “big picture” behind the assignments. The team lead provided his own estimate of how long it should take the developer to accomplish the tasks.

3. Effort Points
The development team learned about Effort Points, or Story Points. This was the most challenging part of going into agile development: how do you estimate effort points? The thing about effort points versus time estimates is that, given a task and its description, developers should agree on the same effort points regardless of how fast they code. An experienced developer may need one day of work to accomplish a task worth one effort point; a newbie may need two days of work to accomplish the same task.

Notice my team chose to talk about tasks instead of user stories. The team lead estimated the story points by himself.

4. A Standard Task for 1 Effort Point
It was hard to estimate effort points per task when there was no standard, so we needed to come up with a standard unit. In our product, a typical feature involves creating a new database table definition, adding new server-side code, adding new client code, updating the data model, and creating a new UI. This set of sub-tasks was used to represent 1 effort point.

5. Small Team instead of Large Team
The development team was broken down into 3 teams. The original team had 12 developers; it was split so that each team has about 3 to 4 developers focusing on different aspects of the enterprise product suite.

The original team lead now becomes the master team lead over the 3 sub-team leads.

6. Effort Point and Sprint Planning
In our sprint planning meeting, all developers in the team have to come up with an effort point estimate for a given task, and everyone has to agree on the same number of effort points. To do this, we use planning poker. Whenever there is a discrepancy in the estimated effort points among the developers, the developers who gave the highest and the lowest effort points need to explain to the team why they chose their card. Doing this has three advantages:

  • It allows the developer to express their opinion, or at least force them to speak up.
  • It allows exchange of knowledge within the team
  • It allows everyone to see other developers’ expectations

After the two developers explain their concerns, everyone gets to re-evaluate the effort and come up with a new estimate. Hopefully everyone then converges on the same effort points.

It is actually fun to play planning poker.

Planning Poker

Remember, there is no shame in disagreeing with other developers' estimates. If you truly believe your estimate is correct, you can keep putting down a different number than the others do. However, the majority usually rules in the end, because you are only given a limited amount of time to do the estimate.

7. Two-Week Sprints + Demo
Instead of planning irregularly, the teams moved to two-week sprints. The first day is the sprint planning, and the last day is the sprint demo. That's right, our developers need to demo to an audience what they did during the sprint. The good news for us is that we only demo to internal people, not to customers from outside the company; if the demo crashes, it is not the end of the world. The demo has 3 advantages:

  • It ensures developers show us what they promised to deliver
  • Developers need to test their code well enough so that it works in the demo
  • It allows developers to showcase what they have accomplished.

8. Making The Sprint More Efficient
Instead of having all three teams run on the same sprint cycle, the schedules were staggered so that each team does its planning and its demo on a different day of the week. This way the attendees do not have to watch all three demos on the same day.

9. Need Better Tools
Instead of having everything on paper, we started using TFS and its planning tool. The tool keeps track of user stories, tasks, the estimated time, and the remaining time. This is how the leads keep track of whether the developers are on track to complete their tasks.
At this point developers no longer estimate effort points; instead, we give our estimated time.

10. More Tools
TFS sucks in many usability aspects, so we got Urban Turtle to help us.

11. We Need Feedback
With the introduction of the burn-down chart, we now have constant feedback on how close we are to completing our tasks on time.

burndown

12. Dealing with Unknown Factors
I was involved in a project that had many unknowns. In the first 8 to 10 sprints the team under-estimated the tasks. Many issues were discovered during development that were not foreseen in the sprint planning. These new issues either needed to be fixed in order to unblock the developers, or they could be pushed to the next sprint so that they wouldn't disrupt the current one.
After the first 8 to 10 sprints, the team learned to be pessimistic in their time estimates. It is actually better to over-estimate, so that you are able to complete your committed tasks.

Conclusion

My company's development methodology continues to evolve. There are many advantages to using agile development, but there are also pitfalls we need to be careful of. Here are the positives and the pitfalls that I saw in our development model.

Positives:

  • Everyone is committed to accomplishing the assigned tasks by the deadline.
  • Any issues discovered during development can properly be addressed in the sprint planning meeting, instead of leaving the developers to tackle the problem without consulting others.
  • It is flexible, because every two weeks we have a chance to re-evaluate our task priorities.
  • If anyone needs to take a sick day or vacation, or leaves the company, it has less impact on the overall project because the user stories are broken down into tasks. A task has to be something that can be accomplished within 2 weeks; if a task takes more than two weeks, it needs to be broken down further.

Pitfalls:

  • It is tempting to under-estimate because of peer pressure or a subconscious competitiveness against other developers.
  • It is easy to schedule the customer's requested user stories and forget that developers may want to do their own projects, such as writing more unit tests, developing new tools, optimizing or improving existing features, or preparing training sessions, all of which help improve the development process.

To avoid under-estimating the project time, the burn-down chart and the retrospective help the team recalibrate their thinking about the estimates, so that they can estimate better in the next sprint.

The team can add personal projects as tasks so that developers get a chance to do their own stuff.