Sunday, April 22, 2007

Fail-fast vs. complete validation

Almost all applications work with data that come from the interaction with humans or other applications. These data, however, may not necessarily meet the requirements of the accepting applications. Data must be validated.

What is validation?

Data entered must pass a set of validation rules in order to be recognized as valid and allowed for further processing.

As an example, let's take a class that has three members: name:String, created:Date and total:int. Our application requires that name is set (not null) and has at least three characters; created is also required and must represent a time before now; and total must be a non-negative integer.

There are two common approaches to data validation: fail-fast validation and complete validation.

Fail-fast validation

How it works

If any of the validation rules fails, validation is stopped and data is pronounced invalid and rejected for further processing.

Output

Boolean result that indicates the validity of input data: true for valid, false for invalid.

Pros

It is generally faster than complete validation, as the first failure stops the execution of the remaining validation rules. It also does not have the overhead of failure cause reporting.

Cons

Does not provide enough information about the cause of failure.

When to use it

When a simple true or false result is enough and detailed information about the cause of failure is not required. It may be suitable for cases where the source of the data cannot correct it (usually a system without human input).

Example

boolean isValid(String name, Date created, int total)

Data is passed in and a boolean result is returned.
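A minimal sketch of how such a method might be implemented for the example class. Only the isValid signature comes from the text above; the class name FailFastValidator is an assumption made for illustration.

```java
import java.util.Date;

// Hypothetical class name; only the isValid signature is taken from the post.
final class FailFastValidator {

    // Returns false as soon as one rule fails; the remaining rules are not evaluated.
    static boolean isValid(String name, Date created, int total) {
        if (name == null || name.length() < 3) {
            return false; // name is required and needs at least three characters
        }
        if (created == null || !created.before(new Date())) {
            return false; // created is required and must lie before now
        }
        return total >= 0; // total must be non-negative
    }
}
```

The caller learns only whether the data is valid, not which rule rejected it.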

Complete validation

How it works

Failure of a validation rule does not stop the validation process. Data is marked as invalid and rejected for further processing only after the whole validation process has completed.

Output

Boolean result that indicates the validity of input data: true for valid, false for invalid. Some form of error collection that contains the information about the causes of validation failure.

For examples, see the ActionMessages class from the Struts framework, the Errors class from the Spring framework, or the ErrorCollection class in Atlassian JIRA.

Pros

Provides information about the cause or causes of validation failure. This information can provide the necessary feedback for correcting the input data.

Cons

Slower than fail-fast validation, as extra information about the causes of validation failure is reported and the full set of validation rules is executed regardless of intermediate results.

When to use it

When a complete set of failure causes is required. The causes of failure may provide hints to the user entering the data about how to correct the data.

Example

void validate(String name, Date created, int total, ErrorCollection errorCollection)

Data is passed in along with the error collection. The method does not have to return anything (void), as invalid data is indicated by the presence of errors in the error collection.
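A rough sketch of this approach for the example class. The ErrorCollection below is a tiny stand-in written for this illustration; it is not the JIRA, Struts or Spring class, and its methods (addError, hasErrors, getErrors) as well as the class name CompleteValidator are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Minimal stand-in for illustration only; NOT the JIRA, Struts or Spring class.
final class ErrorCollection {
    private final List<String> errors = new ArrayList<>();

    void addError(String field, String message) {
        errors.add(field + ": " + message);
    }

    boolean hasErrors() {
        return !errors.isEmpty();
    }

    List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}

// Hypothetical class name; only the validate signature is taken from the post.
final class CompleteValidator {

    // Evaluates every field and records each failure instead of stopping at the first one.
    static void validate(String name, Date created, int total, ErrorCollection errorCollection) {
        if (name == null) {
            errorCollection.addError("name", "is required");
        } else if (name.length() < 3) {
            errorCollection.addError("name", "must have at least three characters");
        }
        if (created == null) {
            errorCollection.addError("created", "is required");
        } else if (!created.before(new Date())) {
            errorCollection.addError("created", "must be before now");
        }
        if (total < 0) {
            errorCollection.addError("total", "must not be negative");
        }
    }
}
```

After the call, the caller checks hasErrors() and, if needed, shows the collected messages to the user.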

Conclusion

If the complex validation rule set can be broken down into separate validations per input field, these can be used to enhance the user experience (via JavaScript or AJAX): they can provide real-time feedback on the data being entered.
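One possible way to break the example rules down per field is sketched here, assuming Java 8 or later. The class and method names are hypothetical. Each per-field method could back a single-field check (for instance behind an AJAX request), while validateAll reuses the same rules for the full validation.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; each method validates one field and returns an error message, if any.
final class FieldValidators {

    static Optional<String> validateName(String name) {
        if (name == null) {
            return Optional.of("name is required");
        }
        if (name.length() < 3) {
            return Optional.of("name must have at least three characters");
        }
        return Optional.empty();
    }

    static Optional<String> validateCreated(Date created) {
        if (created == null) {
            return Optional.of("created is required");
        }
        if (!created.before(new Date())) {
            return Optional.of("created must be before now");
        }
        return Optional.empty();
    }

    static Optional<String> validateTotal(int total) {
        return total < 0 ? Optional.of("total must not be negative") : Optional.empty();
    }

    // Full validation reuses the per-field rules and collects every message.
    static List<String> validateAll(String name, Date created, int total) {
        List<String> errors = new ArrayList<>();
        validateName(name).ifPresent(errors::add);
        validateCreated(created).ifPresent(errors::add);
        validateTotal(total).ifPresent(errors::add);
        return errors;
    }
}
```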

Also consider that some rules need several input values in order to make a decision about the validity of the input data. Such a case can be, for example, a single date value composed of the values from three input fields (don't do this; it's not a good way of entering dates), or a conditionally required field, such as a text area that is only required to be filled in if "other" is selected.
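The two multi-field cases could be checked along these lines. This is only a sketch: the class, method and parameter names are made up for illustration, and the date check relies on java.time.LocalDate (Java 8 or later) rejecting impossible dates.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

// Hypothetical helper showing rules that depend on more than one input value.
final class MultiFieldRules {

    // A date entered through three separate fields is valid only as a combination.
    static boolean isValidDate(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day); // throws DateTimeException for e.g. 30 February
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    // The text area is required only when the selected choice is "other".
    static boolean isOtherTextValid(String choice, String otherText) {
        if (!"other".equals(choice)) {
            return true; // the text area is optional for every other choice
        }
        return otherText != null && !otherText.trim().isEmpty();
    }
}
```

Neither rule can be evaluated from a single field, so per-field feedback has to wait until all the values involved are available.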

Both types of validation serve their purpose. Which one you decide to use depends mostly on how much information about the data being validated you really need in order to make a decision or to correct it in case of failure.

6 comments:

keesun said...

thanks for easy description. I found one misspel.

Both types of validations server their purpose. => Both types of validations serve their purpose.

Can I translate this post in KOREAN and post to my blog?

keesun said...

I want to add an example of complete validation. Spring Framework's Errors class. :)

Dushan Hanuska said...

Keesun, thanks for your valuable input!

I have already corrected the typo and added the Spring's Errors class to the references.

I'm happy for you to translate my post into Korean and post it on your website as long as you make a reference to my original post and add a link to this post (see "Links to this post" underneath).

shevken said...

The problem is sometimes in real life, error messages,displays are determined by business requirements usually managed by biz analysts.

Nice clear description.

mario.gleichmann said...

first of all - thanks for your article about validation and your concise style of writing. it was a pleasure to read.
let me point to two statements that might need some more discussion:

1. the concept of 'fail fast':
this concept may be misunderstood, since 'fail fast' is already used in a somewhat different context: it means that your application won't silently go on when an exceptional state occurs, with the risk of gradually corrupting your whole system. the system shouldn't 'mask' an exception and go on with illegal or incomplete data, but fail immediately. otherwise chances are high that the system will crash at a later date (only loosely related or even completely unrelated to the origin of the failure), making it hard to find the cause of the corrupted, crashing state. instead your system should fail immediately so that it's fairly easy to pinpoint the cause of the failure. with respect to your validation example, both of your scenarios are candidates for 'fail fast', since both won't go on the normal execution way but immediately inform about the invalid state.

2. you say that 'complete validation' is slower. well, that's a question about the frame of reference: if you see the underlying action 'user input' as the frame of reference, things will swap into the opposite. if you inform immediately (fast) about the first occurred invalid input data, then the user has only the chance to correct her input step by step. within each correction cycle, the user is only aware of one invalid input that is reported. should the user have 3 invalid dates, she has to go through 3 correction cycles, whereas with 'complete validation' all three invalid inputs are reported at once and can be corrected within one correction cycle. now guess what will take more time ... ;o)

greetings

mario

http://gleichmann.blog.com

Dushan Hanuska said...

Thanks Mario!

Re: point 2. If you look at my conclusion, you can see that I understand how annoying step-by-step correction of user input can become. Users need to enter valid data but do not want to wait for validation.

My solution to this problem is to make the user experience the best and at the same time ensure that the data entered is valid.

The way I would go about data validation would be to have each input field trigger an AJAX validation request (unless you can validate data on the client side via JavaScript, which is preferable as it saves round trips to the server). Only upon form submission would I run the full-scope validation. At this stage the data coming in should already be valid, but we need to ensure that somebody did not somehow bypass the pre-validation.

Cheers!


Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.