What is Problem Management?
The goal of Problem Management is to minimise both the number and severity of incidents and problems in your school. It should aim to reduce the adverse impact of incidents and problems that are caused by errors within the ICT infrastructure, and to prevent recurrence of incidents related to these errors.
  • Problems should be addressed in priority order with attention paid to the resolution of problems that can cause serious disruption.
  • The degree of management and planning required is greater than that needed for incident control, where the objective is restoration of normal service as quickly as possible.
  • Problem Management's responsibility is to ensure that incident information is documented in such a way that it is readily available to all technical support staff.

Problem Management has reactive and proactive aspects
  • reactive - problem solving when one or more incidents occur
  • proactive - identifying and solving problems and known errors before incidents occur in the first place.

Problem Management includes
  • problem control, which includes advice on the best workaround available for that problem
  • error control.


Difference between incident and problem management
  • The aim of Incident Management is to restore the service to the user as quickly as possible, often through a workaround, rather than through trying to find a permanent solution.
  • Problem Management differs from Incident Management in that its main goal is to detect the underlying causes of an incident, find the best resolution and prevent recurrence.
  • In many situations the goals of Problem Management can be in direct conflict with the goals of Incident Management.
  • Deciding which approach to take requires careful consideration. A sensible approach is to restore the service as quickly as possible while ensuring that all details are recorded. This enables Problem Management to continue once a workaround has been implemented.
  • Discipline is required, because once a workaround is in place it is tempting to treat the incident as fixed. The incident may well recur if a resolution to the underlying problem is not found.

Incident vs problem
An incident is where an error occurs: something doesn't work the way it is expected to. This is often referred to as
  • a fault
  • an error
  • it doesn't work!
  • a problem
but the term used with FITS is incident.

A problem can be
  • the occurrence of the same incident many times
  • an incident that impacts many users
  • the result of network diagnostics revealing systems not operating in the expected way.

Therefore a problem can exist without having immediate impact on the users.
Incidents are usually more visible and the impact on the user is more immediate.
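
The distinction above can be illustrated with a minimal Python sketch that scans an incident log and flags symptoms that look like problems, either because the same incident recurs or because it affects many users. The `Incident` fields, the `candidate_problems` function and both thresholds are assumptions made for illustration; they are not part of FITS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One logged incident. Field names are illustrative, not a FITS schema."""
    symptom: str  # short description of what went wrong
    user: str     # who reported it


def candidate_problems(incidents, min_repeats=3, min_users=5):
    """Return symptoms that recur often or affect many distinct users.

    Both thresholds are arbitrary assumptions; adjust them to suit your school.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for incident in incidents:
        counts[incident.symptom] += 1
        users[incident.symptom].add(incident.user)

    flagged = {}
    for symptom, count in counts.items():
        affected = len(users[symptom])
        if count >= min_repeats or affected >= min_users:
            flagged[symptom] = {"incidents": count, "users": affected}
    return flagged


# Made-up log entries, loosely based on the examples in this document.
log = [
    Incident("printer driver missing", "amy"),
    Incident("printer driver missing", "ben"),
    Incident("printer driver missing", "amy"),
    Incident("monitor flicker", "cal"),
]
problems = candidate_problems(log)
print(problems)
```

With this sample log, only the repeated printer driver incident is flagged. The single monitor incident stays an incident until it recurs or spreads to more users.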

Error control
Error control covers the processes involved in the successful correction of known errors. The objective is to remove equipment with known errors that affect the ICT infrastructure, so that the incidents they cause do not recur.

Error control activities can be reactive and proactive.
Reactive activities include
  • identification of known errors through Incident Management
  • implementing a workaround.

Proactive activities include
  • finding a solution to a recurring problem
  • creating a solution
  • including the solution in the known errors database.
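
As a rough illustration of the activities above, a known errors database can be as simple as a lookup from a symptom to its workaround and, once one exists, its solution. The sketch below is one possible shape under that assumption; the class names, method names and fields are hypothetical and are not prescribed by FITS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    """A known error with its workaround and, later, its solution."""
    symptom: str
    workaround: str
    solution: Optional[str] = None  # filled in once a permanent fix exists


class KnownErrorDatabase:
    """In-memory store keyed by symptom (case-insensitive)."""

    def __init__(self):
        self._records = {}

    def record(self, error):
        """Reactive step: log a known error identified through an incident."""
        self._records[error.symptom.lower()] = error

    def lookup(self, symptom):
        """Return the matching known error, or None if it is not recorded."""
        return self._records.get(symptom.lower())

    def add_solution(self, symptom, solution):
        """Proactive step: attach a permanent solution to a known error."""
        error = self.lookup(symptom)
        if error is None:
            raise KeyError(symptom)
        error.solution = solution


kedb = KnownErrorDatabase()
kedb.record(KnownError(
    symptom="Printer will not form-feed",
    workaround="Advance the paper with the form-feed button",
))

# The solution text below is a made-up placeholder, not a real diagnosis.
kedb.add_solution("printer will not form-feed", "placeholder: permanent fix goes here")
entry = kedb.lookup("Printer will not form-feed")
print(entry.workaround)
```

Technical support staff can then call `lookup` when an incident is reported and apply the recorded workaround straight away, while the solution field shows whether a permanent fix is already available.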

Examples of problems
Technical problems can exist without any impact on the user. However, if they are not spotted and dealt with before an incident occurs, they can have a big impact on the availability of the computer service.

User-experienced problems
  • The printer won't form-feed paper. The user has to advance the paper by using the form-feed button.
  • Each time a new user logs onto a computer, they have to reinstall the printer driver.
  • Windows applications crash intermittently without an error message. The computer will restart and work properly afterwards.

Technical problems
  • Disk space usage is erratic. Sometimes lots of disk space is available, but at other times not much is available. There is no obvious reason and no impact on the users - yet!
  • A network card is creating lots of unnecessary traffic on the network, which could eventually reduce the bandwidth available, leading to a slow response from network requests.
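
Technical problems like these are typically found proactively, by checking measurements rather than waiting for a user to report an incident. As a minimal sketch, the Python below flags erratic disk space by comparing the spread of a series of free-space readings with their average. The function names, the 25% threshold and the sample readings are all assumptions for illustration; `shutil.disk_usage` is a standard-library call that can supply real readings.

```python
import shutil
from statistics import mean, pstdev


def is_erratic(free_space_readings, max_variation=0.25):
    """Return True when readings swing by more than max_variation of their mean.

    max_variation is an arbitrary assumed threshold (standard deviation
    divided by mean); choose a value that suits your own systems.
    """
    if len(free_space_readings) < 2:
        return False
    average = mean(free_space_readings)
    if average == 0:
        return False
    return pstdev(free_space_readings) / average > max_variation


def current_free_gib(path="/"):
    """Take one reading of free space, in GiB, for the disk holding path."""
    return shutil.disk_usage(path).free / 2**30


# Made-up readings in GiB, e.g. one per day.
steady = [40.0, 39.5, 39.8, 39.2]
swinging = [40.0, 5.0, 38.0, 3.0]

print(is_erratic(steady))
print(is_erratic(swinging))
```

In practice the readings would be collected on a schedule with `current_free_gib` and stored, so that an erratic pattern can be raised as a problem before users are affected.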