Digital assets have a life cycle just like plants and animals. Your organization may acquire, create or even re-purpose digital assets which are:
a major milestone in the organization’s history
historical (good or bad)
required by law or other regulations
repeatedly used or requested
What are the signs that a digital asset is coming to the end of its life cycle?
Is the asset:
Dated (is everyone wearing bell bottoms and plaid, and sporting long sideburns? This could be a sign, unless you are covering 1970s revivals)
Too generic? Or is it too specific to be of interest to your audience?
Antiquated (do you hear dinosaurs roaming or modems dialing for a connection? Or is an abacus the most advanced technology available?)
Not downloaded/ordered/requested/used in the past 5 years. (A DAM solution should be able to report the number of times an asset has been downloaded/ordered/requested/used. This is a major indicator.)
If the asset has not been downloaded/ordered/requested/used in over 5 years, it may be time to migrate it to an archive outside of the DAM. Or maybe just keep a proxy of the asset in the DAM, along with a location/contact for the archive that holds a high-quality copy of the digital asset. Just keep in mind not to delete all copies of those legacy digital assets (you know, the ones supposedly at the end of their life cycle), unless required by law or regulation.
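As a minimal sketch of how such a review might run (the asset IDs and dates below are hypothetical; a real DAM would supply this usage data from its own reports):

```python
from datetime import datetime, timedelta

# Hypothetical usage records as a DAM report might export them:
# asset ID mapped to the date it was last downloaded/ordered/requested/used.
last_used = {
    "IMG-0001": datetime(2011, 6, 1),
    "IMG-0002": datetime(2005, 3, 15),
    "VID-0101": datetime(2002, 9, 30),
}

def archive_candidates(usage, as_of, years=5):
    """Return asset IDs untouched for `years` years as of `as_of`."""
    cutoff = as_of - timedelta(days=365 * years)
    return sorted(aid for aid, used in usage.items() if used < cutoff)

print(archive_candidates(last_used, as_of=datetime(2012, 1, 1)))
```

The flagged assets would then be reviewed by a human (and checked against any legal holds) before anything moves to the archive.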
How many pieces of documentation do you keep for your assets? (This would explain the history and ownership of your digital assets, sometimes known as provenance)
How will this content be changed over the life cycle of the asset?
How many people do you have accessing this information? (History of the asset as seen from the DAM reports)
What is the total value of your asset-related projects per year? (Do you keep in mind the value of these projects and assets?)
How many people can you afford to have managing this documentation? (This is documented in writing by someone, right?)
What happens when someone uses the wrong information? (Do you like errors and inconsistency?)
Is any of this a matter of proper version control? (Do you really want to know how many systems fail to have this today? Too many.)
If the digital asset is needed ‘forever,’ consider what file format it is kept in, since this format may not be supported ‘forever.’ These assets may need to be converted into different (more current) file formats. Refer to Another DAM podcast interview with Linda Tadic who speaks about this as well as the time frame to revisit these things.
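A periodic format review could be sketched along these lines (the format mapping below is purely illustrative; your own preservation policy would define the real list of at-risk formats and conversion targets):

```python
# Hypothetical mapping of at-risk formats to suggested current targets.
# A real preservation policy, revisited on a schedule, would define these.
MIGRATION_MAP = {
    ".wpd": ".pdf",     # WordPerfect documents
    ".tga": ".tif",     # Targa images
    ".ra":  ".wav",     # RealAudio
}

def flag_for_migration(filenames):
    """Pair each file in an at-risk format with a suggested target name."""
    flagged = []
    for name in filenames:
        for ext, target in MIGRATION_MAP.items():
            if name.lower().endswith(ext):
                flagged.append((name, name[: -len(ext)] + target))
    return flagged

print(flag_for_migration(["report.wpd", "logo.tif", "jingle.ra"]))
```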
Consider the same with physical media storage, such as the evolution of audio from wax cylinders to LPs (33⅓ and 45 RPM) to 8-tracks to audio tapes to CDs… and now to hard drives and portable MP3 players.
How many digital assets do you actively use today?
If you wanted to archive this music, what format would you store it in for high-quality, everyday listening? Did we all keep every format and player? Not likely. While some purists may continue to play these antiques purely for nostalgic reasons, most of us do not keep these physically degrading/fading media formats. Instead, we keep the highest quality digital copy of the important audio we need to keep. The ones of value. The digital asset.
In less than 200 years, photography went from wet plates to large format to medium format to 35 mm celluloid film then to eventually digital capture. Digital camera sales have surpassed film camera sales for several years now. Some companies are re-purposing their photography infrastructure for something the market actually wants to buy today. Most photographers want to see their images now. Sometimes, even their subjects want to see their image now as well.
When it came to film photography, the photographer often used to hand off the (analog) film for processing and printing to a lab. With digital photography, the photographer often is the lab as well.
Like most digital assets…
We want them now. We may even need them now. For the projects at hand. For the time being.
With the advent of more technology to help add metadata to digital assets, it would be good to review a few tagging options available aside from what may already be done within an organization. Some DAM systems do not make it easy to apply a controlled vocabulary, taxonomy or any list for users to pick from when applying tags. Keep in mind tags are just one form (and often one field) of metadata out of many possible options.
What is tagging?
We are not talking about vandalizing walls or subway trains with ‘artwork’. The act of applying tags (keywords or key phrases) is tagging. We are, however, making a mark in an organization or community by making its digital assets more searchable, more findable (within finite results) and possibly better monetized. If you can search for digital assets, you should find the relevant digital assets you need, and those digital assets can then be easily distributed. If a client can not find the digital asset they need, they can not buy/license/use it. A number of photo agencies have found this out the hard way after some time, but this effort extends to all media including audio, video, text, graphics and photos. A few large sporting organizations have massive archives of their sports history waiting to be tagged. How could this be done for them as well as your organization?
What is auto-tagging?
Auto-tagging is tagging (adding metadata) in an automated fashion via computer with complex algorithms. Often, these algorithms work by analyzing the content (often visual images) to match shapes and patterns such as faces.
More tools now have facial recognition based on the position of an eye, nose and mouth. Advanced facial recognition also looks at the forehead, cheeks, chin and sometimes ears. If you apply the name of someone to some images of this person, the software will do the rest with reasonable accuracy. Just make sure you do not smile. Kind of like with your passport and driver’s license. I dare you to smile at airport customs or the motor vehicle administrator’s office while waiting in line and see how long that smile stays on your face. We know that smile will not stay long on anyone’s face unless you want to be profiled.
Beyond faces, common shapes and patterns yield mixed results dependent on the image content, quality of the image, resolution and focus.
At a meetup, I spoke with someone who works for a company which offered auto-tagging. A few large social networks may be using these services as well.
I have reviewed some DAM and MAM systems with similar auto-tagging tools, but I was not amazed with the results (yet). When auto-tagging was used on images, results came back as trees for a photograph of grass (both green and vertical, but not close enough). When a photo of a strawberry was auto-tagged, the results returned cherry (both red, round-ish fruit, but the texture is visibly different between the two).
One service I did see was auto-tagging for video, which did quite well. I was asked to review this tool. As a test to have them prove themselves, I sent them an early silent film posted online and some music videos to see what the tool could do. The quality issues of the silent film as well as the abstract nature of the music videos would be a challenge. The test yielded very good results based on the tool analyzing what patterns and shapes could be found frame by frame. If a pattern appeared within a number of video frames within a given period of time, the tool produced tags for this pattern.
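The persistence rule described above could be sketched roughly like this (the pattern names, thresholds and per-frame detections are hypothetical; a real tool would derive the detections from image analysis of each frame):

```python
from collections import Counter

def persistent_tags(frame_detections, min_frames=3, window=5):
    """Return tags for patterns appearing in at least `min_frames`
    frames within any `window`-frame span of the clip."""
    tags = set()
    for start in range(len(frame_detections) - window + 1):
        counts = Counter()
        for frame in frame_detections[start:start + window]:
            counts.update(set(frame))  # count each pattern once per frame
        tags.update(p for p, n in counts.items() if n >= min_frames)
    return tags

# Hypothetical detections for six frames of a clip:
frames = [{"face"}, {"face", "car"}, {"face"}, {"car"}, {"car"}, {"tree"}]
print(sorted(persistent_tags(frames)))
```

A fleeting detection (like "tree" in one frame) never becomes a tag, which is one way such a tool could keep noise out of the results.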
Crowdsourcing work done for you
At a recent lecture, I listened to a few experts explaining some new services where mechanical turks (people doing repetitive micro-tasks remotely) do some of the tagging. There are now some new players in the field of crowdsourcing metadata. A few of these services are very big, while most are still small. Many have big potential.
Many of these services are now cloud-based, while a few are in-house installs which could be integrated with other systems. Most of these groups are using global resources. Some of these services are gamified, much like the early university projects that helped a community tag its digital assets and earn a high score as a personal bonus.
There are some news reports about these services, as well as word of mouth within communities of archivists, digital asset management groups, humanities groups, information management groups, librarians and metadata management groups.
Micro-tasks for micropayments with an error-checking process
These crowdsourced micro-tasks (tag a few images) often pay individuals distributed around the globe just a few pennies per task (tag an image with N keywords or key phrases based on a controlled vocabulary). The question arises: why would anyone care to apply relevant tags accurately for just a few pennies per image? You have multiple people doing the same task completely independent of each other. With a nice automated process, if the same tags appear in three to five people’s results per task, those tags are likely relevant and accurate. Each individual should not see the results of the next person. For example, if five different people in five different geographic areas, using five different IP addresses, at five different times during the day enter matching tags, those tags are likely accurate and verified. The client should still review the results when they are delivered to see how relevant they are for the organization’s purposes.
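That agreement check could be sketched as follows (the workers’ tag sets are made up for illustration, and the agreement threshold would be tuned per service):

```python
from collections import Counter

def verified_tags(worker_results, min_agreement=3):
    """Accept a tag only when at least `min_agreement` independent
    workers applied it to the same asset."""
    counts = Counter()
    for tags in worker_results:               # one tag set per worker
        counts.update({t.strip().lower() for t in tags})
    return {t for t, n in counts.items() if n >= min_agreement}

# Five hypothetical workers tagging the same image independently:
workers = [
    {"beach", "sunset", "ocean"},
    {"beach", "Sunset", "sand"},
    {"beach", "sunset", "palm tree"},
    {"ocean", "beach"},
    {"sunset", "vacation"},
]
print(sorted(verified_tags(workers)))
```

Tags only one or two workers entered ("sand", "vacation") are held back, which is the error-checking the micropayment model relies on.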
Audio transcription and speech-to-text
There are several crowdsourced transcription services which still use humans to transcribe audio files into text. Even on smartphones, speech-to-text technology is getting much better, because some technologies “learn” speech patterns as words are spoken (including with accents) and continue to learn from what users say to their mobile devices and from the usefulness of the results. It can be a challenge for a machine to transcribe spoken audio into text while music is playing, cars are driving by and other ambient noise is audible in the background. Speech-to-text technology has gotten better, but it is still subject to ambient noise, which can raise its error rate. While we take them for granted, noise-canceling microphones can filter out a lot of unwanted ambient noise so that the spoken words can be captured in a clean sound file, sent out for processing in the cloud, and returned to the mobile device as a text or audio prompt.
As seen in one demo of auto-tagged audio from a video, I would hardly call re-purposing the closed-caption text of a widely distributed blockbuster movie into searchable text an astonishing feat. It is smart, though. Of course, that is an easier solution since most of that work is already done, and the audio does not have to be transcribed again since the text may come straight from the movie script (unless ad-libbing applies).
Photography
Some auto-tagging services for photos claim to be able to tag a million photos in 24-48 hours if you have an established taxonomy.
When we license stock photography, we can get most metadata embedded in the photograph once acquired, but this varies based on the vendor, the age of the digitized image in the collection and its popularity. Much of the well-known stock photography is keyworded by various keywording services with established taxonomies for consistency. Tagging of stock photography is still mostly done the “old school” way by humans, even if they are crowdsourced. The reason they remain old school (at least as of 2012) is that the error rate for some artificial intelligence is still too high. Fixing the errors can take more time than having a human tag the asset in the first place. This may change over time as the technology matures.
Free crowdsourcing efforts
A large museum group has a few efforts where they post digital photographs on their website and ask the public to visit, browse through the image collections and apply tags (and/or descriptions) to these individual images, with the belief that this effort can aggregate the public’s time and knowledge. Results vary and so does consistency. Reviewers still need to cull through the tags and weed out the “meta crap” from the relevant metadata. This still requires time.
I will point out that even some of the most advanced (publicly released) artificial intelligence relies on humans to check and tune the accuracy of its algorithms. Even Watson received only a text-prompted clue, while the human competitors received both text and verbal prompts during a televised game show where humans competed against the machine. It does not take a genius to figure out who won, based on “who” can process text faster with a high rate of accuracy and more memory, provided human engineers are close behind to tailor parameters and improve accuracy.
Crowdsourcing all of us
Captcha is a code you may be familiar with, often seen as a security measure because so far only humans can decipher these “codes”. We are prompted to re-enter the code, and the answer is verified against a number of other people’s entries to aggregate the correct reading of what the code says. A very large newspaper company was able to digitize about 150 years of its archive into searchable text using captcha technology when optical character recognition (OCR) failed to complete the digitization task. Why would OCR fail, you ask? When printed text needs to be digitized into searchable text, there are a number of challenges. Fonts change over the years and some are no longer recognized. Printing quality and preservation are variables over long periods of time; cool, dark, dry storage is not always the case, and degraded print can look like a captcha by default.
So, back to the question of how a sports organization could tag millions of its photographs. I would recommend crowdsourcing fans to tag them. You can post watermarked, lower-resolution photographs (proxies) for others to see and tag. Then, offer the fans a significant discount on prints if they tag a given number of digital assets.
Should I test these services?
There is no guarantee that any institutional knowledge nor necessarily any subject matter expertise will ‘automagically’ show up in your results for tags. Let us return to reality, clear out any smoke screen of unrealistic expectations and remember what the source of these tags is and what that source knows. Even if you are skeptical (and I am, until a service can prove itself usable), try before you buy. Give these services a fair tryout and analyze the end results you get back. Most of these services will give you a free trial, so take advantage of this for a reality check. Do not simply take my word for it. Do your own homework and judge for yourself whether these are viable services for your own organization’s purposes. This is called due diligence on your part.
Let us know when you are ready for vendor neutral consulting on Digital Asset Management and assistance with tagging.
After reading one of my most popular blog posts, a few readers have asked “What does a Digital Asset Manager need to know?”
This assumes an organization realizes why a skilled and experienced Digital Asset Manager is needed.
That said, they will need to know how to work with the following:
People
Be helpful. You should be there to help the people, the process, the technology and the information work together. No small feat in many cases, nor a temporary effort.
Be resourceful.
Be honest. Brutally honest if needed. Do not hold back much. The truth may require revealing news people do not want to hear, but rather need to hear (if you have read my blog or know me well enough, you will know what I mean).
Be patient. Not everyone will be technical nor understand what is involved.
Listen. To your users. All of them. Not just to yourself talking and repeating yourself.
Be specific. Do not assume people know, even the obvious. Remember, not everyone is technical.
Simplify. Do not overcomplicate unless you like confusion, fixing errors and having delays.
Be an agent of change. Change, not because it is shiny/new/cool, but needed for increased effectiveness and efficiency across the organization.
Know who is responsible for what. If you are not in charge of something, who is? If no one is in charge, take charge. “Initiative isn’t given, you take it”…along with responsibility.
Speak up. Interject as needed. Do not ‘wait your turn’ or your points will be overlooked. Leave your emotions elsewhere. This is business.
Be accountable and hold others accountable for their actions (or lack thereof) when it comes to the DAM and everything else in your purview. It is a ‘two-way street’ whether we realize it or not. Top to bottom and back.
Be proactive as well as reactive as needed. You should not be ‘fire fighting’ issues all day, every day (otherwise, there is a prioritization and process issue).
How and when to say “No.” Contrary to some people’s belief, ‘yes men’ can hurt the organization as well as themselves, especially if a constant “yes” is believed to always be the right answer. It is not. Reality checks are necessary for all.
Do not kill yourself, physically nor mentally. Nor anyone else for that matter. Even if it starts to sound really tempting. Really.
Process
There is at least one process, right? And it is followed?
How do DAM users interact with the Digital Asset Management process and system?
Help establish a process, test the process in the real world, document the process in writing and train users on the process/workflow as needed (especially when lacking). Work one-on-one or with small groups. Why? Large groups and committees are like large ships…they are harder to steer in any direction and slower to start, stop or react in general. Don’t believe me? Try it. Find out yourself.
How does metadata entry occur, from sources (owned internally and/or externally) to normalization of the data to entry into the DAM? Then, track the process all the way through to use within the system to yield the requested search results.
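One minimal sketch of that normalization step, assuming a hypothetical field mapping (source systems often use different names for the same concept, and the DAM schema names here are made up):

```python
# Hypothetical mapping from source field names to the DAM's canonical schema.
FIELD_MAP = {
    "Headline": "title",
    "Caption": "description",
    "By-line": "creator",
    "Keywords": "tags",
}

def normalize_record(source_record):
    """Map source field names onto the DAM schema, dropping unknown fields."""
    normalized = {}
    for field, value in source_record.items():
        if field in FIELD_MAP:
            normalized[FIELD_MAP[field]] = value
    return normalized

record = {"Headline": "Opening Day", "By-line": "J. Smith", "Exposure": "1/250"}
print(normalize_record(record))
```

Whether unknown fields are dropped, logged or routed to a catch-all field is a process decision worth documenting in writing.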
Manage by assigning, measuring and prioritizing daily. Of what, you ask?
There is plenty more to assign, measure and prioritize…
Establish a process of user adoption from the beginning of the selection process of a DAM system to the integration of other systems to the regular operations of the solution. What are you doing to encourage your users?
How to make coffee (or tea) without spilling it nor burning yourself. (Like most things, carefully.)
Technology
Digital Asset Management solution within your organization
Metadata validation and when applicable, metadata automation
How to use and apply the LAMP solution stack (in case you thought there was nothing else to learn to improve your skills)
Java (the programming language as well as the coffee)
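As one small sketch of metadata validation against a controlled vocabulary (the vocabulary and tags here are hypothetical; your organization’s taxonomy would define the real list):

```python
# Hypothetical controlled vocabulary a DAM might enforce at tag entry.
CONTROLLED_VOCABULARY = {"aerial", "portrait", "landscape", "studio", "event"}

def validate_tags(tags, vocabulary=CONTROLLED_VOCABULARY):
    """Split submitted tags into accepted (in-vocabulary) and rejected."""
    accepted = [t for t in tags if t.lower() in vocabulary]
    rejected = [t for t in tags if t.lower() not in vocabulary]
    return accepted, rejected

print(validate_tags(["Portrait", "selfie", "studio"]))
```

Rejected tags could be queued for a taxonomy review rather than discarded, since they may signal terms your users actually search for.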
Information
Love information and data. Really. It may not love you back, but it is a give and take relationship. You get what you put into it, along with compounding value over time. Of course, I am talking about metadata. You should be one of the information experts within your organization.
Know what is available (and what is not), where it lives, how to get to it, how to report on it, how to filter it and how to analyze it. Explain it. Train people on how to take ownership of it in their role, how to complete their part (metadata), the value of this information and why.
Know the difference between data, information and knowledge.
If you want a baseline for how mature your DAM solution is within your organization, start studying the DAM Maturity Model (DAM3), which was based on ECM3 and continues to mature. Using DAM3, you can plot how mature your DAM solution is within your organization today, as well as where it could improve.
I write this as I leave my position where I was Digital Asset Manager for over 5 years. I have accepted another position as a Digital Asset Management professional in a different capacity to assist other organizations with DAM.
If you need vendor neutral assistance or advice on Digital Asset Management, let me know.