With the advent of more technology to help add metadata to digital assets, it would be good to review a few tagging options available aside from what may already be done within an organization. Some DAM systems do not make it easy to apply a controlled vocabulary, taxonomy or any list for users to pick from when it comes to applying tags in the process. Keep in mind tags are just one form (and often one field) of metadata out of many possible options.
What is tagging?
We are not talking about vandalizing walls nor subway trains with ‘artwork’. The act of applying tags (keywords or key phrases) is tagging. We are however making a mark in an organization or community by making its digital assets more searchable, more findable (within finite results) and possibly better monetized. If you can search for digital assets, you should find the relevant digital assets you need, and these digital assets can then be distributed easily. If a client cannot find the digital asset they need, they cannot buy/license/use this digital asset. A number of photo agencies have found this out the hard way after some time, but this effort extends to all media including audio, video, text, graphics and photos. A few large sporting organizations have massive archives of their sports history waiting to be tagged. How could this be done for them as well as your organization?
What is auto-tagging?
Auto-tagging is tagging (adding metadata) in an automated fashion via computer with complex algorithms. Often, these algorithms work by analyzing the content (often visual images) to match shapes and patterns such as faces.
More tools now have facial recognition based on the position of an eye, nose and mouth. Advanced facial recognition also looks at the forehead, cheeks, chin and sometimes ears. If you apply the name of someone to some images of this person, the software will do the rest with reasonable accuracy. Just make sure you do not smile. Kind of like with your passport and driver’s license. I dare you to smile at airport customs or the motor vehicle administrator’s office while waiting in line and see how long that smile stays on your face. We know that smile will not stay long on anyone’s face unless you want to be profiled.
Beyond faces, common shapes and patterns yield mixed results dependent on the image content, quality of the image, resolution and focus.
At a meetup, I spoke with someone who works for a company which offered auto-tagging. A few large social networks may be using these services as well.
I have reviewed some DAM and MAM systems with similar auto-tagging tools, but I was not amazed by the results (yet). When auto-tagging was used on images, results came back as trees for a photograph of grass (both green and vertical, but not close enough). When a photo of a strawberry was auto-tagged, the results came back as cherry (both red, round-ish and fruit, but the texture is visibly different between the two).
One service I did see was auto-tagging for video which did quite well. I was asked to review this tool. As a test to have them prove themselves, I sent them an early silent film posted online and some music videos to see what the tool could do. The quality issues of the silent film as well as the abstract nature of the music video would be a challenge. The test yielded very good results based on the tool analyzing what patterns and shapes could be found frame by frame. If the pattern appeared within a number of video frames within a given period of time, the tool produced tags for this pattern.
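The frame-by-frame rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the reviewed tool actually works: the function name `tags_from_frames`, the `window` and `min_hits` parameters and the sample labels are all hypothetical, and the per-frame pattern detection itself is assumed to have already happened.

```python
from collections import Counter, deque


def tags_from_frames(frame_labels, window=5, min_hits=3):
    """Return labels seen in at least `min_hits` of any `window` consecutive frames.

    `frame_labels` holds one collection of detected labels per video frame
    (hypothetical input; the detection step is not shown here).
    """
    recent = deque()    # label sets for the frames currently inside the window
    counts = Counter()  # number of frames in the window containing each label
    tags = set()
    for labels in frame_labels:
        labels = set(labels)
        recent.append(labels)
        counts.update(labels)
        if len(recent) > window:
            # Slide the window forward: forget the oldest frame's labels.
            counts.subtract(recent.popleft())
        tags.update(label for label, n in counts.items() if n >= min_hits)
    return sorted(tags)


# Made-up detections for nine frames of a silent film.
frames = [
    {"train"}, {"train", "hat"}, {"train"}, set(), {"hat"},
    set(), set(), {"hat"}, {"moon"},
]
print(tags_from_frames(frames, window=5, min_hits=3))
```

Here "train" shows up in three neighboring frames and becomes a tag, while "hat" and "moon" are too scattered to pass the threshold. Raising `min_hits` or shrinking `window` makes the rule stricter, which may help with noisy footage.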
Crowdsourcing work done for you
At a recent lecture, I listened to a few experts explaining some new services where there are some mechanical turks (people doing repetitive micro tasks remotely) doing some tagging. There are now some new players in the field of crowdsourcing metadata. A few of these services are very big, while most are still small. Many have big potential.
Many of these services are now cloud based, while a few are in-house installs which could be integrated with other systems. Most of these groups are using global resources. Some of these services are gamified, just like the early days of some university projects, to help a community tag their digital assets and get a high score as another personal bonus.
There are some news reports about these services as well as word of mouth within some communities of some archivist groups, digital asset management groups, humanities groups, information management groups, librarians and metadata management groups.
Micro-tasks for micropayments with error checking process
These crowdsourced micro-tasks (tag a few images) often pay individuals who are distributed nationally or internationally just a few pennies per task (tag an image with N keywords or key phrases based on a controlled vocabulary). The question arises: why would anyone really care to apply relevant tags in an accurate manner for payment of just a few pennies per image? The answer is that you have multiple people doing the same task completely independent of each other. With a nice automated process, if the tags appear multiple times in three to five people’s results per task, those tags are likely relevant and accurate. Each individual should not see the results of the next person. For example, if five different people in five different geographic areas, using five different IP addresses, at five different times during the day enter matching tags, these tags are likely accurate and verified. The client should likely review the results to see how relevant these are for the organization’s purposes when the results are delivered.
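The agreement check described above could look something like the following Python sketch. It is a minimal illustration under stated assumptions, not any particular service's process: the function name `consensus_tags`, the `min_agreement` threshold and the sample submissions are all made up for this example.

```python
from collections import Counter


def consensus_tags(submissions, min_agreement=3):
    """Return tags entered independently by at least `min_agreement` workers.

    `submissions` holds one list of tags per worker for the same image
    (hypothetical data; each worker is assumed not to see the others' tags).
    """
    counts = Counter()
    for tags in submissions:
        # Count each tag once per worker, ignoring case and stray whitespace.
        counts.update({tag.strip().lower() for tag in tags})
    return sorted(tag for tag, n in counts.items() if n >= min_agreement)


# Five workers tag the same photo without seeing each other's results.
submissions = [
    ["strawberry", "fruit", "red"],
    ["Strawberry", "fruit", "cherry"],
    ["strawberry", "red", "dessert"],
    ["fruit", "strawberry", "red"],
    ["cherry", "fruit", "berry"],
]
print(consensus_tags(submissions))
```

With a threshold of three, "strawberry", "fruit" and "red" survive, while "cherry" (two votes) and the one-off tags are dropped. The client would still want to review what comes out, since agreement only shows that a tag is plausible, not that it is useful for the organization's purposes.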
Audio transcription and speech-to-text
Several crowdsourced transcription services still use humans to transcribe audio files into text. Even on smartphones, speech-to-text technology is getting much better because some technologies are “learning” from speech patterns when a word is spoken (including with accents) and from how often users accept the result as a useful match for what they said to their mobile device. It can be a challenge for a machine to transcribe spoken audio into text while music is playing, cars are driving by and other ambient noise is audible in the background. Speech-to-text technology has gotten better, but it is still subject to ambient noise, which can give it a higher error rate. While we take it for granted, noise canceling microphones can filter out a lot of unwanted ambient noise to get spoken words into an audible sound file, which is sent out to be processed in the cloud and returns a text prompt or an audio prompt to the mobile device.
As seen in one demo of auto-tagged audio from a video, I would hardly call re-purposing closed captioned text from a widely distributed blockbuster movie into searchable text an astonishing feat. It is smart though. Of course, that is an easier solution since most of that work is already done and the audio does not have to be transcribed again since the text may come from the movie script (unless ad lib applies).
Some auto-tagging services for photos claim to be able to tag a million photos in 24-48 hours if you have an established taxonomy.
When we license stock photography, we could get most metadata embedded in the photograph once acquired, but this varies based on the vendor, the age of the digitized image in the collection and its popularity. Much of the well-known stock photography is tagged by various keywording services with established taxonomies for consistency. Tagging of stock photography is still mostly done the “old school” way by humans, even if it is crowdsourced. The reason it remains old school (at least for 2012) is that the error rate for some artificial intelligence is still too high. Fixing the errors can take more time than having a human tag the image in the first place. This may change over time as the technology matures.
Free crowdsourcing efforts
A large museum group has a few efforts where they are posting digital photographs on their website and asking the public to come visit, browse through the image collections and apply tags (and/or descriptions) to these individual images with the belief that this effort could aggregate the public’s time and knowledge. Results vary and so does consistency. The resulting tags still need to be culled to weed out the “meta crap” from relevant metadata. This still requires time.
Does this replace humans?
The ‘great fear’ I keep hearing from some individuals is “when will machines replace us?” At the time I write this post, we are far from that point. We do however rely more on machines to assist us every day.
I will point out that even some of the most advanced (publicly released) artificial intelligence relies on humans to check and tune the accuracy of its algorithms. Even Watson was taking a text prompted clue only while the human competitors received a text and verbal prompted clue during a televised game show where humans competed against the machine. It does not take a genius to figure out who won based on “who” can process text faster with a high rate of accuracy, more memory and provided human engineers are close behind to tailor parameters so they can improve accuracy.
Crowdsourcing all of us
Captcha is a code you may be familiar with. It is often seen as a security measure because so far only humans can decipher these “codes”. We are prompted to re-enter this code and this is verified by a number of other people to aggregate the correct answer of what the code says. A very large newspaper company was able to digitize about 150 years of their archive into searchable text using captcha technology when optical character recognition (OCR) failed to complete this digitization task. Why would OCR fail, you ask? When printed text needs to be digitized into searchable text, there are a number of challenges. Fonts change over the years and some are no longer recognized. Printing quality and preservation are variables over long periods of time. Cool, dark, dry storage is not the case sometimes. The printed text may then look like a captcha by default.
So back to the question of how could a sports organization tag millions of their photographs. I would recommend crowdsourcing fans to tag these photographs. You can post watermarked, lower resolution photographs (proxies) to see and have others tag them. Then, offer the fans a significant discount on prints if they tag a given number of digital assets.
Should I test these services?
There is no guarantee that any institutional knowledge nor necessarily any subject matter expertise will ‘automagically’ show up in your results for tags. Let us return to reality, clear out any smoke screen of unrealistic expectations and remember what the source of these tags is and what that source knows. Even if you are skeptical (and I am until a service can prove itself usable), try before you buy. Give these services a fair tryout and analyse the end results you get back. Most of these services will give you a free trial, so take advantage of this for a reality check. Do not simply take my word for it. Do your own homework and judge for yourself after seeing if these are viable services for your own organization’s purposes. This is called due diligence on your part.
Let us know when you are ready for vendor neutral consulting on Digital Asset Management and assistance with tagging.
February 13, 2013 at 4:51 PM
The popularity of this blog post inspired a NYC DAM meetup on the topic: http://www.meetup.com/NYCDigitalAssetManagers/events/104442962/
February 15, 2017 at 12:30 PM
Since this was written in 2012, check out the interview on this topic at tagging.tech