With the advent of more technology to help add metadata to digital assets, it would be good to review a few tagging options available aside from what may already be done within an organization. Some DAM systems do not make it easy to apply a controlled vocabulary, taxonomy or any list for users to pick from when applying tags. Keep in mind tags are just one form (and often one field) of metadata out of many possible options.
What is tagging?
The act of applying tags (keywords or key phrases) is tagging. We are not talking about vandalizing walls or subway trains with ‘artwork’. We are however making a mark in an organization or community by making its digital assets more searchable, more findable (within finite results) and possibly better monetized. If you can search for digital assets, you should find the relevant digital assets you need, and these digital assets can then be distributed easily. If a client cannot find the digital asset they need, they cannot buy/license/use this digital asset. A number of photo agencies have found this out the hard way after some time, but this effort extends to all media including audio, video, text, graphics and photos. A few large sporting organizations have massive archives of their sports history waiting to be tagged. How could this be done for them as well as your organization?
What is auto-tagging?
Auto-tagging is tagging (adding metadata) in an automated fashion via computer with complex algorithms. Often, these algorithms work by analyzing the content (often visual images) to match shapes and patterns such as faces.
More tools now have facial recognition based on the position of an eye, nose and mouth. Advanced facial recognition also looks at the forehead, cheeks, chin and sometimes ears. If you apply the name of someone to some images of this person, the software will do the rest with reasonable accuracy. Just make sure you do not smile. Kind of like with your passport and driver’s license. I dare you to smile at airport customs or the motor vehicle administrator’s office while waiting in line and see how long that smile stays on your face. We know that smile will not stay long on anyone’s face unless you want to be profiled.
Beyond faces, common shapes and patterns yield mixed results dependent on the image content, quality of the image, resolution and focus.
At a meetup, I spoke with someone who works for a company which offered auto-tagging. A few large social networks may be using these services as well.
I have reviewed some DAM and MAM systems with similar auto-tagging tools, but I was not amazed by the results (yet). When auto-tagging was used on images, results came back as trees for a photograph of grass (both green and vertical, but not close enough). When a photo of a strawberry was auto-tagged, the results returned with cherry (both are red, round fruit, but the texture is visibly different between the two).
One service I did see was auto-tagging for video which did quite well. I was asked to review this tool. As a test to have them prove themselves, I sent them an early silent film posted online and some music videos to see what the tool could do. The quality issues of the silent film as well as the abstract nature of the music video would be a challenge. The test yielded very good results based on the tool analyzing what patterns and shapes could be found frame by frame. If the pattern appeared within a number of video frames within a given period of time, the tool produced tags for this pattern.
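To make that last rule concrete, here is a minimal sketch in Python. It is my own illustration, not the vendor's actual method. It assumes a hypothetical detector has already produced a set of labels for each frame, and it keeps a label as a tag only if it shows up in at least min_hits frames within any run of window consecutive frames. The function name, the parameters and the sample labels are all made up for the example.

```python
from collections import Counter, deque


def tags_from_frames(frame_labels, window=5, min_hits=3):
    """Return labels seen in at least `min_hits` of any `window` consecutive frames.

    frame_labels: list of sets, one set of detected labels per video frame
    (assumed output of some hypothetical per-frame detector).
    """
    recent = deque()    # the frames currently inside the sliding window
    counts = Counter()  # how many frames in the window contain each label
    tags = set()
    for labels in frame_labels:
        recent.append(labels)
        counts.update(labels)
        if len(recent) > window:
            # Drop the oldest frame so counts only cover the current window.
            counts.subtract(recent.popleft())
        tags.update(label for label, n in counts.items() if n >= min_hits)
    return sorted(tags)


# Made-up detections for nine frames of a silent film clip.
frames = [
    {"train"}, {"train", "hat"}, set(), {"train"}, {"hat"},
    set(), set(), {"hat"}, {"dog"},
]
print(tags_from_frames(frames))  # ['train']
```

In this sample, "hat" appears three times overall but never three times within five consecutive frames, so it is not kept. Widening the window or lowering the threshold trades more tags for more noise.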
Crowd sourcing work done for you
At a recent lecture, I listened to a few experts explaining some new services where mechanical turks (people doing repetitive micro tasks remotely) are doing some tagging. There are now some new players in the field of crowd sourcing metadata. A few of these services are very big, while most are still small. Many have big potential.
Many of these services are now cloud based, while a few are in-house installs which could be integrated with other systems. Most of these groups are using global resources. Some of these services are gamified, just like the early days of some university projects that helped a community tag its digital assets, with a high score as a personal bonus.
There are some news reports about these services as well as word of mouth within some communities of some archivist groups, digital asset management groups, humanities groups, information management groups, librarians and metadata management groups.
Micro tasks for micro payments with error checking process
These crowd sourced micro tasks (tag a few images) often pay individuals distributed nationally or internationally just a few pennies per task (tag an image with N keywords or key phrases based on a controlled vocabulary). The question arises: why would anyone really care to apply relevant tags in an accurate manner for payment of just a few pennies per image? The safeguard is redundancy. You have multiple people doing the same task completely independently of each other. With a nice automated process, if the same tags appear in the results of three to five people per task, those tags are likely relevant and accurate. No individual should see the results of another. For example, if five different people in five different geographic areas, using five different IP addresses, at five different times during the day enter matching tags, these tags are likely accurate and verified. The client should still review the delivered results to see how relevant they are for the organization’s purposes.
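The agreement check described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any particular service works. Each worker's submission is a list of tags, the function name and the min_agreement threshold are hypothetical, and the sample tags are invented.

```python
from collections import Counter


def consensus_tags(submissions, min_agreement=3):
    """Return tags entered by at least `min_agreement` independent workers."""
    counts = Counter()
    for tags in submissions:
        # Normalise case and whitespace, and de-duplicate, so each worker
        # counts at most once per tag.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return sorted(tag for tag, n in counts.items() if n >= min_agreement)


# Five workers tagging the same image without seeing each other's answers.
submissions = [
    ["Strawberry", "fruit", "red"],
    ["strawberry", "fruit", "dessert"],
    ["cherry", "fruit", "red"],
    ["strawberry", "red", "fruit"],
    ["strawberry", "berry", "fruit"],
]
print(consensus_tags(submissions))  # ['fruit', 'red', 'strawberry']
```

The lone "cherry" is dropped because only one worker entered it. Tags that pass the threshold are still only candidates for the client's own review.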
Audio transcription and speech-to-text
There are several crowd sourced transcription services which still use humans to transcribe audio files into text. With the latest smart phones, speech-to-text technology is getting much better because some technologies are “learning” based on speech patterns when a word is spoken (including with accents) and on how often users accept the result and continue with what they said to their mobile device. While we take it for granted, noise canceling microphones can filter out a lot of unwanted ambient noise to get spoken words into an audible sound file, which is sent out to be processed in the cloud and returns a text prompt or an audio prompt to the mobile device.
In one demo of auto-tagged audio from a video, closed captioned text taken from a widely distributed blockbuster video was re-purposed into searchable text. I would hardly call that an astonishing feat. Of course, that is an easier solution since most of that work is already done and the audio does not have to be transcribed again. It can be a challenge for a machine to transcribe spoken audio into text while music is playing, cars are driving by and other ambient noise is audible in the background.
Free crowd sourcing efforts
A large museum group has a few efforts where they are posting digital photographs on their website and asking the public to come visit, browse through the image collections and apply tags (and/or descriptions) to these individual images with the belief that this effort could aggregate the public’s time and knowledge. Results vary. Someone still has to cull through the tags and weed out the “meta crap” from the valid metadata. This still requires time.
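One way to cut down that culling time is a first pass that splits public tags into those already in a controlled vocabulary and those a person needs to look at. The Python below is a rough sketch of that idea only. It is not what the museum group does, and the function name, vocabulary and sample tags are all hypothetical.

```python
def triage_tags(public_tags, vocabulary):
    """Split public tags into (accepted, needs_review) against a vocabulary."""
    vocab = {term.lower() for term in vocabulary}
    accepted, review = [], []
    for tag in public_tags:
        # Lowercase and collapse stray whitespace before comparing.
        cleaned = " ".join(tag.lower().split())
        if not cleaned:
            continue  # skip empty submissions
        target = accepted if cleaned in vocab else review
        if cleaned not in target:
            target.append(cleaned)  # keep first occurrence, drop duplicates
    return accepted, review


vocabulary = ["locomotive", "portrait", "harbor"]
public_tags = ["Locomotive", "  portrait ", "cool!!", "locomotive", "asdf", ""]
accepted, review = triage_tags(public_tags, vocabulary)
print(accepted)  # ['locomotive', 'portrait']
print(review)    # ['cool!!', 'asdf']
```

The review pile is where the human time goes. Some of it will be junk, and some of it may be knowledge the vocabulary is missing.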
Does this replace humans?
The great fear I keep hearing from some individuals is “when will machines replace us?” At the time I write this post, we are far from that point. We do however rely more on machines to assist us every day.
I will point out that even some of the most advanced (publicly released) artificial intelligence relies on humans to check and tune the accuracy of its algorithms. Even Watson was taking text prompted clues while the human competitors received a text and verbal prompted clue during a televised game show where humans competed against the machine. It does not take a genius to figure out who won based on “who” can process a text prompt faster with a high rate of accuracy, provided human engineers are close behind to tailor parameters for improved accuracy.
Crowd sourcing all of us
Captcha is a code you may be familiar with. It is often seen as a security measure because so far only humans can decipher these “codes”. We are prompted to re-enter this code, and our answer is checked against those of a number of other people to aggregate the correct answer of what the code says. A very large newspaper company was able to digitize about 150 years of their archive into searchable text using captcha technology when optical character recognition (OCR) failed to complete this digitization task. Why would OCR fail you ask? When printed text needs to be digitized into searchable text, there are a number of challenges. Fonts change over years and some are no longer recognized. Printing quality and preservation are variables over long periods of time. Cool, dark, dry storage is not always the case, so old printed text may look like a captcha by default.
So back to the question of how a sports organization could tag millions of its photographs. I would recommend crowd sourcing fans to tag these photographs. You can post watermarked lower resolution proxies for fans to see and tag. Then, offer the fans a significant discount on prints if they tag a given number of digital assets.
Should I test these services?
There is no guarantee that any institutional knowledge or subject matter expertise will ‘automagically’ show up in your results for tags. Let us return to reality, clear out any smoke screen of unrealistic expectations and remember what the source of these tags is and what that source knows. Even if you are skeptical (and I am until a service can prove itself usable), try before you buy. Give these services a fair tryout and analyse the end results you get back. Most of these services will give you a free trial, so take advantage of this for a reality check. Do not simply take my word for it. Do your own homework and judge for yourself after seeing if these are viable services for your own organization’s purposes. This is called due diligence on your part.
are still OCR for doc
Auto-tagging services for photos claim to be able to tag a million photos in 24-48 hours if you have an established taxonomy. I get most metadata for photography from the photo agency we license or acquire the photos from.
When a photo shoot is done for an organization, there is nothing stopping the organization from requiring the photographer (per their contract) to apply UPDIG.
So, I asked.
I was reading a recent article
Speech-to-text has gotten better, but it is still subject to ambient noise (high error rate).
Image recognition has gotten better over the years, but if you smile or laugh, facial recognition often fails. Facial recognition software is included in some popular image viewers and is an option on very few DAM systems.
Facial recognition is used at airport security as well. Then again, who smiles in long, airport security lines? Sounds convenient for security to have long lines. If you did smile, you may be randomly subjected to search at airport security…because the facial recognition failed.