Robust MPEG Video Watermarking Technologies
Mark Stabenau
GMD - German National Research Center for Information
Technology, Institute (IPSI)
Dolivostraße 15,
D-64293 Darmstadt, Germany
+49-6151-869-845
stabenau@darmstadt.gmd.de
Ralf Steinmetz
Darmstadt University of Technology, Industrial Process
and System Communications
Merckstr. 25,
D-64283 Darmstadt, Germany
+49-6151-166151
Ralf.Steinmetz@KOM.tu-darmstadt.de
In this paper we propose and compare two watermarking
techniques for MPEG video in order to show the advantages and
the possible weaknesses of schemes working in the frequency domain and
in the spatial domain. The main concept of both approaches is to provide
environments in which authors or producers can sign digital videos
as their intellectual property, so that ownership rights to
the produced video material can be ensured and proven during its distribution. Most existing
systems are designed for still images, [1], [2], [3], [10], such as the copyright
system SysCoP developed by [8]. The few existing video-based technologies
mostly lack robust, continuous signing of all video frames. Projects
related to digital video copyright protection include the Watermarking
DataBlade™ from NEC (1996) and the digital watermarking of MPEG-2
coded video in the bitstream domain from the University of Erlangen, Germany, [7].
Our work focuses on the implementation and evaluation of robust watermarking
technologies for MPEG video that embed watermarks in every
encoded video frame. In the first section we describe two existing watermarking
techniques for images and how we adapt these algorithms
to design a video-based algorithm. We continue
with the description of the experimental system, our improvements and test
results. Our tests are mainly based on compression, format conversions and
StirMark attacks, [9]. Finally, we assess our achievements so far and
provide an overview of further work.
Watermarks, labels or codes embedded into multimedia data to enforce a copyright must uniquely identify the data as the property of the copyright holder and must be difficult to remove, even after various media transformation processes. The goal of a label is therefore to always remain present in the data. Existing labelling techniques still have different security problems regarding robustness and visual artefacts, [2], [8]. In order to prevent copyright forgery, misuse or violation, the key requirement for a copyright labelling technique is to provide security and robustness of the embedded label against a variety of threats which include:
The following requirements are considered important for MPEG video watermarking:
During the extraction process, the same coefficients are selected pseudo-randomly using the secret key, and the relationship between the coefficients is analysed. Depending on this relationship, a 0 or a 1 is extracted.
The algorithm does not need the original image for retrieval. An advantage is that the watermark information is embedded in the compressed domain and can easily be applied to MPEG compressed video with minimal operations.
Despite these advantages the algorithm has a few shortcomings:
every block is modified, and artefacts are common, especially in smooth blocks
or at sharp edges. The algorithm is not robust against scaling or rotation
because the image dimension is used to generate an appropriate pseudo-random
sequence. Our goal is to evaluate the behaviour under MPEG compression
and the coding in P- and B-frames. Our improvements mainly address the visual
distortion, to keep the high quality of the video, and aim to prevent
selective attacks on the watermark by using efficient error correcting codes.
We use smooth block and edge recognition schemes to avoid artefacts.
The first main disadvantage is that the retrieval process requires the original, un-watermarked image. For videos this is not acceptable because we would need the whole original video to prove the watermark. Our algorithm fixes this shortcoming by using plain statistical techniques to retrieve the label without the original.
The second main disadvantage is that the watermarking
algorithm embeds only one piece of information: the pattern created using a pseudo-random
number generator and a cellular automaton with voting rules. No
detailed information about the author or producer is embedded. The retrieval
process only reports true or false, depending on whether the pattern was retrieved successfully.
Our goal is to extend the algorithm so that we can embed code words
for detailed information such as the author's name or address.
Figure 1: General Embedding Scheme
Before the watermarking starts, the MPEG video is traversed with the decoder and single frames (frame data) are produced. The information to be embedded (label data) is encrypted with a secret user key and then embedded into the image data, with the same user key used as the seed for pseudo-random number generation. The retrieval is performed with the inverse steps shown in the following figure:
Figure 2: General Retrieval Scheme
Before the retrieval starts, the video is traversed again
with the decoder and single frames are produced. In the next section we
describe the detailed embedding and retrieval steps separately for the
two algorithms.
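The key-seeded selection that both schemes rely on can be sketched as follows. This is a minimal illustration, not the generator used in the experiments: the function name `position_sequence` and the use of HMAC-SHA256 as a keyed ordering are assumptions introduced here. The point it shows is that the embedder and the retriever can derive the same block order from the user key alone.

```python
import hashlib
import hmac


def position_sequence(user_key: bytes, num_blocks: int) -> list:
    """Return a key-dependent permutation of the block indices 0..num_blocks-1."""

    def tag(index: int) -> bytes:
        # Assumption: HMAC-SHA256 of the block index under the user key
        # serves as a pseudo-random sort key for that block.
        return hmac.new(user_key, index.to_bytes(4, "big"), hashlib.sha256).digest()

    return sorted(range(num_blocks), key=tag)


# The same key always yields the same order, so the retriever can
# regenerate it without any side information.
order = position_sequence(b"secret user key", 1320)
```

Sorting by a keyed hash avoids storing the sequence anywhere: only the user key has to be kept secret.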
Figure 3: Improved Embedding Scheme
In the first step a position sequence is generated from
the user key as a seed with a secure random number generator. This is necessary
to hide the watermark in the frame. Every block is then discrete cosine
transformed in the order given by the position sequence. The second step
consists of the smooth block and edge detection mentioned earlier. Although
sophisticated techniques have been developed to check for HVS characteristics,
the calculation of the smoothness and the edge character of the block is
kept quite simple, because in future versions this
calculation has to be done in the MPEG-stream domain and, we hope, in real time.
The parameter smooth is simply the number of DCT coefficients which
are not zero after quantization with the quantization matrix Qm,
shown below. High values of smooth therefore indicate many
frequency components and hence a great visual tolerance against additional
distortions through the watermark.
Quantization matrix Qm (low frequencies at the top left, high frequencies at the bottom right):
| 16 | 11 | 10 | 16 | 24 | 40 | 51 | 61 |
| 12 | 12 | 14 | 19 | 26 | 58 | 60 | 55 |
| 14 | 13 | 16 | 24 | 40 | 57 | 69 | 56 |
| 14 | 17 | 22 | 29 | 51 | 87 | 80 | 62 |
| 18 | 22 | 37 | 56 | 68 | 109 | 103 | 77 |
| 24 | 35 | 55 | 64 | 81 | 104 | 113 | 92 |
| 49 | 64 | 78 | 87 | 103 | 121 | 120 | 101 |
| 72 | 92 | 95 | 98 | 112 | 100 | 103 | 99 |
The parameter edge is calculated as simply as smooth: edge is the sum of the absolute values of the DCT coefficients 1, 2, 8, 9, 10, 16 and 17, as marked in Qm, which represent the lower DCT frequencies. High values in these components indicate that the block could have edge characteristics. To determine the level of tolerance against distortions through the watermark, the two parameters are combined linearly:
Level = smoothscale*smooth + edgescale*edge + offset
The parameter offset sets the base strength of the watermark. The linear combination can thus be seen as a base watermark strength given by offset, with slight variations depending on the block characteristics, weighted by the parameters smoothscale and edgescale.
smoothscale = -10, edgescale= 0.27 and offset=50 were evaluated through experiments.
Because Level can have negative values, Level is restricted to values between 0 and 50:
If Level>50 Level=50
If Level<0 Level = 0
So far the level-estimation is independent from the used watermarking algorithm.
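The level estimation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the original implementation: the coefficient numbers 1, 2, 8, 9, 10, 16, 17 are read as raster-scan indices into the 8x8 block, edge is computed on the unquantized coefficients, and the name `estimate_level` is hypothetical.

```python
# Quantization matrix Qm from the text, in raster order.
QM = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Assumption: the coefficient numbers are raster-scan indices (row * 8 + column).
EDGE_INDICES = (1, 2, 8, 9, 10, 16, 17)


def estimate_level(dct_block, smoothscale=-10.0, edgescale=0.27, offset=50.0):
    """Return (smooth, edge, Level) for one 8x8 block of DCT coefficients."""
    # smooth: number of coefficients that are non-zero after quantization with Qm.
    smooth = sum(
        1
        for r in range(8)
        for c in range(8)
        if round(dct_block[r][c] / QM[r][c]) != 0
    )
    # edge: sum of absolute values of the selected low-frequency coefficients
    # (assumed here to be taken before quantization).
    edge = sum(abs(dct_block[i // 8][i % 8]) for i in EDGE_INDICES)
    level = smoothscale * smooth + edgescale * edge + offset
    # Restrict Level to the range 0..50 as described in the text.
    return smooth, edge, max(0.0, min(50.0, level))
```

For a block that contains only a DC coefficient, smooth is 1 and edge is 0, so Level is -10 + 0 + 50 = 40 with the default parameters.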
To determine the strength of the watermark as a function
of Level, an additional quantization factor Qf is used
in the Zhao-Koch algorithm. Every change is applied to the
original DCT value quantized with Qm/Qf. Therefore,
a change made to a quantized DCT value with Qf=1
leads to a change four times larger than with Qf=4. Qf
is calculated from Level through a table look-up, because the
correlation is not necessarily linear and the range of Qf is small:
| Qf | 1 | 1 | 2 | 3 | 4 | 4 |
| Level/10 | 0 | 1 | 2 | 3 | 4 | >4 |
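The look-up and its effect on the size of a change can be sketched as below. The function names are hypothetical; the table values are taken from the text, and Level is assumed to be already restricted to 0..50.

```python
# Look-up table from the text: index is Level // 10, value is Qf.
QF_TABLE = (1, 1, 2, 3, 4, 4)


def qf_from_level(level):
    """Map a Level in the range 0..50 to the quantization factor Qf."""
    index = max(0, min(int(level) // 10, len(QF_TABLE) - 1))
    return QF_TABLE[index]


def change_in_dct_units(qm_entry, qf, quantized_delta=1):
    """Size of a change of quantized_delta steps when the step size is Qm/Qf."""
    return quantized_delta * qm_entry / qf
```

With a Qm entry of 16, one quantization step corresponds to 16 DCT units for Qf=1 and to 4 DCT units for Qf=4, which is the factor of four mentioned above.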
In the third step the watermark information, with the error
correction and redundancy applied, is embedded as described in the Zhao-Koch algorithm.
From each quantized DCT block, three locations in the medium frequencies
with absolute values Y1, Y2 and Y3 are chosen, in which the bit is to be
inserted. To encode the bit, the three values are changed to one of the
following patterns:
| Bit | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Y1 | H | H | M | M | L | L | M | M |
| Y2 | H | M | H | M | L | M | L | M |
| Y3 | L | L | L | L | H | H | H | H |
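Reading H, M and L in the table as high, middle and low, every pattern for bit 1 has Y3 as the lowest value and every pattern for bit 0 has Y3 as the highest. The following sketch encodes and reads a bit on that basis. It is a simplified illustration, not the Zhao-Koch implementation: the function names and the `margin` parameter are assumptions, and the way the values are adjusted is only one possible choice.

```python
def embed_bit_pattern(y1, y2, y3, bit, margin=1):
    """Adjust three non-negative coefficient magnitudes so that they encode bit."""
    if bit == 1:
        # Make Y3 the lowest value: lower Y3 first, then lift Y1/Y2 if needed.
        y3 = min(y3, max(0, min(y1, y2) - margin))
        y1 = max(y1, y3 + margin)
        y2 = max(y2, y3 + margin)
    else:
        # Make Y3 the highest value.
        y3 = max(y3, max(y1, y2) + margin)
    return y1, y2, y3


def read_bit_pattern(y1, y2, y3):
    """Return 1, 0, or None if the three values match no valid pattern."""
    if y3 < min(y1, y2):
        return 1
    if y3 > max(y1, y2):
        return 0
    return None


encoded = embed_bit_pattern(5, 3, 4, 1)
decoded = read_bit_pattern(*encoded)
```

A triple that matches neither family of patterns is reported as `None`, so the error correcting code can treat that block as unreliable.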
The retrieval is performed in the same way as in the Zhao-Koch algorithm. We decode each single frame of the video and perform the inverse steps of the embedding: first the position generation, then the retrieval of the embedded data.
Figure 4: Improved Retrieval Scheme
The first two steps are exactly as in the embedding process. Step 2 is not essential but the information about the strength of the watermark in each block is helpful while using the described error correcting and redundancy code.
In the third step the same three locations that were used in the embedding
process must be examined in every block. The
patterns can then be checked and the watermark bit read out.
6.1.3 Experimental Results
The chosen video museum.mpg shows a virtual museum. A camera moves from the entrance of the museum through several rooms. The first 30 frames show a short zoom from the entrance of the museum. For a better view, the HTML version of the paper can be found at:
www.darmstadt.gmd.de/mobile/watermarking with a detailed view to the images and video sources used in this paper.
Figure 5: Original museum (first frame)
Original Video (245 KB)
Original Video (Zhao-Koch-Algorithm) (245 KB)
| Video characteristics: | Museum.mpg |
| No. of 8x8 blocks | 1320 |
| Compression I-Frame | 3.42 % |
| Compression P-Frame | 2.65 % |
| Compression B-Frame | 1.22 % |
| IPB-order | IBBPBB |
| Qf | 1 | 1 | 2 | 3 | 4 | 4 |
| Level/10 | 0 | 1 | 2 | 3 | 4 | >4 |
Figure 6: Watermarked museum video (first frame)
Video by Dittmann, Stabenau and Steinmetz (245 KB)
Our robustness test results are displayed in the next table.
We embedded 60 bits of watermarking information, then performed MPEG encoding,
QuickTime conversion and a StirMark attack, and obtained the following results
with the improved Zhao-Koch algorithm. The table shows the number of bit errors
that remain, with the error correcting code applied, in frames 0 to 13 after
high MPEG compression, QuickTime conversion and the StirMark attack.
| Frame No. | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| F-Type | I | B | B | P | B | B | I |
| MPEG BCH | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| QuickTime BCH | 0 | 1 | 9 | 1 | 0 | 0 | 0 |
| StirMark BCH | 22 | 29 | 29 | 16 | 17 | 19 | 24 |
| Frame No. | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| F-Type | B | B | P | B | B | I | B |
| MPEG BCH | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| QuickTime BCH | 0 | 0 | 0 | 1 | 0 | 0 | 5 |
| StirMark BCH | 22 | 29 | 29 | 16 | 17 | 19 | 24 |
Across all our experiments (about 10 video streams), we measured
the following bit error rates:
| museum.mpg | I-Frames | P-Frames | B-Frames |
| MPEG | <1% | 1-2% | 5% |
| QuickTime | 1-2% | 5% | 7% |
| StirMark | 15% (no breakdown by frame type given) |
The error rate table shows that the algorithm performs very well
under MPEG compression and QuickTime conversion. The error rates
of the watermark information in I-frames are excellent, while B-frames still
cause some problems. StirMark removes up to 30 percent of the watermark; without
the error correcting code the watermark was destroyed completely. StirMark
also destroys the watermark of the original Zhao-Koch algorithm completely. Visual artefacts
can be avoided.
6.2.1 Embedding method
First, a position sequence like the one used in the Zhao-Koch algorithm is generated to determine the blocks we want to modify. The next figure illustrates the embedding steps. For each block a user-key-dependent pattern is created as follows. We start by creating an 8x8 pseudo-random pattern with the user key as seed (step 1). To eliminate the high frequencies in this pattern, a cellular automaton with simple voting rules is used: for every position in the 8x8 random pattern, the number of 1s in the eight neighbouring positions is counted. If the number exceeds five, the position is set to 1; if it is less than three, the position is set to 0 (see the marked rectangle for an example). By applying these rules several times to the whole 8x8 block we obtain a pattern M with fewer high frequencies (steps 2 - 4). Next, a correlation between the pattern and the luminance block is introduced according to the bit we want to embed (step 5). To embed a 1, we add a value k to each luminance block position where the corresponding position in the pattern M is 1, and subtract k where the corresponding position is 0. The value k is obtained in the smooth/edge block estimation routine via a table look-up from Level. To embed a 0, the signs are reversed.
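The pattern generation and embedding steps can be sketched as follows. Several details are assumptions made for this illustration and are not specified in the text: Python's non-cryptographic `random.Random` stands in for the keyed generator, neighbours outside the 8x8 block count as 0, all cells are updated simultaneously, the number of iterations defaults to 3, and luminance values are clipped to 0..255. The function names are hypothetical.

```python
import random


def make_pattern(seed, iterations=3):
    """Build a smoothed 8x8 binary pattern M from a seed (steps 1 to 4)."""
    # Assumption: a seeded, non-cryptographic PRNG replaces the keyed generator.
    rng = random.Random(seed)
    pattern = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
    for _ in range(iterations):
        updated = [row[:] for row in pattern]
        for r in range(8):
            for c in range(8):
                # Count the 1s among the eight neighbours; cells outside the
                # block are ignored (an assumption about border handling).
                ones = sum(
                    pattern[rr][cc]
                    for rr in range(r - 1, r + 2)
                    for cc in range(c - 1, c + 2)
                    if (rr, cc) != (r, c) and 0 <= rr < 8 and 0 <= cc < 8
                )
                if ones > 5:
                    updated[r][c] = 1
                elif ones < 3:
                    updated[r][c] = 0
        pattern = updated
    return pattern


def embed_bit(luma_block, pattern, bit, k):
    """Add or subtract k per pixel according to the pattern and the bit (step 5)."""
    delta_for_one = k if bit == 1 else -k
    result = []
    for r in range(8):
        row = []
        for c in range(8):
            delta = delta_for_one if pattern[r][c] == 1 else -delta_for_one
            # Clip to the 8-bit range (assumption; not discussed in the text).
            row.append(max(0, min(255, luma_block[r][c] + delta)))
        result.append(row)
    return result
```

Because the voting rules only change a cell when its neighbourhood is clearly dominated by 0s or 1s, repeated application produces larger uniform areas, which is what removes the high frequencies.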
Because we use much smaller patterns (8x8) than Fridrich (one pattern for the whole frame), we can embed much more information. The disadvantage of this technique is that we must expect high bit error rates in the detection process. We overcome this shortcoming by applying an error-correcting code (we use a (31, 6, 15)-BCH code) and an additional redundancy code to the watermark information before the embedding process starts. The redundancy is determined by the parameter redundancy bit.
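A rough capacity estimate shows how the coding parameters relate to the number of available blocks. The sketch reads (31, 6, 15) in the usual (length, message bits, minimum distance) sense and makes two assumptions that the text does not spell out: one coded bit is embedded per 8x8 block, and the redundancy parameter acts as a plain repetition factor. The function names are hypothetical; the values 60 bits, redundancy 4 and 1320 blocks are those reported for the experiments.

```python
import math


def blocks_needed(payload_bits, n=31, k=6, redundancy=4):
    """Blocks required for a payload protected by an (n, k) code plus repetition."""
    codewords = math.ceil(payload_bits / k)
    # Assumption: one coded bit per 8x8 block, each bit repeated `redundancy` times.
    return codewords * n * redundancy


def correctable_errors(min_distance=15):
    """Errors per codeword that a code with this minimum distance can correct."""
    return (min_distance - 1) // 2


needed = blocks_needed(60)       # 10 codewords * 31 bits * 4 repetitions
available = 1320                 # number of 8x8 blocks reported for museum.mpg
```

Under these assumptions a 60-bit payload occupies 1240 of the 1320 blocks, and each 31-bit codeword tolerates up to 7 bit errors before decoding fails.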
6.2.2 Retrieval Method
In the retrieval process, the same 8x8 patterns M are generated as in the embedding process (steps 1 - 4). To test the correlation between the luminance block and the pattern M, we compute the average luminance value av1 (sum1 div #1) of the positions with a corresponding 1 in M and the average luminance value av0 (sum0 div #0) of the positions with a corresponding 0. If the luminance block and the pattern M were uncorrelated, the difference between the two values would be near zero. Due to the embedding process, however, one of the values should be significantly higher (around 2*k) than the other. Thus we estimate an embedded bit 1 if av1>av0; otherwise we estimate an embedded bit 0. With this statistical analysis we avoid using the original frames. Once all bits are retrieved, the watermark information is decoded with the same (31,6,15) BCH code and the additional redundancy code. The retrieval process is shown in the following diagram:
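The statistical test described above needs only the block and the regenerated pattern. The sketch below is an illustration with a hypothetical function name; it uses true division for the averages, where the text's "div" may denote integer division, and it returns `None` for a degenerate pattern, a case the text does not discuss.

```python
def extract_bit(luma_block, pattern):
    """Estimate the embedded bit of one 8x8 luminance block, or None if undecidable."""
    sum1 = sum0 = count1 = count0 = 0
    for r in range(8):
        for c in range(8):
            if pattern[r][c] == 1:
                sum1 += luma_block[r][c]
                count1 += 1
            else:
                sum0 += luma_block[r][c]
                count0 += 1
    if count1 == 0 or count0 == 0:
        # Degenerate pattern (all 0s or all 1s): no correlation can be measured.
        return None
    av1 = sum1 / count1
    av0 = sum0 / count0
    return 1 if av1 > av0 else 0


# Example: a checkerboard pattern on a flat block of value 100, marked with k = 4.
checker = [[(r + c) % 2 for c in range(8)] for r in range(8)]
marked = [[100 + (4 if checker[r][c] else -4) for c in range(8)] for r in range(8)]
bit = extract_bit(marked, checker)
```

On a flat block the two averages differ by exactly 2*k; on real image content the difference is disturbed by the block's own texture, which is why the error correcting code is needed.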
We have tested the improved Fridrich-algorithm with the
following parameters: smoothscale = -12, edgescale = 0.27 and offset =
50, watermarking strength 1.1 and bit redundancy 4. Conversion table from
Level to k (see description of embedding method):
| k | 12 | 12 | 6 | 4 | 3 | 3 |
| Level/10 | 0 | 1 | 2 | 3 | 4 | >4 |
Figure 9: Watermarked museum video (first frame)
Original Video (Fridrich-Algorithm) (245 KB)
We embedded 60 bits of copyright information. We performed
the same transformations (MPEG encoding, QuickTime conversion and StirMark attack)
and obtained the following results with the improved Fridrich algorithm. The table
shows the number of bit errors after the transformations, with the
BCH code and the additional redundancy code applied.
| Frame No. | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| F-Type | I | B | B | P | B | B | I |
| MPEG BCH | 0 | 0 | 7 | 6 | 0 | 4 | 0 |
| QuickTime BCH | 0 | 12 | 16 | 11 | 23 | 15 | 5 |
| StirMark BCH | 19 | 27 | 24 | 23 | 26 | 27 | 24 |
| Frame No. | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| F-Type | B | B | P | B | B | I | B |
| MPEG BCH | 0 | 2 | 0 | 2 | 8 | 0 | 5 |
| QuickTime BCH | 15 | 18 | 4 | 26 | 33 | 7 | 18 |
| StirMark BCH | 23 |
Across all our experiments (about 10 video streams), we measured
the following error rates:
| museum.mpg | I-Frames | P-Frames | B-Frames |
| MPEG | 1-2% | 3% | 7% |
| QuickTime | 8% | 10% | 20% |
| StirMark | 12% (no breakdown by frame type given) |
The error rate table leads to the following observations.
If only I-frames were watermarked, the error rates after MPEG encoding would also be promising. B- and P-frames are not sufficiently watermarked. Compared to our adapted Zhao-Koch implementation, the algorithm is less successful with the error correction used. If the watermark is to be robust against QuickTime conversion, its strength must be increased by changing the value k or the parameters of the smooth block and edge detection. As with the Zhao-Koch algorithm, the watermark shows error rates of up to 30 percent after the StirMark attack. The advantage of the algorithm is that we can embed up to 250 bits with nearly the same error rates, and that it can be adapted more flexibly to handle StirMark attacks more efficiently and robustly.
We now discuss the visual artefacts in detail. To get a more descriptive view of the distortions introduced in the different steps, we measure the differences from the original frame. The idea is to transform the difference into a 3D scene, [5]. The scene shows the absolute difference between the original frame and the watermarked frame, the watermarked areas, and the intensity of the watermark, represented by the height of the 3D relief. Based on this information the quality loss can be measured. The scene also shows whether relevant image objects are watermarked, allows the robustness to be evaluated with respect to the intensity, and makes it possible to compare different algorithms. We have created the following 3D-scenes:
Figure 10: a) Watermarked, Fridrich
Figure 10: b) Watermarked, Zhao-Koch
The pictures above show the differences between
the original frames and the watermarked versions. The different strengths
of the watermark, depending on the smooth and edge characteristics of
the picture, can be seen clearly. Both algorithms apparently introduce
similar changes, because both use the same smooth
and edge detection algorithm and both introduce changes in the medium frequencies.
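The data behind such a relief is the per-pixel absolute difference between two frames. A minimal sketch, with hypothetical function names and frames represented as 2D lists of luminance values, is given below; the rendering of the relief as a 3D scene is not covered.

```python
def difference_relief(original, watermarked):
    """Per-pixel absolute difference, usable as the height of a 3D relief."""
    return [
        [abs(o - w) for o, w in zip(row_o, row_w)]
        for row_o, row_w in zip(original, watermarked)
    ]


def relief_summary(relief):
    """Return (maximum height, mean height) of a difference relief."""
    heights = [h for row in relief for h in row]
    return max(heights), sum(heights) / len(heights)


relief = difference_relief([[10, 20], [30, 40]], [[12, 20], [27, 40]])
peak, mean = relief_summary(relief)
```

The maximum and mean heights give a simple numeric counterpart to the visual impression of the relief when two algorithms are compared.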
Figure 11: MPEG-reencoded Fridrich
Figure 11 shows the difference between the watermarked, re-encoded frames and the original ones. The difference from the two pictures above is quite small, but it already leads to the observed error rates of Fridrich's approach.
Figure 12: a) StirMark, Fridrich
Figure 12: b) StirMark, Zhao-Koch
The 3D-scenes of Figure 12 show the StirMark distortions. Although the distortions seem to be very high, they are invisible to an observer who only looks at the "StirMarked" frame. This is because the biggest distortions are introduced by slight geometric transformations, which are difficult to detect without the original frame. Nevertheless, they are not invisible to the watermark detection algorithms, as can be seen in the corresponding error rates.
Figure 13: a) QuickTime conversion, Fridrich
Figure 13: b) QuickTime conversion, Zhao-Koch
The last two pictures of Figure 13 show the changes to the watermarked frames due to the QuickTime conversion. The distortions of the Zhao-Koch algorithm are lower, which confirms the measured error rates.
The main problem is to retrieve the correct watermarked regions of the whole frame without knowing the position of the objects. Our first approach assumes that a watermarked region covers at least 64x64 pixels.
The watermarking with the second approach, in the spatial domain, is simple: a repeated 64x64 pixel pattern which depends on the user key is overlaid on the region which is to be watermarked. The retrieval searches for a correlation with the 64x64 pattern in the current video frame. If the correlation threshold is reached, an object can be identified. Our experiments have shown that the watermarking strength must be very high to stand out from similar inherent correlations in other regions of the video frame which were not watermarked. This practice therefore causes substantial artefacts in the watermarked regions, which an attacker can find very easily. For tests we have watermarked three regions in the background. The results can be seen in the HTML version.
The first approach, in the frequency domain, provides better results. We embedded a binary sequence of alternating 0 and 1 in 8x8 blocks. The retrieval searches for this alternating sequence in every 64x64 block and measures the number of matches with the 0-1 sequence. The visual distortions are lower than with the second approach and can be evaluated in the HTML version.
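The match counting of the frequency-domain approach can be sketched as follows. The sketch assumes that one bit has already been extracted per 8x8 block, that the 64x64 pixel regions lie on a fixed grid, and that the sequence alternates in raster order starting with 0. None of these details is specified in the text, and the function name is hypothetical.

```python
def region_match_counts(block_bits):
    """Count matches with an alternating 0-1 sequence in every 64x64 pixel region.

    block_bits is a 2D list with one extracted bit per 8x8 block, so each
    64x64 pixel region covers an 8x8 group of entries.
    """
    rows, cols = len(block_bits), len(block_bits[0])
    counts = {}
    for top in range(0, rows - 7, 8):
        for left in range(0, cols - 7, 8):
            matches = 0
            for r in range(8):
                for c in range(8):
                    # Assumption: the sequence alternates in raster order,
                    # starting with 0 at the top-left block of the region.
                    expected = (r * 8 + c) % 2
                    if block_bits[top + r][left + c] == expected:
                        matches += 1
            counts[(top, left)] = matches
    return counts
```

A region carrying the sequence yields a count close to 64, while an unmarked region is expected to yield about half of that, so a threshold between the two separates watermarked from unmarked regions.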
The 3D scenes are very useful to evaluate the visual artefacts
and the distortions after several attacks. Our goal is to integrate the
watermarking techniques in our distributed video production and distribution
environment as an enabling technology for electronic commerce and for digital
market places to ensure copyrights.
[2] Cox, I.J., Miller, M.L.: A review of watermarking and the importance of perceptual modeling, Proc. Of Electronic Imaging97, February 1997
[3] Digimarc: Watermarking Technology, PictureMarcTM 1996, http://www.digimarc.com/wt_page.html
[4] Dittmann, J., Steinmetz, A.: Konzeption von Sicherheitsmechanismen für das Projekt DiVidEd, GMD-Studie 97
[5] Dittmann, J., Steinmetz, A., Nack, F., Steinmetz, R.: Interactive Watermarking Environments, to appear in IEEE Multimedia 1998, Austin Texas
[6] Fridrich, J.: Methods for Data Hiding, working paper, Center for Intelligent Systems & Department of Systems Science and Industrial Engineering, SUNY Binghamton, 1997
[7] Hartung, F., Girod, B.: Copyright Protection in Video Delivery Networks by Watermarking of Pre-Compressed Video, in: S. Fdida, M. Morganti (eds.), "Multimedia Applications, Services and Techniques - ECMAST '97", Springer Lecture Notes in Computer Science, Vol. 1242, pp. 423-436, Springer, Heidelberg, 1997
[8] Koch, E. and Zhao, J.: Towards Robust and Hidden Image Copyright Labelling, Proc. of 1995 IEEE Workshop on Nonlinear Signal and Image Processing (Neos Marmaras, Greece, June 20-22, 1995)
[9] Kuhn, M.G.: Stirmark, available at http://www.cl.cam.ac.uk/mgk25/strirmark/, Security Group, Computer Lab, Cambridge University, UK (email: mkuhn@acm.org)
[10] Kutter, M., Jordan,F. and Bossen,F.: Digital Signature of Colour Images using Amplitude Modulation, Signal Processing Laboratory, EPFL, Switzerland, 1995
[11] MPEG International Standard ISO/IEC 11172: Information
Technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1.5 Mbit/s, Part 1: Systems, Part 2: Video,
Part 3: Audio, 1993
June 1998