it-e-69 Image Compression

Why is image compression so important? Image files, in an uncompressed form, are very large.
And the Internet, especially for people using a 56k dial-up modem, can be pretty slow. This
combination could seriously limit one of the Web's most appreciated aspects: its ability to present
images easily.
JPEG (Joint Photographic Experts Group) compression is currently the best way to compress
PHOTOGRAPHIC IMAGES for the web. Other forms of image compression, including GIF and
PNG, are best used for other purposes on the web.
GIF (Graphics Interchange Format) is best used for graphics that have a limited color palette
and large areas of flat tone, like cartoons or banners. Although it has several remarkable features,
such as transparency and the ability to present animated images, it is not well suited for the
presentation of continuous tone images, such as photographs, due to its limit of 256 colors.
PNG (Portable Network Graphics) is a relatively new format with a lot of potential but, until
all browsers can see images compressed in PNG form, it is not a good idea to use it.
JPEG, or JPG, is an evolving format that is universal in its use as a means of compressing
continuous tone photographs for speedy transmission over the Internet. Photographs compressed
using the JPEG format look good because JPEG supports millions of colors, so you can see the
gradation of tones.
A Bitmap is a simple series of pixels all stacked up. But the same image saved in GIF or
JPEG format uses fewer bytes to make up the file. How? Compression.
"Compression" is a computer term that covers a variety of mathematical methods used to
reduce an image's size in bytes. Let's say you have an image where the upper right-hand corner
has four pixels all the same color. Why not find a way to make those four pixels into one? That
would cut down the number of bytes by three-fourths, at least in the one corner. That's a
compression factor.
Bitmaps can be compressed to a point. The process is called "run-length encoding." Runs of
pixels that are all the same color are combined into a single entry: the color plus the length of
the run. The longer the run of pixels, the more compression. Bitmaps with little detail or color
variance compress very well. Those with a great deal of detail don't offer much in the way of
compression. Bitmaps that use run-length encoding can carry either the common ".bmp"
extension or ".rle". Another difference between the two files is that the common Bitmap can
accept 16 million different colors per pixel. Saving the same image in run-length encoding knocks
the bits-per-pixel down to 8. That locks the level of color in at no more than 256.
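The run-length idea described above can be sketched in a few lines of Python. This is only an illustration of the principle on a list of pixel values; it is not the byte layout that real ".bmp" or ".rle" files use, and the function names are made up for the example.

```python
def rle_encode(pixels):
    """Collapse runs of equal values into [value, count] pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


flat_row = [7, 7, 7, 7, 7, 7, 7, 7]   # one long run: compresses well
busy_row = [1, 2, 3, 4, 5, 6, 7, 8]   # no runs: one pair per pixel, no gain

flat_runs = rle_encode(flat_row)
busy_runs = rle_encode(busy_row)
```

The flat row collapses to a single pair, while the busy row needs one pair per pixel, which is why detailed images gain little from this scheme.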
So, why not create a single pixel when all of the colors are close? You could even lower the
number of colors available so that you would have a better chance of the pixels being close in
color. Good idea. The people at CompuServe felt the same way.
GIF, which stands for "Graphics Interchange Format," was first standardized in 1987 by
CompuServe, although the patent for the algorithm (mathematical formula) used to create GIF
compression actually belongs to Unisys. The first format of GIF used on the Web was called
GIF87a, representing its year and version. It saved images at 8 bits-per-pixel, capping the color
level at 256. That 8-bit level allowed the image to work across multiple server styles, including
CompuServe and TCP/IP.
CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing.
They called the new format, you guessed it: GIF89a.
There's no discernable difference between a basic (known as non-interlaced) GIF in 87 and
89 formats.


1, continuous tone  

2, gradation  [ɡrə'deiʃən]
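n. gradual change (of color, shade, order, tone, etc.); grading into levels; change by stages (of states, qualities, etc.); vowel gradation

3, discernable  [di'sə:nəbl, -'zə:-]
adj. distinguishable; recognizable

4, interlaced  [,intə'leist]
a. interwoven, crisscrossed

5, interlacing  [intə(:)'leisiŋ]
n. interleaving, alternating lines; interlaced scanning
v. interleaving, interweaving (present participle of interlace)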





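To this end, we have developed a series of data products, such as the well-known Quantum Statistics (量子统计), Data Cube (数据魔方), and Taobao Index (淘宝指数). From a business standpoint, building a data product is not especially hard; but under the constraint of "massive" scale, the difficulty of computing, storing, and retrieving the data rises sharply. Using Data Cube as the example, this article describes Taobao's exploration of technical architecture for massive-data products.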



Figure 1: Technical architecture of Taobao's massive-data products


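Data generated in real time at the data source layer is transferred in near real time, through Taobao's in-house data transfer components DataX, DbSync, and Timetunnel, to a 1,500-node Hadoop [Note: an Apache open-source framework for distributed storage and computation, originally developed for search-engine workloads.] cluster. We call this cluster "Cloud Ladder" (云梯), and it is the main component of the computation layer. On Cloud Ladder, roughly 40,000 jobs a day run different MapReduce [Note: a concept proposed by Google. MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming languages, along with features borrowed from vector programming languages. It makes it much easier for programmers with no experience of distributed parallel programming to run their programs on a distributed system.] computations over 1.5 PB of raw data, according to product requirements. This computation usually finishes before 2 a.m. Compared with the data that front-end products display, the results here are likely to be in an intermediate state, usually the outcome of a suitable balance between data redundancy and front-end computation.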







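Relational databases (RDBMS) have been widely used in industry since they were proposed in the 1970s. More than thirty years of development have produced a number of excellent database products, such as Oracle, MySQL, DB2, Sybase, and SQL Server.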

Figure 2: Data growth curve in MyFOX

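Although relational databases are at a disadvantage relative to non-relational databases in partition tolerance (Tolerance to Network Partitions), their strong semantic expressiveness and their ability to express relationships between data still give them an irreplaceable role in data products.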


Figure 3: The MyFOX data query process


Figure 4: MyFOX node structure


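As the name suggests, "hot nodes" hold the newest and most frequently accessed data. For this data we want to give users the fastest possible queries, so we chose 15,000 rpm SAS disks; counting two machines per node, the storage cost is about 4.5W per TB (W here is shorthand for 万, ten thousand, so roughly 45,000 yuan per TB). Correspondingly, for "cold data" we chose 7,500 rpm SATA disks, which hold more data per platter, at a storage cost of about 1.6W per TB.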

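Another benefit of separating hot and cold data is that it effectively lowers the memory-to-disk ratio. As Figure 4 shows, a single "hot node" machine has only 24 GB of memory, while its disks hold about 1.8 TB when full (300 * 12 * 0.5 / 1024), giving a memory-to-disk ratio of about 4:300, far below a reasonable value for a MySQL server. The consequence of too low a ratio is that, sooner or later, even all of the memory will not be enough to hold the data's indexes; at that point a large share of queries must read indexes from disk, and efficiency drops sharply.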
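The arithmetic behind the capacity and ratio figures quoted above can be checked with a short Python sketch. The variable names are illustrative, and the meaning of the 0.5 factor is an assumption left open here, since the text gives the formula without explaining it.

```python
disk_gb = 300          # capacity per disk in GB, from the formula in the text
disk_count = 12        # disks per machine, from the formula in the text
usable_factor = 0.5    # the 0.5 in the formula; what it represents is not stated
memory_gb = 24         # memory per hot-node machine

usable_gb = disk_gb * disk_count * usable_factor   # 1800 GB
usable_tb = usable_gb / 1024                       # about 1.76 TB, quoted as "about 1.8 TB"

# Scale the ratio so the disk side reads 300, matching the quoted "4:300".
memory_per_300 = memory_gb / usable_gb * 300
```

So 24 GB of memory against 1,800 GB of disk reduces to the 4:300 ratio given in the text.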



Figure 5: Full-attribute selector

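This is a very typical example. To illustrate the problem, we again describe it in relational-database terms. For the laptop category, the filter conditions chosen in a given user query may include a series of attributes (fields) such as "laptop size", "laptop positioning", and "hard disk capacity", and for each attribute that might be used as a filter, the distribution of attribute values is highly uneven. In Figure 5 we can see that the laptop size attribute has 10 enumerated values, whereas the "Bluetooth" attribute is a boolean, so its selectivity is very poor.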



Figure 6: Prom storage structure

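As Figure 6 shows, we chose HBase as Prom's underlying storage engine. We chose HBase mainly because it is built on HDFS and has a good programming interface for MapReduce. Although Prom is a general-purpose service framework for solving common problems, here we again use full-attribute selection as the example to explain how Prom works. The raw data is the previous day's transaction details on Taobao. In the HBase cluster, we use attribute pairs (the combination of an attribute and an attribute value) as the row-key. For the value under each row-key we designed two column-families: an index field holding the list of transaction IDs, and a data field holding the raw transaction details. When storing, we deliberately make every element in each field fixed-length, so that the corresponding record can be found quickly by offset, avoiding complex lookup algorithms and large numbers of random disk reads.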
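The fixed-length design described above can be illustrated with a small Python sketch. The 8-byte record width and the helper names are assumptions made for the example; the article does not give Prom's actual record layout.

```python
import struct

# Hypothetical layout: each transaction ID is an 8-byte big-endian
# unsigned integer, so record i always starts at byte i * 8.
RECORD = struct.Struct(">Q")


def pack_ids(ids):
    """Concatenate fixed-length records into one byte string."""
    return b"".join(RECORD.pack(i) for i in ids)


def read_id(blob, index):
    """Jump straight to record `index` using its byte offset, with no scan."""
    return RECORD.unpack_from(blob, index * RECORD.size)[0]


index_field = pack_ids([1001, 1002, 1005, 1010])
third_id = read_id(index_field, 2)
```

Because every record has the same width, locating one is a single multiplication rather than a search, which is the property the text relies on.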

Figure 7: Prom query process

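Figure 7 uses a typical example to show how Prom works when serving queries; for reasons of space, we do not describe it in detail here. It is worth noting that the computations Prom supports are not limited to SUM; the common statistical computations are all supported. For on-the-spot computation, we extended HBase: Prom requires that the data each node returns be a locally optimal result that has already gone through "local computation", and the final global result is just a simple aggregation of the local results returned by the nodes. Clearly, this design aims to make full use of each node's parallel computing capacity and to avoid the network overhead of transferring large volumes of detail records.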
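The "local computation, then simple aggregation" idea above can be sketched as follows. This is a generic partial-aggregation illustration in plain Python, not Prom's or HBase's actual interface; the function names and sample rows are hypothetical.

```python
def local_sum_and_count(rows):
    """Runs on each node: reduce local detail rows to a small partial result."""
    return (sum(rows), len(rows))


def merge_partials(partials):
    """Runs on the coordinator: combine the per-node partial results."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    avg = total / count if count else None
    return {"sum": total, "count": count, "avg": avg}


# Hypothetical detail rows (transaction amounts) held by three nodes.
node_rows = [[10, 20, 30], [5, 15], [40]]
partials = [local_sum_and_count(rows) for rows in node_rows]
result = merge_partials(partials)
```

Only one small tuple per node crosses the network, instead of every detail row.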




Figure 8: glider technical architecture






Figure 9: Cache control system

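Figure 9 shows Data Cube's design approach to cache control. A user request always carries cache-control "commands", including the query string in the URL and the "If-None-Match" information in the HTTP header. These cache-control "commands" are always passed down layer by layer until they reach the heterogeneous "table" modules of the underlying storage. Besides returning its own data, each heterogeneous "table" also returns its own cache expiry time (ttl), and the expiry time that glider finally outputs is the minimum of the expiry times of the heterogeneous "tables". This expiry time is likewise passed up layer by layer from the underlying storage and finally returned to the user's browser in the HTTP header.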
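The rule that the merged response expires as soon as its shortest-lived component does can be sketched like this. The table names, ttl values, and the choice of a max-age style header are assumptions for illustration; the article does not say which HTTP header glider actually sets.

```python
def combine_table_results(table_results):
    """Merge (data, ttl_seconds) pairs; the merged ttl is the smallest one."""
    data = [d for d, _ in table_results]
    ttl = min(t for _, t in table_results)
    return data, ttl


# Hypothetical results from three heterogeneous "tables".
results = [
    ({"src": "myfox"}, 3600),
    ({"src": "prom"}, 600),
    ({"src": "other"}, 86400),
]
merged_data, merged_ttl = combine_table_results(results)

# One way the ttl could be surfaced to the browser (illustrative only).
cache_header = f"max-age={merged_ttl}"
```

Taking the minimum means the combined page is never cached longer than its most volatile source allows.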







Continue reading [Repost + Commentary] An Analysis of the Taobao Data Cube Technical Architecture

it-e-68 Can Graphics Files Be Encrypted

Of course you can encrypt a graphics file. After all, most encryption algorithms don't care
about the intellectual content of a file. All they chew on is a series of byte values. Therefore,
most any encryption program that works on ordinary text files will work on graphics files as well.
Why would you want to encrypt a graphics file? Mostly to control who can view its contents.
You can invent a proprietary file format and that might slow a file format hack down for, say,
five or ten minutes. You could add a proprietary data compression scheme, possibly a twisted
variation of an already public algorithm. But there are so many people out there with nothing
better to do than hack at unknown data formats that your data would probably be exposed in little
time. But suppose we top off all this effort by encrypting the graphics file itself as we would an
ordinary text file. Would your data then be safe?
Realize that an encrypted graphics file still might not be very secure. For every data
encryption algorithm there exists at least one method of getting around it, although it may take
hundreds of computers and many years to fully employ and execute that method!
For example, one of the more popular methods used to encrypt data is the Vernam or XOR
cipher. This cipher Exclusive ORs the plain-text data with a single, random, fixed-length key.
The longer the key the harder it is to break the cipher. A totally random key the length of your
data, used only once, is impossible to break. Shorter and less-random keys are easier to break.
XOR is very simple and fast, which is a must for graphics file translators/viewers that
must decrypt a file on the fly. A problem, however, is that most graphics files contain fixed size
headers which vary only slightly in content from file to file. If you knew the approximate
contents of the header of an encrypted file you could XOR a "decrypted" header with the
encrypted file and possibly produce the key used to encrypt the file. A short key might be very
easily discovered in this way.
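The header attack described above can be demonstrated in a few lines of Python. The file header, pixel bytes, and 3-byte key below are all made up for the example, and the attacker is assumed to know or guess the key length.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


# Hypothetical file: a fixed, predictable header followed by pixel data.
known_header = b"IMGFMT\x00\x01"
plaintext = known_header + bytes([10, 200, 31, 31, 90, 4])
key = b"\x5a\xc3\x0f"                 # short illustrative key

ciphertext = xor_bytes(plaintext, key)

# The attacker XORs the guessed header with the start of the ciphertext.
# Because (p ^ k) ^ p == k, the result is the repeating key itself.
leaked = xor_bytes(ciphertext[:len(known_header)], known_header)
recovered_key = leaked[:len(key)]     # assumes the key length is known or guessed

decrypted = xor_bytes(ciphertext, recovered_key)
```

With a key shorter than the header, the whole key falls out of the first few bytes, and the rest of the file decrypts with it.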
If you wish to use a public key/private key encryption method, then storing the public key in
the file format header (usually as a 4-byte field) and only encrypting the image data would be the
way to go. The SMPTE DPX file format supports such an encryption feature.
If you really need to make the contents of a graphics file secure, then I'd suggest not only
using some form of data encryption, but also creating an unconventional and proprietary file
format and not publishing its format specification.

cipher ['saifə]

n. zero; code, secret sign

  • v. to calculate, to do arithmetic


it-e-67 Can Graphics Files Be Infected with a Virus

For most types of graphics file formats currently available the answer is "no". A virus (or
worm, Trojan horse, and so forth) is fundamentally a collection of code (that is, a program) that
contains instructions which are executed by a CPU. Most graphics files, however, contain only
static data and no executable code. The code that reads, writes, and displays graphics data is
found in translation and display programs and not in the graphics files themselves. If reading or
writing a graphics file causes a system malfunction, it is most likely the fault of the program
reading the file and not of the graphics file data itself.
With the introduction of multimedia we have seen new formats appear, and modifications to
older formats made, that allow executable instructions to be stored within a file format. These
instructions are used to direct multimedia applications to play sounds or music, prompt the user
for information, or display other graphics and video information. And such multimedia display
programs may perform these functions by interfacing with their environment via an API, or by
direct interaction with the operating system. One might also imagine a truly object-oriented
graphics file as containing the code required to read, write, and display itself.
Once again, any catastrophes that result from using these multimedia applications are most
likely the result of undiscovered bugs in the software and not some sinister instructions in the
graphics file data. Such "logic bombs" are typically exorcised through testing with a wide
variety of different image files as test cases.
If you have a virus scanning program that indicates a specific graphics file is infected by
a virus, then it is very possible that the file coincidentally contains a byte pattern that the scanning
program recognizes as a key byte signature identifying a virus. Contact the author (or even
read the documentation!) of the virus scanning program to discuss the probability of the
mis-identification of a clean file as being infected by a virus. Save the graphics file, as the author
will most likely wish to examine it as well.
If you suspect a graphics file to be at the heart of a virus problem you are experiencing, then
also consider the possibility that the graphics file's transport mechanism (floppy disk, tape or
shell archive file, compressed archive file, and so forth) might be the original source of the virus
and not the graphics file itself.

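1, catastrophe  [kə'tæstrəfi]
n. catastrophe; great disaster; crushing defeat

2, sinister  ['sinistə]
a. ominous, evil; on the left side

3, exorcise  ['eksɔ:saiz]
vt. to drive out evil spirits; to get rid of something evil

4, coincidentally  [kəu,insi'dentli]
adv. by coincidence; concurrently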


it-e-66 Introduction to Digital Image Processing

An image is digitized to convert it to a form that can be stored in a computer's memory or on
some form of storage media such as a hard disk or CD-ROM. This digitization procedure can be
done by a scanner, or by a video camera connected to a frame grabber board in a computer. Once an
image has been digitized, it can be operated upon by various image processing operations.
Image processing operations can be roughly divided into three major categories: Image
Compression, Image Enhancement and Restoration, and Measurement Extraction. Image
compression is familiar to most people. It involves reducing the amount of memory needed to
store a digital image.
Image defects which could be caused by the digitization process or by faults in the imaging
set-up (for example, bad lighting) can be corrected using Image Enhancement techniques. Once
the image is in good condition, the Measurement Extraction operations can be used to obtain
useful information from the image.
Some examples of Image Enhancement and Measurement Extraction are given below. The
examples shown all operate on 256-level grey-scale images. This means that each pixel in the image is
stored as a number from 0 to 255, where 0 represents a black pixel, 255 represents a white
pixel and values in-between represent shades of grey. These operations can be extended to
operate on colour images.
The examples below represent only a few of the many techniques available for operating on
images. Details about the inner workings of the operations have not been given.
Image Enhancement and Restoration
The image at the left of Figure 1 has been corrupted
by noise during the digitization process. The "clean"
image at the right of Figure 1 was obtained by applying a
median filter to the image.
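As a rough illustration of the median filter mentioned above, the Python sketch below applies a 3x3 median to a tiny grey-scale image stored as a list of rows. The window size and the decision to leave border pixels untouched are simplifications chosen for the example, not details from the text.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)   # 9 values, so the median is a pixel value
    return out


# A flat grey patch with one bright noise pixel.
noisy = [
    [100, 100, 100, 100],
    [100, 255, 100, 100],
    [100, 100, 100, 100],
    [100, 100, 100, 100],
]
cleaned = median_filter_3x3(noisy)
```

The isolated bright pixel is outvoted by its neighbours and disappears, while the flat areas are left as they were.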
An image with poor contrast, such as the one at
the left of Figure 2, can be improved by adjusting the
image histogram to produce the image shown at the
right of Figure 2.
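One simple way to adjust a histogram for better contrast is a linear stretch, sketched below in Python. The text does not say which adjustment was used for Figure 2, so this is an assumed stand-in rather than the method behind the figure.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly map the darkest pixel to out_min and the brightest to out_max."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                        # flat image: nothing to stretch
        return [row[:] for row in image]
    span_out = out_max - out_min
    span_in = hi - lo
    # Integer arithmetic; intermediate values are rounded down.
    return [[out_min + (v - lo) * span_out // span_in for v in row]
            for row in image]


# Low-contrast patch: all values squeezed between 100 and 150.
dull = [[100, 110], [125, 150]]
stretched = stretch_contrast(dull)
```

After the stretch the values span the full 0 to 255 range instead of a narrow band in the middle.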

The image at the top left of Figure 3 has a corrugated effect due to a fault in the acquisition
process. This can be removed by doing a 2-dimensional Fast Fourier Transform on the image
(top right of Figure 3), removing the bright spots (bottom left of Figure 3), and finally doing an
inverse Fast Fourier Transform to return to the original image without the corrugated background
(bottom right of Figure 3).


An image which has been captured in poor lighting conditions, and shows a continuous
change in the background brightness across the image (top left of Figure 4) can be corrected
using the following procedure. First remove the foreground objects by applying a 25 by 25
greyscale dilation operation (top right of Figure 4). Then subtract the original image from the
background image (bottom left of Figure 4). Finally invert the colors and improve the contrast by
adjusting the image histogram (bottom right of Figure 4).


The example below demonstrates how one could go about extracting measurements from an
image. The image at the top left of Figure 5 shows some objects. The aim is to extract information
about the distribution of the sizes (visible areas) of the objects. The first step involves segmenting
the image to separate the objects of interest from the background. This usually involves
thresholding the image, which is done by setting the values of pixels above a certain threshold value
to white, and all the others to black (top right of Figure 5). Because the objects touch, thresholding
at a level which includes the full surface of all the objects does not show separate objects. This
problem is solved by performing a watershed separation on the image (lower left of Figure 5). The
image at the lower right of Figure 5 shows the result of performing a logical AND of the two
images at the left of Figure 5. This shows the effect that the watershed separation has on touching
objects in the original image. Finally, some measurements can be extracted from the image.
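The thresholding step described above is simple enough to sketch directly. The Python below covers only that step and a pixel count as a crude area measurement; the watershed separation is not attempted, and the sample values and threshold level are invented for the example.

```python
def threshold(image, level):
    """Set pixels above `level` to white (255) and all others to black (0)."""
    return [[255 if v > level else 0 for v in row] for row in image]


grey = [
    [12, 40, 200],
    [35, 180, 220],
    [20, 30, 25],
]
binary = threshold(grey, 128)

# Count white pixels as a rough measure of the visible object area.
object_area = sum(v == 255 for row in binary for v in row)
```

Touching objects would still merge into one white region here, which is the problem the watershed step in the text is meant to solve.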


frame grabber board: a capture board that digitizes frames from a video signal for a computer


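1, corrugated  ['kɔrəgeitid]
a. wrinkled, shaped into ridges or waves

2, dilation  [dai'leiʃən, di-]
n. expansion, enlargement; swelling; elaboration

3, subtract  [səb'trækt]
v. to take away, deduct, reduce

4, segmenting  
n. dividing into segments

5, thresholding  
n. conversion by threshold; threshold value

6, watershed  ['wɔ:təʃed, 'wɔ-]
n. (US) drainage basin; dividing ridge; catchment area; turning point
adj. marking a turning point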


it-e-65 Convert a Graphics Format

Why change vectors to bitmaps?
Most of the clip art gallery is vector-based and will need to be converted into bitmap formats
(GIF) prior to putting it on the Web.
Why change bitmaps to vectors?
To perform tasks from the Drawing toolbar on a bitmap picture (such as animating parts of it),
you will need to convert it into a vector format. You can then, for example, ungroup it and apply
animations to only parts of it.
Which graphic converter to use?
To change your graphics format, you need to use a graphics converter. A popular graphics
editor you can use for this is Paint Shop Pro. Another graphics editor you can use is Adobe
Photoshop, which is said to be the best one for this kind of conversion.
How to use your graphic converter?
Open your file in the graphics editor chosen: Select File | Open.
Select File | Save As.
Rename your file and choose a new format. For a bitmap-to-vector conversion, select the
WMF format. For the opposite conversion, select the GIF format if you have PowerPoint 97;
otherwise select JPEG or TIFF.
Unfortunately, some of the quality may be lost in the switch. MS Office also provides
graphics converters.




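The Java version ships only as source code, but it includes a make.bat; running it produces three jar files. The server side needs only phprpc.jar and the client needs phprpc_client.jar, with no other dependencies, very nice. I don't yet know what phprpc_spring.jar is for.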


package kzg.phprpc.hello.api;

Continue reading A First Look at phprpc

it-3-64 Course Description

This course is an introduction to the basic concepts as well as applications of the rapidly
emerging field of digital image processing. It familiarizes the audience with the understanding,
design, and implementation of algorithms in the various subareas of digital image processing
such as image enhancement, image deblurring, image understanding, image security, and image
compression. Over 200 image examples complement the technical descriptions.
Benefits/Learning Objectives
This course will enable you to
explain the fundamental concepts and terminologies employed in digital imaging such as
sampling and aliasing, perceptual quantization; filtering, look-up tables, image histogram,

explain the various techniques used in image enhancement for contrast manipulation (e.g.,
histogram equalization), sharpening (e.g., unsharp masking) and noise removal (e.g.,
selective averaging, median filtering);
briefly demonstrate the performance of image deblurring algorithms such as inverse filtering
and Wiener filtering by using image examples;
briefly demonstrate the concepts behind digital signatures for image authentication and
invisible watermarking for image copyright protection;
briefly describe the current research topics in image understanding and demonstrate related
algorithm performances using image examples;
explain the basic technologies that serve the existing JPEG and the emerging JPEG2000
Intended Audience
Scientists, engineers, and managers who need to understand and/or apply the fundamental
concepts and techniques employed in digital image processing. Although no particular background is
needed, some prior knowledge of linear system theory (e.g., Fourier transforms) would be helpful.

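1, deblurring  
n. [computing] deblurring
v. making something blurred become clear; wiping away smudges (-ing form of deblur)

2, histogram  ['histəugræm]
n. bar chart
[computing] histogram

3, perceptual  [pə'septjuəl]
a. perceptual, relating to perception

4, quantization  [,kwɔntai'zeiʃən]

5, Fourier  ['furiei]
n. Fourier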




© 2013 - 2019. All rights reserved.
