it-e-02 web harvesting

As the amount of information on the Web grows, that information becomes ever harder to
keep track of and use. Search engines are a big help, but they can do only part of the work, and
they are hard-pressed to keep up with daily changes.
Consider that even when you use a search engine to locate data, you still have to do the
following tasks to capture the information you need: scan the content until you find the
information, mark the information (usually by highlighting with a mouse), switch to another
application (such as a spreadsheet, database or word processor), and paste the information into that
application.

A better solution, especially for companies that are aiming to exploit a broad swath of data
about markets or competitors, lies with Web harvesting tools.
Web harvesting software automatically extracts information from the Web and picks up
where search engines leave off, doing the work the search engine can't. Extraction tools automate
the reading, copying and pasting necessary to collect information for analysis, and they have
proved useful for pulling together information on competitors, prices and financial data of all
types.
There are three ways we can extract more useful information from the Web.
The first technique, Web content harvesting, is concerned directly with the specific content
of documents or their descriptions, such as HTML files, images or e-mail messages. Since most
text documents are relatively unstructured (at least as far as machine interpretation is concerned),
one common approach is to exploit what's already known about the general structure of
documents and map this to some data model.
The other approach to Web content harvesting involves trying to improve on the content
searches that tools like search engines perform. This type of content harvesting goes beyond
keyword extraction and the production of simple statistics relating to words and phrases in
documents.
Another technique, Web structure harvesting, takes advantage of the fact that Web pages
can reveal more information than just their obvious content. Links from other sources that point
to a particular Web page indicate the popularity of that page, while links within a Web page that
point to other resources may indicate the richness or variety of topics covered in that page. This
is similar to analyzing bibliographic citations: a paper that is often cited in bibliographies and other
papers is usually considered to be important.
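As a rough illustration of the in-link idea, here is a minimal Java sketch. It assumes the link graph has already been crawled into a map from each page to the set of pages it links to (the page names are hypothetical). It counts distinct in-links per page as a popularity signal and prints each page's out-link count; real structure harvesting (PageRank-style scoring, for instance) is considerably more involved.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class LinkStructure {
    // In-degree: how many distinct other pages link to each page (a rough popularity signal).
    static Map<String, Integer> inLinks(Map<String, Set<String>> outLinks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : outLinks.entrySet()) {
            for (String target : entry.getValue()) {
                if (!target.equals(entry.getKey())) { // ignore self-links
                    counts.merge(target, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical link graph: page -> set of pages it links to.
        Map<String, Set<String>> web = new HashMap<>();
        web.put("a.html", new HashSet<>(Arrays.asList("b.html", "c.html")));
        web.put("b.html", new HashSet<>(Arrays.asList("c.html")));
        web.put("d.html", new HashSet<>(Arrays.asList("a.html", "b.html", "c.html", "e.html")));

        System.out.println(inLinks(web)); // {a.html=1, b.html=2, c.html=3, e.html=1}
        // Out-degree: how many resources each page points to (a hint at its breadth).
        for (Map.Entry<String, Set<String>> entry : web.entrySet()) {
            System.out.println(entry.getKey() + " links to " + entry.getValue().size() + " pages");
        }
    }
}
```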
The third technique, Web usage harvesting, uses data recorded by Web servers about user
interactions to help understand user behavior and evaluate the effectiveness of the Web structure.
General access-pattern tracking analyzes Web logs to understand access patterns and
trends in order to identify structural issues and resource groupings.
Customized usage tracking analyzes individual trends so that Web sites can be personalized
to specific users. Over time, based on access patterns, a site can be dynamically customized for a
user in terms of the information displayed, the depth of the site structure and the format of the
resource presented.
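General access-pattern tracking starts from exactly this kind of server log. The following minimal sketch assumes the server writes the NCSA/Apache Common Log Format and counts successful (2xx) requests per path; the sample lines and the decision to ignore non-2xx responses are illustrative assumptions, not the behavior of any particular harvesting tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogStats {
    // Matches the request and status fields of a Common Log Format line, e.g.
    // 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    private static final Pattern REQUEST =
            Pattern.compile("\"(?:GET|POST|HEAD) (\\S+) [^\"]*\" (\\d{3})");

    // Counts successful (2xx) requests per path; lines that don't match are skipped.
    static Map<String, Integer> hitsPerPath(List<String> lines) {
        Map<String, Integer> hits = new TreeMap<>();
        for (String line : lines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find() && m.group(2).startsWith("2")) {
                hits.merge(m.group(1), 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample log lines.
        List<String> log = Arrays.asList(
            "10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326",
            "10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] \"GET /products.html HTTP/1.0\" 200 1534",
            "10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] \"GET /index.html HTTP/1.0\" 200 2326",
            "10.0.0.3 - - [10/Oct/2000:13:58:40 -0700] \"GET /missing.html HTTP/1.0\" 404 209");
        System.out.println(hitsPerPath(log)); // {/index.html=2, /products.html=1}
    }
}
```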

Reading The Definitive ANTLR Reference

I didn't fully understand the book; these notes were worked out by trial and error.

lexer?parser?tree?token?

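Token names start with an uppercase letter and correspond to static fields in the target language.

Rule names start with a lowercase letter and correspond to instance methods in the target language.

Concatenation behaves differently inside a token (lexer rule) than inside a parser rule.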

TOKEN:

'h' 'i' // matches hi

rule:

'h' 'i' // matches h i but not hi (the difference is the space in between; each quoted literal becomes a separate token)

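Subrules are enclosed in ( ).

Actions are enclosed in { } and written in the target language.

In ANTLR, characters are enclosed in single quotes ('); a parenthesis character is written '(' or ')'.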

channel:

Which channel each lexed token is placed on is up to you.

Token indices are assigned in a single sequence across all channels:

The token buffer preserves the relative token order regardless of the token channel numbers.

Use skip() with care.

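WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ; // Sending whitespace to the hidden channel like this works. Without it, every other rule has to match WS explicitly, e.g. WS? 'hello' WS+ 'how' WS+ 'are' WS+ 'you' instead of just 'hello' 'how' 'are' 'you'. Use either skip() (discard the token) or $channel = HIDDEN (keep the token but hide it from the parser); calling both is redundant.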

Method skip( ) in an embedded lexer rule action forces the lexer to throw
out the token and look for another. In practice, the skipped text is simply ignored (discarded).

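fragment

If a token is used by another token, the token being used should be marked fragment.

A fragment token must not be referenced directly in a parser rule (doing so will not give the expected result); reference it only from other tokens.

Tokens that are not fragments must follow another principle: they must not overlap with other tokens, i.e. no input should be matchable by both token1 and token2.

When using ANTLRWorks, eliminate warnings whenever possible. Prefer enumerating all the cases explicitly over matching with multiple * and ?.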

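Interpreting errors:

line 12:15 no viable alternative at input 'xxx'

At line 12, character position 15, the parser met input 'xxx' that the grammar does not allow (no alternative covers this input).

A local rule can affect the whole grammar, especially catch-all matches with + or *, such as ('a'..'z' | 'A'..'Z')+.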

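Set aside for now:

Difficulties encountered:

1: I don't know how to recognize unquoted tokens. If I use

TOKEN=CHAR+

rule = TOKEN+

then it matches everything and the whole grammar falls apart.

2: I don't know how to recognize recursive structures, for example

[[...] [...]] and ((...) (...))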
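For difficulty 2, the usual answer is a rule that refers to itself, something like group : '[' group* ']' | '(' group* ')' ; in ANTLR 3 syntax (an untested sketch that assumes the ... parts are themselves nested groups and that whitespace is hidden). Since each parser rule becomes a method of a recursive-descent parser, the Java below, a hand-written recognizer with a hypothetical NestedGroups class, shows the same recursion directly: group() calls itself for every nested group, and the call stack remembers which bracket must close next.

```java
// Hypothetical hand-written recognizer for nested groups such as "[[] []]" or "(() ())".
public class NestedGroups {
    private final String input;
    private int pos = 0;

    private NestedGroups(String input) {
        this.input = input;
    }

    // group : '[' group* ']' | '(' group* ')' ;
    private void group() {
        skipSpaces();
        char open = peek();
        char close;
        if (open == '[') {
            close = ']';
        } else if (open == '(') {
            close = ')';
        } else {
            throw new IllegalArgumentException("no viable alternative at position " + pos);
        }
        pos++; // consume the opening bracket
        skipSpaces();
        while (peek() == '[' || peek() == '(') { // group*: recurse for each nested group
            group();
            skipSpaces();
        }
        if (peek() != close) {
            throw new IllegalArgumentException("expected '" + close + "' at position " + pos);
        }
        pos++; // consume the matching closing bracket
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0'; // '\0' marks end of input
    }

    // Plays the role of a WS rule on the hidden channel.
    private void skipSpaces() {
        while (peek() == ' ') {
            pos++;
        }
    }

    // True if the whole input is exactly one well-nested group.
    static boolean matches(String s) {
        NestedGroups parser = new NestedGroups(s);
        try {
            parser.group();
            parser.skipSpaces();
            return parser.pos == s.length();
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("[[] []]"));   // true
        System.out.println(matches("((()) ())")); // true
        System.out.println(matches("[(])"));      // false
    }
}
```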

it-e-03 Computer hardware

Computer hardware has four parts: the central processing unit (CPU) and memory, storage
hardware, input hardware, and output hardware.
The part of the computer that runs the program is known as the processor or central processing
unit (CPU). In a microcomputer, the CPU is on a single electronic component, the microprocessor
chip, within the system unit or system cabinet. The CPU itself has two parts: the control unit and
the arithmetic-logic unit. In a microcomputer, both are on the microprocessor chip.
The Control Unit The control unit tells the rest of the computer system how to carry out a
program's instructions. It directs the movement of electronic signals between memory and the
arithmetic-logic unit. It also directs these control signals between the CPU and input and output
devices.
The Arithmetic-Logic Unit The arithmetic-logic unit, usually called the ALU, performs
two types of operations—arithmetic and logical. Arithmetic operations are, as you might expect,
the fundamental math operations: addition, subtraction, multiplication, and division. Logical
operations consist of comparisons. That is, two pieces of data are compared to see whether one is
equal to, less than, or greater than the other.
Memory Memory, also known as primary storage or internal storage, temporarily holds
data, program instructions, and information. One of the most important facts to know about
memory is that part of its content is held only temporarily. In other words, it is stored only as
long as the computer is turned on. When you turn the machine off, the content immediately
vanishes. The stored contents in memory are volatile and can vanish very quickly.
Storage Hardware [1] The purpose of storage hardware is to provide a means of storing
computer instructions and data in a form that is relatively permanent (that is, the data is not lost
when the power is turned off) and easy to retrieve when needed for processing. There are four
kinds of storage hardware: floppy disks, hard disks, optical disks, and magnetic tape.
Floppy Disks Floppy disks are also called diskettes, flexible disks, floppies, or simply
disks. The plastic disk inside the diskette cover is flexible, not rigid. They are flat, circular pieces
of mylar plastic that rotate within a jacket. Data and programs are stored as electromagnetic
charges on a metal oxide film coating the mylar plastic.
Hard Disks Hard disks consist of metallic rather than plastic platters. They are tightly
sealed to prevent any foreign matter from getting inside. Hard disks are extremely sensitive
instruments. The read-write head rides on a cushion of air about 0.000001 inch thick. It is so thin
that a smoke particle, fingerprint, dust, or human hair could cause what is known as a head crash.
A head crash happens when the surface of the read-write head or particles on its surface contact
the magnetic disk surface. A head crash is a disaster for a hard disk. It means that some or all of
the data on the disk is destroyed. Hard disks are assembled under sterile conditions and sealed
from impurities within their permanent containers.
Optical Disks Optical disks are used for storing great quantities of data. An optical disk can
hold 650 megabytes of data, the equivalent of hundreds of floppy disks. Moreover, an optical disk
makes an immense amount of information available on a microcomputer. In optical-disk technology,
a laser beam alters the surface of a plastic or metallic disk to represent data. To read the data, a laser
scans these areas and sends the data to a computer chip for conversion.
Magnetic Tape Magnetic tape is an effective way of making a backup, or duplicate, copy of
your programs and data. We mentioned the alarming consequences that can happen if a hard disk
suffers a head crash. You will lose some or all of your data or programs. Of course, you can always
make copies of your hard-disk files on floppy disks. However, this can be time-consuming and may
require many floppy disks. Magnetic tape is sequential access storage and can solve the problem
mentioned above.
Input Hardware Input devices take data and programs people can read or understand and
convert them to a form the computer can process, that is, machine-readable electronic signals
of 0s and 1s. Input hardware is of two kinds: keyboard entry and direct entry.
Keyboard Entry Data is input to the computer through a keyboard that looks like a
typewriter keyboard but has additional keys. In this method, the user typically reads from an
original document called the source document. The user enters that document by typing on the
keyboard.
Direct Entry Data is made into machine-readable form as it is entered into the computer;
no keyboard is used. Direct entry devices may be categorized into three areas: pointing devices
(for example, mouse, touch screen, light pen, digitizer are all pointing devices), scanning devices
(for example, image scanner, fax machine, bar-code reader are all scanning devices), and
voice-input devices.
Output Hardware Output devices convert machine-readable information into people-readable
form. Common output devices are monitors, printers, plotters, and voice output.
Monitors Monitors are also called display screens or video display terminals. Most monitors that
sit on desks are built in the same way as television sets; these are called cathode-ray tubes (CRTs). Another type
of monitor is the flat-panel display, including liquid-crystal display (LCD), electroluminescent (EL) display
and gas-plasma display. An LCD does not emit light of its own. Rather, it consists of crystal molecules.
[2] An electric field causes the molecules to line up in a way that alters their optical properties.
Unfortunately, many LCDs are difficult to read in sunlight or other strong light. A gas-plasma display is
the best type of flat screen. Like a neon light bulb, the plasma display uses a gas that emits light in the
presence of an electric current.
Printers There are four popular kinds of printers: dot-matrix, laser, ink-jet, and thermal.
Dot-Matrix Printer Dot-matrix printers can produce a page of text in less than 10 seconds
and are highly reliable. They form characters or images using a series of small pins on a print
head. The pins strike an inked ribbon and create an image on paper. Printers are available with
print heads of 9, 18, or 24 pins. One disadvantage of this type of printer is noise.
Laser Printer The laser printer creates dotlike images on a drum, using a laser beam light
source. [3] The characters are treated with a magnetically charged inklike toner and then are
transferred from drum to paper. A heat process is used to make the characters adhere. The laser
printer produces images with excellent letter and graphics quality.
Ink-Jet Printer An ink-jet printer sprays small droplets of ink at high speed onto the
surface of the paper. This process not only produces a letter-quality image but also permits
printing to be done in a variety of colors.
Thermal Printer A thermal printer uses heat elements to produce images on heat-sensitive
paper. Color thermal printers are not as popular because of their cost and the requirement of
specially treated paper. They are more special-use printers that produce near-photographic
output. They are widely used in professional art and design work where very high quality color is
essential.
Plotters Plotters are special-purpose output devices for producing bar charts, maps, architectural
drawings, and even three-dimensional illustrations. Plotters can produce high-quality multicolor
documents and also documents that are larger in size than most printers can handle. There are four types
of plotters: pen, ink-jet, electrostatic, and direct imaging.
Voice-Output Devices Voice-output devices make sounds that resemble human speech but
actually are pre-recorded vocalized sounds. Voice output is used as a reinforcement tool for
learning, such as to help students study a foreign language. It is used in many supermarkets at the
checkout counter to confirm purchases. Of course, one of the most powerful capabilities is to
assist the physically challenged.

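skype4java:

skype4java project page:

http://skype.sourceforge.jp/index.php?Skype%20API%20For%20Java%20%28English%29

It wraps Skype's COM+ interface through JNI.

However, its

Win32Connector method protected void initializeImpl() has a problem. It is meant to extract skype.dll from the package into the temp directory when the Skype library cannot be found, but I don't see why the author iterates over the zip entries instead of using the class loader to get the resource and extract it.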

My changes:

try {
            // First try the normal library path (skype.dll on java.library.path).
            System.loadLibrary("skype");
        } catch(Throwable e) {
            try {
                if(!ConnectorUtils.checkLibraryInPath(LIBFILENAME)) {
                    // Not installed: extract the copy bundled in the jar into the temp directory.
                    String tmpDir = System.getProperty("java.io.tmpdir");
                    if(!tmpDir.endsWith(File.separator)) {
                        tmpDir = tmpDir + File.separator;
                    }
                    String dllPath = tmpDir + LIBFILENAME;
                    File dll = new File(dllPath);
                    if(!dll.exists()) {
                        extractDll(dll);
                        if(!dll.exists()) {
                            throw new RuntimeException("can't load " + dllPath);
                        }
                    }
                    System.load(dllPath);
                }
            } catch(Exception e1) {
                throw new RuntimeException(e1);
            }
        }

private void extractDll(File destFile) {
        // Read the DLL as a classpath resource instead of walking the zip entries.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        InputStream input = loader.getResourceAsStream(LIBFILENAME);
        if(null == input) {
            throw new RuntimeException(LIBFILENAME + " not found on the classpath");
        }
        FileOutputStream output = null;
        try {
            output = new FileOutputStream(destFile);
            byte[] buffer = new byte[1024 * 4];
            int n;
            while(-1 != (n = input.read(buffer))) {
                output.write(buffer, 0, n);
            }
        } catch(IOException e) {
            throw new RuntimeException(e);
        } finally {
            try {
                input.close();
            } catch(IOException e) {
                // ignore: nothing useful to do if close fails
            }
            try {
                if(null != output) {
                    output.close();
                }
            } catch(IOException e) {
                // ignore
            }
        }
    }
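On Java 7 or later, the same class-loader-based extraction can be shortened with java.nio.file.Files.copy and try-with-resources. This is a sketch, not part of skype4java: the class name ResourceExtractor is hypothetical, and the resource name skype.dll only mirrors LIBFILENAME above. It fails fast with FileNotFoundException when the resource is missing, instead of hitting a NullPointerException inside the copy loop.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceExtractor {
    // Copies a classpath resource (for example a bundled native library) to dest.
    static Path extract(ClassLoader loader, String resource, Path dest) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) { // try-with-resources skips close() for a null resource
                throw new FileNotFoundException("not on classpath: " + resource);
            }
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return dest;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("skype", ".dll");
        try {
            extract(Thread.currentThread().getContextClassLoader(), "skype.dll", target);
            // In the connector, System.load(target.toString()) would follow here.
            System.out.println("extracted to " + target);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```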

Dependencies:

winp.jar: checks whether Skype is running

swt.jar: borrows basic OS functionality

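[Repost] Ten networking technologies likely to be popular in the next five years

10. IP telephony (VoIP)

For reasons of cost and convenience, many companies and consumers have already started using VoIP telephone services. According to a survey by the SearchVoIP website in June 2007, sales of pure IP PBX systems in the first quarter of 2007 were up 76% over the previous quarter.

More and more companies are looking to join the VoIP camp, using VoIP equipment to supplement traditional landlines or replace them outright. Because VoIP runs over TCP/IP networks, IT administrators will in many cases be responsible for deploying and maintaining it.

9. Network security technologies

Savvy IT professionals have acquired a great deal of security expertise in recent years, but new security challenges and new security mechanisms will keep emerging. Applications such as VoIP and mobile computing bring new security problems and challenges, and authentication is moving from a single password-based model to more diverse ones; biometrics will become increasingly important.

Security threats are also becoming more widespread. Where teenage hackers once cracked websites for fun, today's online criminals target companies' trade secrets and virtual assets, and their attacks now threaten a nation's entire virtual infrastructure, so our security technology must be kept constantly up to date.

8. IPv6

As the next-generation Internet protocol, IPv6 has not spread as quickly as people had expected, largely because technologies such as NAT slow the depletion of IP addresses.

However, the number of hosts on the Internet keeps growing steadily, and eventually we will need a larger address space. Besides its enormous address capacity, IPv6 also provides better security through IPsec, a basic component of the protocol suite. Windows Vista, Windows Server 2008, Mac OS X 10.3 and other recent operating systems already support IPv6 by default, which can be seen as laying the groundwork for IPv6 to take off.

IPv6 uses a completely different address notation: hexadecimal groups separated by colons replace the familiar four decimal octets of IPv4. Learning the new notation is somewhat difficult for IT administrators, but we must master IPv6 before the transition arrives in full.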
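Java's java.net.InetAddress parses both notations, which makes the difference in size and format easy to see. A small sketch; address literals are parsed directly, so no DNS lookup is involved, and 2001:db8::/32 is the prefix reserved for documentation examples. Note that Java prints IPv6 addresses in full, without the :: shorthand.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AddressForms {
    public static void main(String[] args) throws UnknownHostException {
        // Address literals are parsed directly; no DNS lookup takes place.
        InetAddress v4 = InetAddress.getByName("192.168.1.10");
        InetAddress v6 = InetAddress.getByName("2001:db8::1");

        // 4 bytes = 32 bits, written as four decimal octets
        System.out.println(v4.getAddress().length * 8 + " bits: " + v4.getHostAddress());
        // prints "32 bits: 192.168.1.10"

        // 16 bytes = 128 bits, written as eight hexadecimal groups
        System.out.println(v6.getAddress().length * 8 + " bits: " + v6.getHostAddress());
        // prints "128 bits: 2001:db8:0:0:0:0:0:1"
    }
}
```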

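7. Virtualization

Virtualization has been around for some time, but only now is it really maturing. Microsoft's upcoming server operating system, Windows Server 2008, will include the Windows hypervisor technology (Viridian) it has been developing intensively; VMware already offers the free VMware Server; and Red Hat and SuSE plan to include the Xen hypervisor in the next versions of their server products. We can expect the virtual machine concept to reach a whole new level in the next few years.

Managing a virtualized network environment demands strong technical skills, but more and more companies are building virtualized servers to save on hardware costs, and this will become an inevitable trend.

6. SaaS (Software as a Service)

Web 2.0, which represents the next generation of the Web, adopts the SaaS (Software as a Service) model: software is provided as a service over the Internet instead of being installed separately on each user's computer. Some IT experts have warned that SaaS will completely replace the work of in-house IT administrators, but the more widely accepted view is that SaaS will free IT administrators from heavy configuration and maintenance work so that they can focus on overall planning and integration.

In fact, even without SaaS, IT administrator jobs would change: more positions will focus on delivering applications rather than only on the company's internal IT department. IT staff should therefore learn about service delivery and multi-tenant architectures so that they can keep ahead as the environment changes.

5. Supporting mobile users

Smartphones, PDAs, UMPCs and other portable devices are already widely used and will become even more so. Employees will receive company e-mail on their phones and will sometimes even connect to the corporate LAN through terminal services software.

In-house IT staff need to learn more about supporting mobile users, including setting up mail servers and securing the devices.

4. Supporting remote users

The trend is already clear: more employees will work outside the office for at least a few days a week, using laptops on the road or personal computers at home, and they need remote access to the corporate LAN. IT administrators therefore have to support these remote users while keeping the internal network secure.

It is important to learn the various VPN-related skills; SSL VPN technology, for example, is very useful. Health checks and quarantining of remote clients can keep computers that fail to meet minimum security requirements off the LAN, so that they cannot harm the network.

3. Wireless technologies

Enterprise wireless networking is still immature. Many companies are reluctant to install wireless LANs and do so only because a WLAN is the easiest way to connect the laptops of employees and visitors, yet many organizations remain wary of it, particularly of its security.

Faster and more secure wireless technologies are sure to appear. You should know about 802.11n, a new standard that is still under development but is scheduled to be published by the end of 2008. 802.11n offers typical throughput of 74 Mbps and up to 248 Mbps, and its effective range, up to 70 m, is much greater than that of the current 802.11a/b/g standards.

2. Mixed networks

The days of all-Windows or all-UNIX networks are over; networks will become more mixed, not more uniform. With end-user-friendly Linux distributions such as Ubuntu now available, we will see more organizations deploy them on the desktops of particular groups of users.

However, for application or personal-preference reasons, other users will keep using Windows, and many will also use Apple computers alongside them, especially in graphics work. Mastering a single platform will no longer make you an IT expert; you will need to move between operating systems and solve all kinds of problems.

1. Unified communications
As VoIP becomes more popular, unified communications (the convergence of different communication technologies such as e-mail, voicemail, text messaging and fax) will be the focus of the next wave of technology. Users will be able to reach all their communications from a single interface, for example accessing their e-mail from many different devices: desktops, laptops, smartphones or PDAs, and even traditional telephones.

Converging technologies make networks more complex, and IT administrators need to improve their skills in managing converged networks to meet the technical challenges ahead.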

 

PS:

802.11b: 2.4 GHz band, maximum 11 Mbit/s, typically about 6.5 Mbit/s.
Wi-Fi and 802.11x are not the same concept, although they are related.

it-e-04 information appliance

An "information appliance (IA)" is any device that can process information, signals,
graphics, animation, video and audio; and can exchange such information with another IA device.
Typical devices could be smartphones, smartcards, PDAs, and so on. Digital cameras, ordinary
cellular phones, set-top boxes, and LCD TVs are not information appliances unless they become
capable of communications and information functions. The term overlaps in definition with, and is
sometimes used interchangeably with, smart devices, mobile devices, wireless devices,
internet appliances, web appliances, handhelds, handheld devices or smart handheld devices.
Early Appliances For a short while during the middle and late 1980s there were a few
models of simple electronic typewriters fitted with screens and some form of memory storage.
These devices had some of the attributes of an information appliance. One of these dedicated
word processor machines, the Canon Cat, was designed by Jef Raskin as a forerunner
of the idea of the information appliance.
Information appliances tend to be consumer devices that perform only a few targeted tasks
and are controlled by a simple touchscreen interface or push buttons on the device's enclosure.
Open Standard Protocols In an ideal world, any true information appliance would be able to communicate with any other information appliance using open standard protocols and
technologies, regardless of the maker of the software or the hardware. The communications
aspects and all user interface elements would be designed together so that a user could switch
seamlessly from one information appliance to another.

Setting up a Maven web project in MyEclipse

Start by creating a general project.

Create the directory structure.

Right-click the project and add Maven support.

Right-click the project, add the Web capability, and specify the web root folder and the context root.

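Note: in the Maven configuration step above, do not choose war as the packaging type; choose jar instead.

This corresponds to <packaging>jar</packaging> in the pom; the war type makes it impossible to change the web root.

Open the project properties and change the following settings: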

Change them as follows:

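The war plugin configuration in the pom is not covered here.

If you get this wrong, the web root folder can no longer be changed, even if you remove and re-add the Web capability.

The fix is to edit the .mymetadata file in the project directory and change the webrootdir attribute: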

<attributes>
    <attribute name="webrootdir" value="/src/main/webapp" />
  </attributes>

Reading notes: Effective Java, Second Edition, Chapters 1 and 10

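Internally an enum extends Enum, so it cannot extend any other class; an enum also cannot be the superclass of another class.

I. Creating and Destroying Objects

Item 1: Consider static factory methods instead of constructors

Advantages:

Unlike constructors, static factory methods have names, which can convey meaning.

Unlike constructors, which create a new object on every call, a static factory is not required to return a new object. With a constructor, the only way to avoid returning a new object is to throw an exception.

Unlike constructors, static factory methods can return an object of any subtype of their return type.

Static factory methods can reduce the verbosity of creating instances of parameterized types. [My note: of little use so far, because JDK 6 collection classes such as HashMap provide no such factory methods; you have to write them yourself.]

Disadvantages:

A class that provides only static factory methods, without public or protected constructors, cannot be subclassed, so you lose the convenience of inheritance.

Static factory methods are not readily distinguishable from other static methods, which can leave users unsure whether to use a constructor or a factory.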
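A small sketch of these points; the Temperature class and its methods are hypothetical examples, not from the book. ofCelsius and ofFahrenheit show named factories (two constructors with the same double parameter could not coexist), ofCelsius returns a cached instance for zero instead of a new object, and the generic newHashMap shows type arguments being inferred from the assignment target.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class illustrating the Item 1 advantages.
public final class Temperature {
    private static final Temperature ZERO = new Temperature(0.0);
    private final double celsius;

    private Temperature(double celsius) {
        this.celsius = celsius;
    }

    // Named factories: two constructors both taking a single double could not coexist.
    public static Temperature ofCelsius(double c) {
        return c == 0.0 ? ZERO : new Temperature(c); // not required to create a new object
    }

    public static Temperature ofFahrenheit(double f) {
        return ofCelsius((f - 32) * 5 / 9);
    }

    public double celsius() {
        return celsius;
    }

    // Generic factory: K and V are inferred from the assignment target.
    public static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        System.out.println(ofCelsius(0.0) == ofCelsius(0.0)); // true: the same cached object
        System.out.println(ofFahrenheit(212).celsius());      // 100.0
        Map<String, Integer> counts = newHashMap();           // no repeated <String, Integer>
        counts.put("x", 1);
        System.out.println(counts);                           // {x=1}
    }
}
```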

Item 2: Consider a builder when a constructor would need many parameters

The end result looks like this:

NutritionFacts cocaCola = new NutritionFacts.Builder(240, 8).calories(100).sodium(35).carbohydrate(27).build();

Calls are chained, and a final build() method produces the object.
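A condensed version of the builder behind that call, modeled on the book's NutritionFacts example (the accessors and toString are added here only for demonstration). Required values go to the Builder constructor, optional ones start at default values and are set by chainable methods, and the finished object is immutable.

```java
// Builder pattern modeled on the book's NutritionFacts example (condensed).
public class NutritionFacts {
    private final int servingSize;  // (mL)            required
    private final int servings;     // (per container) required
    private final int calories;     //                 optional
    private final int fat;          // (g)             optional
    private final int sodium;       // (mg)            optional
    private final int carbohydrate; // (g)             optional

    public static class Builder {
        // Required parameters
        private final int servingSize;
        private final int servings;
        // Optional parameters, initialized to default values
        private int calories = 0;
        private int fat = 0;
        private int sodium = 0;
        private int carbohydrate = 0;

        public Builder(int servingSize, int servings) {
            this.servingSize = servingSize;
            this.servings = servings;
        }

        // Each setter returns the builder itself, which is what makes chaining possible.
        public Builder calories(int val) { calories = val; return this; }
        public Builder fat(int val) { fat = val; return this; }
        public Builder sodium(int val) { sodium = val; return this; }
        public Builder carbohydrate(int val) { carbohydrate = val; return this; }

        // build() hands the collected values to the private constructor.
        public NutritionFacts build() { return new NutritionFacts(this); }
    }

    private NutritionFacts(Builder builder) {
        servingSize = builder.servingSize;
        servings = builder.servings;
        calories = builder.calories;
        fat = builder.fat;
        sodium = builder.sodium;
        carbohydrate = builder.carbohydrate;
    }

    public int calories() { return calories; }
    public int fat() { return fat; }
    public int sodium() { return sodium; }

    @Override
    public String toString() {
        return servingSize + "mL x " + servings + ": " + calories + " kcal, fat " + fat
                + "g, sodium " + sodium + "mg, carbohydrate " + carbohydrate + "g";
    }

    public static void main(String[] args) {
        NutritionFacts cocaCola = new NutritionFacts.Builder(240, 8)
                .calories(100).sodium(35).carbohydrate(27).build();
        System.out.println(cocaCola);
    }
}
```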

Item 3: Enforce the singleton property with a private constructor or an enum type.
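Both approaches side by side, as a sketch with hypothetical class names (the book also uses Elvis as its example). The enum version is the one the book recommends, because the JVM guarantees a single instance even in the face of serialization or reflection.

```java
// Approach 1: private constructor plus a public static final field.
final class Elvis {
    public static final Elvis INSTANCE = new Elvis();

    // Ordinary code outside Elvis cannot call this
    // (reflection with setAccessible still can, as the book points out).
    private Elvis() { }

    public String sing() { return "Elvis (field)"; }
}

// Approach 2: a single-element enum. The JVM creates exactly one INSTANCE,
// and neither serialization nor reflection can produce a second one.
enum ElvisEnum {
    INSTANCE;

    public String sing() { return "Elvis (enum)"; }
}

public class SingletonDemo {
    public static void main(String[] args) {
        System.out.println(Elvis.INSTANCE.sing());
        System.out.println(ElvisEnum.INSTANCE.sing());
        System.out.println(ElvisEnum.valueOf("INSTANCE") == ElvisEnum.INSTANCE); // true
    }
}
```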

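Item 4: Likewise, give a utility class such as Utils a private constructor so that it cannot be instantiated.

Item 5: Avoid creating unnecessary objects

Be economical with resources: pay particular attention to object creation inside loops, and watch out for problems caused by autoboxing and unboxing.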
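The autoboxing pitfall in a minimal sketch, modeled on the book's example of accidentally declaring the sum as Long instead of long. Both methods return the same value, but the boxed version creates a new Long object on almost every iteration; the timings printed by main depend on the machine and the JIT, so treat them as indicative only.

```java
public class BoxingCost {
    // Sum declared as Long: each "sum += i" unboxes, adds, and boxes a new Long.
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Same loop with a primitive long: no objects are created.
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        long boxed = sumBoxed(n);
        long t1 = System.nanoTime();
        long primitive = sumPrimitive(n);
        long t2 = System.nanoTime();
        System.out.println(boxed == primitive); // true
        System.out.println("boxed: " + (t1 - t0) / 1_000_000 + " ms, primitive: "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```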

Item 6: Eliminate obsolete object references (memory leaks)

public Object pop() {
    if (size == 0)
        throw new EmptyStackException();
    Object result = elements[--size];
    elements[size] = null; // Eliminate the obsolete reference
    return result;
}

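If the slot in elements is not cleared, the element has been removed from the stack logically, but the array still references it, so it stays in memory and cannot be garbage-collected.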

……

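Item 7: Avoid finalizers

II. Methods Common to All Objects

X. Concurrency

Item 66: Synchronize access to shared mutable data

If you synchronize, synchronize both reads and writes; otherwise synchronization is of no use.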

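Because the following code does not synchronize access to stopRequested, the background thread may never see the update and the program may never terminate: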

import java.util.concurrent.TimeUnit;

// Broken! - How long would you expect this program to run?
public class StopThread {
    private static boolean stopRequested;

    public static void main(String[] args) throws InterruptedException {
        Thread backgroundThread = new Thread(new Runnable() {
            public void run() {
                int i = 0;
                // stopRequested is read without synchronization, so the VM may hoist the read
                // out of the loop (effectively while (true)) and the loop never ends.
                while (!stopRequested)
                    i++;
            }
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        stopRequested = true;
    }
}
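One way to fix it, following the book: declare the flag volatile so that every read sees the most recent write (the book's other fix is to synchronize both the read and the write through synchronized accessor methods). This sketch uses Java 8 lambda syntax and adds a join() so the program reports when the background thread has stopped.

```java
import java.util.concurrent.TimeUnit;

public class StopThreadFixed {
    // volatile: every read of the flag sees the latest write from any thread.
    private static volatile boolean stopRequested;

    public static void main(String[] args) throws InterruptedException {
        Thread backgroundThread = new Thread(() -> {
            int i = 0;
            while (!stopRequested) {
                i++;
            }
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        stopRequested = true;
        backgroundThread.join(); // returns shortly after the flag is set
        System.out.println("background thread stopped");
    }
}
```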

Item 67: Avoid excessive synchronization

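Do as little as possible inside synchronized regions; in particular, don't call alien code from them (code whose behavior you don't know, which might, for example, try to use your lock object).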

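For performance and design reasons, prefer not to synchronize a class internally: make it unsynchronized and document that callers must provide synchronization. For example, StringBuilder, added in 1.5, replaced StringBuffer for this reason. The exception is modifying a static field, which must be synchronized internally, because callers can arrange to synchronize on an object but cannot guarantee synchronization of a static field.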

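Tip:

CopyOnWriteArrayList performs every write on a fresh copy of the underlying array, so the existing list contents are never modified in place. This is of course expensive, so I probably won't use it (the book recommends it only for lists that are rarely modified and often traversed, such as observer lists).

The cost of synchronization on multicore machines: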

In a multicore world, the real cost of excessive synchronization is not the CPU time spent obtaining locks; it is the lost opportunities for parallelism and the delays imposed by the need to ensure that every core has a consistent view of memory

Item 68: Prefer executors and tasks to threads

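The Executor Framework, included since 1.5: work queues and asynchronous execution

Create: ExecutorService executor = Executors.newSingleThreadExecutor();
Execute: executor.execute(runnable);
Shut down: executor.shutdown();

Shut down immediately: shutdownNow();

But don't be misled by these two methods: shutdownNow() merely interrupts.

It tries to stop the running threads by calling Thread.interrupt(), and this has limited effect: if a task is not blocked in sleep, wait, a Condition, a timed lock, and so on, and does not check its interrupt status itself, interrupt() cannot stop it. So shutdownNow() does not mean the pool will exit immediately; it may still have to wait for the running tasks to finish. (It does prevent queued tasks from starting, and returns them as a list.)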
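The following sketch (hypothetical class name, Java 8 syntax) shows what the two methods actually do: shutdown() lets an endless task keep running, shutdownNow() interrupts it and hands back the task still waiting in the queue, and the running task stops only because it polls its own interrupt status.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {
    // Returns how many queued tasks shutdownNow() handed back without running them.
    static int runDemo() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Busy loop with no sleep/wait: only the explicit isInterrupted() check lets it stop.
        executor.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // simulated work
            }
        });
        // Queued behind the endless task, so it never gets to run.
        executor.execute(() -> System.out.println("never printed"));

        executor.shutdown(); // stop accepting tasks; submitted ones still run
        int dropped = 0;
        if (!executor.awaitTermination(100, TimeUnit.MILLISECONDS)) {
            List<Runnable> neverRun = executor.shutdownNow(); // interrupts the running task
            dropped = neverRun.size();
            if (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                System.out.println("a task ignored the interrupt; pool still running");
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("queued tasks dropped: " + runDemo()); // queued tasks dropped: 1
    }
}
```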

Use ScheduledThreadPoolExecutor instead of java.util.Timer

For details see http://www.iteye.com/topic/366591

In short, think twice before using the old Thread and Timer classes.
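The book's main argument for the switch (Item 68) is robustness: a Timer has a single thread, and if a task throws an unchecked exception the timer stops working, whereas a scheduled thread pool recovers. A minimal sketch of that, with hypothetical task names: one periodic task fails on its first run, and the other keeps ticking on the same one-thread pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledDemo {
    // Returns true if the healthy task keeps running after the other one has thrown.
    static boolean survivesFailingTask() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        CountDownLatch ticks = new CountDownLatch(3);
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        Runnable ticking = ticks::countDown;
        try {
            // The exception only suppresses this task's later runs; the pool thread survives.
            scheduler.scheduleAtFixedRate(failing, 0, 10, TimeUnit.MILLISECONDS);
            scheduler.scheduleAtFixedRate(ticking, 0, 10, TimeUnit.MILLISECONDS);
            return ticks.await(2, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("still ticking: " + survivesFailingTask()); // still ticking: true
    }
}
```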

Item 69: Prefer concurrency utilities to wait and notify

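The java.util.concurrent package provides three kinds of higher-level concurrency utilities:

the Executor Framework;

concurrent collections, which perform better under concurrency than the ordinary synchronized ones;

synchronizers, for coordination and waiting between threads: CountDownLatch, Semaphore, CyclicBarrier.

As the name suggests, CountDownLatch is a latch that counts down:

when the count reaches 0 the latch opens, and everyone waiting on it may proceed.
It suits situations where you must wait until some condition is met before doing the next thing; it can also fire once all threads have finished, so that follow-up work can begin.

The most important methods of CountDownLatch are countDown() and await(): the former decrements the count by one, and the latter waits for the count to reach 0, blocking until it does.

The example below shows simple CountDownLatch usage by simulating a 100-meter race: 10 runners are ready and waiting for the starter's signal, and when all of them have reached the finish line, the race is over.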

package com.eyesmore.concurrent;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class CountDownLatchDemo {
    private static final int PLAY_AMOUNT = 10;
    public static void main(String[] args) {
        /*
         * Race start: as soon as the starter gives the signal, all runners may start running.
         * */
        CountDownLatch begin = new CountDownLatch(1);
        /*
         * Each runner reports on reaching the finish; when everyone has arrived, the race is over.
         * */
        CountDownLatch end = new CountDownLatch(PLAY_AMOUNT);
        Player[] plays = new Player[PLAY_AMOUNT];
        for(int i = 0;i<PLAY_AMOUNT;i++) {
            plays[i] = new Player(i+1,begin,end);
        }
        ExecutorService exe = Executors.newFixedThreadPool(PLAY_AMOUNT);
        for(Player p : plays) {// runners take their marks
            exe.execute(p);
        }
        System.out.println("Race started");
        begin.countDown();// the starter signals the start
        try {
            end.await();// wait for every runner to finish
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            System.out.println("Race over");
        }
        // Note: main is about to finish, but the pool's threads won't end unless the executor is shut down
        exe.shutdown();
    }
}
class Player implements Runnable {
    private int id;
    private CountDownLatch begin;
    private CountDownLatch end;
    public Player(int id, CountDownLatch begin, CountDownLatch end) {
        super();
        this.id = id;
        this.begin = begin;
        this.end = end;
    }
    public void run() {
        try {
            begin.await();// wait until the starter's count reaches 0
            Thread.sleep((long)(Math.random()*100));// simulate the time spent running
            System.out.println("Player "+id+" has arrived. ");
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            end.countDown();// report to the judges that this runner has finished
        }
    }
}

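In real applications, several threads sometimes work together on the same job, and along the way each thread often has to wait until all the others finish a given phase; once every thread reaches that phase, they all continue together.
For example, several tour groups travel via Shenzhen, Guangzhou, Shaoguan and Changsha to Wuhan. Some groups drive themselves, some walk, and some take a tour bus. The groups set out at the same time, and at each stop every group waits for the others to arrive before all of them set off again, until they all reach the terminus, Wuhan.

This is where CyclicBarrier comes in. Its most important property is the number of parties, and its key method is await(): once all the threads have called await(), they may all continue; until then, they wait.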

package examples.ch06.example01;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class TestCyclicBarrier {
    // Time needed on foot (seconds): Shenzhen, Guangzhou, Shaoguan, Changsha, Wuhan
    private static int[] timeWalk = { 5, 8, 15, 15, 10 };
    // Self-driving tour
    private static int[] timeSelf = { 1, 3, 4, 4, 5 };
    // Tour bus
    private static int[] timeBus = { 2, 4, 6, 6, 7 };
    static String now() {
        SimpleDateFormat sdf = new SimpleDateFormat("HH:mm:ss");
        return sdf.format(new Date()) + ": ";
    }
    static class Tour implements Runnable {
        private int[] times;
        private CyclicBarrier barrier;
        private String tourName;
        public Tour(CyclicBarrier barrier, String tourName, int[] times) {
            this.times = times;
            this.tourName = tourName;
            this.barrier = barrier;
        }
        public void run() {
            try {
                Thread.sleep(times[0] * 1000);
                System.out.println(now() + tourName + " Reached Shenzhen");
                barrier.await();
                Thread.sleep(times[1] * 1000);
                System.out.println(now() + tourName + " Reached Guangzhou");
                barrier.await();
                Thread.sleep(times[2] * 1000);
                System.out.println(now() + tourName + " Reached Shaoguan");
                barrier.await();
                Thread.sleep(times[3] * 1000);
                System.out.println(now() + tourName + " Reached Changsha");
                barrier.await();
                Thread.sleep(times[4] * 1000);
                System.out.println(now() + tourName + " Reached Wuhan");
                barrier.await();
            } catch (InterruptedException e) {
            } catch (BrokenBarrierException e) {
            }
        }
    }
    public static void main(String[] args) {
        // Three tour groups
        CyclicBarrier barrier = new CyclicBarrier(3);
        ExecutorService exec = Executors.newFixedThreadPool(3);
        exec.submit(new Tour(barrier, "WalkTour", timeWalk));
        exec.submit(new Tour(barrier, "SelfTour", timeSelf));
        exec.submit(new Tour(barrier, "BusTour", timeBus));
        exec.shutdown();
    }
}

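Semaphore: a semaphore holds a preset number of permits, perhaps 1, perhaps 10 or more.
Whoever obtains a permit (acquire) may proceed; a thread without a permit must wait.
When finished, a thread must return its permit (release); otherwise the permits are soon used up and other threads can never obtain one to continue.

Look carefully at how the warehouse is handled:

1 how a put waits while the warehouse is full,

2 how a take waits while the warehouse is empty,

3 how the warehouse's 10 slots are managed,

4 how synchronization is handled.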

import java.util.concurrent.Semaphore;
/**
* 老紫竹 Java tutorial: using a semaphore (Semaphore).<br>
* A producer/consumer example that manages warehouse stock.
*
* @author 老紫竹(java2000.net,laozizhu.com)
*/
public class TestSemaphore {
  public static void main(String[] args) {
    // Start the threads
    for (int i = 0; i <= 3; i++) {
      // Producer
      new Thread(new Producer()).start();
      // Consumer
      new Thread(new Consumer()).start();
    }
  }
  // Warehouse
  static Warehouse buffer = new Warehouse();
  // Producer: adds items
  static class Producer implements Runnable {
    static int num = 1;
    @Override
    public void run() {
      int n = num++;
      while (true) {
        try {
          buffer.put(n);
          System.out.println(">" + n);
          // Fast: rest for 10 ms
          Thread.sleep(10);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
      }
    }
  }
  // Consumer: removes items
  static class Consumer implements Runnable {
    @Override
    public void run() {
      while (true) {
        try {
          System.out.println("<" + buffer.take());
          // Slow: rest for 1000 ms
          Thread.sleep(1000);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
      }
    }
  }
  /**
   * Warehouse
   *
   * @author 老紫竹(laozizhu.com)
   */
  static class Warehouse {
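    // Not-full semaphore: counts free slots
    final Semaphore notFull = new Semaphore(10);
    // Not-empty semaphore: counts stocked items
    final Semaphore notEmpty = new Semaphore(0);
    // Mutex guarding the critical section
    final Semaphore mutex = new Semaphore(1);
    // Storage: 10 slots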
    final Object[] items = new Object[10];
    int putptr, takeptr, count;
    /**
     * Puts an item into the warehouse.<br>
     *
     * @param x
     * @throws InterruptedException
     */
    public void put(Object x) throws InterruptedException {
      // Wait until there is a free slot
      notFull.acquire();
      // Enter the critical section so puts and takes don't conflict
      mutex.acquire();
      try {
        // Add to stock
        items[putptr] = x;
        if (++putptr == items.length)
          putptr = 0;
        ++count;
      } finally {
        // Leave the critical section
        mutex.release();
        // Signal not-empty so a consumer may take an item
        notEmpty.release();
      }
    }
    /**
     * Takes an item from the warehouse.
     *
     * @return
     * @throws InterruptedException
     */
    public Object take() throws InterruptedException {
      // Wait until there is an item
      notEmpty.acquire();
      // Enter the critical section
      mutex.acquire();
      try {
        // Remove from stock
        Object x = items[takeptr];
        if (++takeptr == items.length)
          takeptr = 0;
        --count;
        return x;
      } finally {
        // Leave the critical section
        mutex.release();
        // Signal not-full so a producer may add an item
        notFull.release();
      }
    }
  }
}

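There is almost never a reason to use wait, notify, and notifyAll: they are like the assembly language of synchronization, while the synchronizers provide a high-level language on top.

If you really must use them, always call wait in a loop inside a synchronized block, for example:

synchronized (obj) {
    while (<condition does not hold>)
        obj.wait(); // releases the lock while waiting, reacquires it on wakeup
    ... // Perform action appropriate to condition
}
// This is the example from the Javadoc.

Why it must be written this way: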

Always use the wait loop idiom to invoke the wait method; never invoke it outside of a loop. The loop serves to test the condition before and after waiting. Testing the condition before waiting and skipping the wait if the condition already holds are necessary to ensure liveness. If the condition already holds and the notify (or notifyAll) method has already been invoked before a thread waits, there is no guarantee that the thread will ever wake from the wait. Testing the condition after waiting and waiting again if the condition does not
hold are necessary to ensure safety. If the thread proceeds with the action when the condition does not hold, it can destroy the invariant guarded by the lock. There are several reasons a thread might wake up when the condition does not hold:

• Another thread could have obtained the lock and changed the guarded state between the time a thread invoked notify and the time the waiting thread woke.
• Another thread could have invoked notify accidentally or maliciously when the condition did not hold. Classes expose themselves to this sort of mischief by waiting on publicly accessible objects. Any wait contained in a synchronized method of a publicly accessible object is susceptible to this problem.
• The notifying thread could be overly “generous” in waking waiting threads. For example, the notifying thread might invoke notifyAll even if only some of the waiting threads have their condition satisfied.
• The waiting thread could (rarely) wake up in the absence of a notify. This is known as a spurious wakeup [Posix, 11.4.3.6.1; JavaSE6].

In addition, prefer notifyAll over notify.
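To show the loop idiom and notifyAll together, here is a minimal one-slot mailbox sketch. Mailbox and MailboxDemo are hypothetical names; a null message is assumed to mean "empty", so put must not be called with null. Each wait sits inside a while loop that re-tests its condition, and each state change calls notifyAll so that no waiting thread with a satisfied condition is missed.

```java
class Mailbox {
    private String message; // guarded by this; null means "empty"

    // Blocks until a message is available, then removes and returns it.
    public synchronized String take() throws InterruptedException {
        while (message == null) { // re-test after every wakeup (spurious or not)
            wait();
        }
        String m = message;
        message = null;
        notifyAll(); // wake producers waiting for an empty slot
        return m;
    }

    // Blocks until the slot is empty, then stores the message (must be non-null).
    public synchronized void put(String m) throws InterruptedException {
        while (message != null) {
            wait();
        }
        message = m;
        notifyAll(); // wake consumers waiting for a message
    }
}

class MailboxDemo {
    public static void main(String[] args) throws InterruptedException {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    box.put("msg" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        for (int i = 0; i < 3; i++) {
            System.out.println(box.take()); // prints msg0, msg1, msg2 in order
        }
        producer.join();
    }
}
```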

Note:

For interval timing, use System.nanoTime() rather than System.currentTimeMillis().
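A minimal sketch of interval timing with System.nanoTime(); the class Timing and the method elapsedMillis are hypothetical. nanoTime() is meant only for measuring elapsed time and is not tied to the wall clock, so subtracting two readings gives a duration even if the system clock is adjusted in between.

```java
import java.util.concurrent.TimeUnit;

class Timing {
    // Runs the task once and returns how long it took, in whole milliseconds.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("took about " + ms + " ms");
    }
}
```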

Item 70: Document thread safety. Write proper documentation and comments describing the thread safety of your synchronized methods.

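Item 71: Use lazy initialization judiciously. This item mainly discusses lazy initialization in the presence of synchronization. The general advice is not to use lazy initialization unless you need it, to avoid problems such as a field being initialized more than once under concurrent access.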
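When lazy initialization is needed, Effective Java recommends the lazy initialization holder class idiom for static fields and the double-check idiom (with a volatile field) for instance fields. The sketch below shows both; ExpensiveConfig and LazyDemo are hypothetical classes, and the CREATED counter exists only so the example can show how many instances were built.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive object; counts how many times it has been constructed.
class ExpensiveConfig {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveConfig() {
        CREATED.incrementAndGet();
    }
}

class LazyDemo {
    // Holder class idiom (static field): Holder is not initialized until
    // getStaticConfig() first reads Holder.CONFIG, and the JVM guarantees
    // that class initialization runs exactly once, without explicit locking.
    private static class Holder {
        static final ExpensiveConfig CONFIG = new ExpensiveConfig();
    }

    static ExpensiveConfig getStaticConfig() {
        return Holder.CONFIG;
    }

    // Double-check idiom (instance field): the field must be volatile.
    private volatile ExpensiveConfig instanceConfig;

    ExpensiveConfig getInstanceConfig() {
        ExpensiveConfig result = instanceConfig; // first check, no locking
        if (result == null) {
            synchronized (this) {
                result = instanceConfig;         // second check, with locking
                if (result == null) {
                    instanceConfig = result = new ExpensiveConfig();
                }
            }
        }
        return result;
    }
}
```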

Item 72: Don’t depend on the thread scheduler

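Don't use Thread.yield; use sleep instead (for concurrency testing, Thread.sleep(1) can replace it). Don't rely on thread priorities either, because they behave differently across JVMs. Both yield and priorities are only hints: the JVM is not obliged to act on them.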

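Item 73: Avoid thread groups. Use the thread pools (executors) mentioned earlier instead of thread groups. You can simply forget about thread groups; they are an unsuccessful experiment.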
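As a minimal sketch of managing a batch of threads with an executor rather than a ThreadGroup: the pool owns the threads, invokeAll submits the tasks and waits for them, and shutdown() releases the threads. PoolDemo and squares are hypothetical names, and the pool size of 4 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolDemo {
    // Computes 1*1, 2*2, ..., n*n on a fixed-size pool and returns them in order.
    static List<Integer> squares(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                tasks.add(() -> x * x);
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll waits for every task; futures come back in task order.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown(); // let the pool's threads exit
        }
    }
}
```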

Continue reading Reading Notes: Effective Java, Second Edition, Chapter 10

Static fields, classloaders, and web containers

http://hi.baidu.com/liangzhongbo1/blog/item/f2c3201ae4b7250c34fa41f6.html

Class loaders and web containers

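For web applications running in a Java EE™ container, class loaders are implemented differently from ordinary Java applications [where the default is parent-first delegation], and the implementation also varies between web containers. In Apache Tomcat, each web application has its own class loader instance. This class loader also uses the delegation model, but with a difference: it first tries to load a class itself, and delegates to its parent class loader only if it cannot find the class. This is the reverse of the usual class loader order. It is the approach recommended by the Java Servlet specification, and its purpose is to give the web application's own classes priority over the classes provided by the web container. One exception to this delegation model is that classes from the Java core libraries are excluded from the web application's own search; this, too, preserves the type safety of the Java core libraries.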
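The child-first order described above can be sketched as a URLClassLoader subclass that overrides loadClass. This is a hypothetical illustration, not Tomcat's actual WebappClassLoader: it assumes that only java.* counts as core classes, whereas real containers protect additional packages.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first ("parent-last") class loader.
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // Core java.* classes are never looked up locally (type safety).
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name); // 1. try this loader's own URLs first
                } catch (ClassNotFoundException notLocal) {
                    // not found locally: fall through to the parent
                }
            }
            if (c == null) {
                // 2. core classes and anything not found locally use the
                //    normal parent-first delegation (throws if still missing)
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Usage would be something like new ChildFirstClassLoader(new URL[] { webInfClassesUrl }, containerLoader), where both arguments are placeholders for the application's directories and the container's loader.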

In the vast majority of cases, web application developers do not need to think about class loader details. Here are a few simple guidelines:

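• Each web application's own Java class files and library JARs go in the WEB-INF/classes and WEB-INF/lib directories, respectively.
• Java class files and JARs shared by several applications go in the directories that the web container designates as shared by all web applications.
• When a class-not-found error occurs, check whether the class loader of the current class and the context class loader of the current thread are the correct ones.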
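For the last guideline, a small diagnostic helper can print both loaders and their parent chains. LoaderDiagnostics, chain and describe are hypothetical names; the only APIs used are Class.getClassLoader(), ClassLoader.getParent() and Thread.getContextClassLoader(), where a null loader stands for the bootstrap loader.

```java
class LoaderDiagnostics {
    // Lists a loader followed by its ancestors, ending with the bootstrap loader.
    static String chain(ClassLoader loader) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader l = loader; l != null; l = l.getParent()) {
            sb.append(l).append(" -> ");
        }
        return sb.append("bootstrap").toString(); // null means the bootstrap loader
    }

    // Shows the class's defining loader and the current thread's context loader.
    static String describe(Class<?> cls) {
        Thread t = Thread.currentThread();
        return "defining loader chain of " + cls.getName() + ": " + chain(cls.getClassLoader())
                + "\ncontext loader chain of thread " + t.getName() + ": "
                + chain(t.getContextClassLoader());
    }

    public static void main(String[] args) {
        System.out.println(describe(LoaderDiagnostics.class));
        System.out.println(describe(String.class));
    }
}
```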

http://agapple.iteyeeye.com/blog/826661

| Container | JBoss (4.0.5) | Tomcat (6.0.30) | Jetty (7.1.20) |
| --- | --- | --- | --- |
| Child-first/parent-first setting (default value) | Java2ClassLoadingCompliance=false | delegate=false | _parentLoaderPriority=false |
| Package filter setting | FilteredPackages, default: javax.servlet, org.apache.commons.logging | packageTriggers, default: org.apache.commons.logging | systemClasses, default: java., javax., org.xml., org.w3c., org.apache.commons.logging., org.eclipse.jetty.continuation., org.eclipse.jetty.jndi., org.eclipse.jetty.plus.jaas., org.eclipse.jetty.websocket., org.eclipse.jetty.servlet.DefaultServlet. |
| Special behavior | 1. Package filtering only takes effect when UseJBossWebLoader=false. 2. With UseJBossWebLoader=true, package filtering is not supported. 3. The UseJBossWebLoader parameter is no longer supported from JBoss 5.0 onward. | Before the child-first/parent-first decision is made, system classes (such as the JDK's lib libraries) are delegated to the system class loader. | 1. Has an additional serverClasses setting; server classes are loaded child-first. 2. The default systemClasses setting additionally covers javax., org.xml. and org.w3c. |
| Related documentation | svn url: http://anonsvn.jboss.org/repos/jbossas/tags/JBoss_4_0_5_GA_CP18; JBoss community classloader docs: http://community.jboss.org/wiki/ClassLoadingConfiguration | svn url: http://svn.apache.org/repos/asf/tomcat/tc6.0.x/trunk; official class loader mechanism: http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html | svn url: http://dev.eclipse.org/svnroot/rt/org.eclipse.jetty/jetty/tags/jetty-7.2.0.v20101020/; official classloader docs: http://docs.codehaus.org/display/JETTY/Classloading |

Static fields

http://stackoverflow.com/questions/797964/what-is-the-exact-meaning-of-static-fields-in-java

Static doesn't quite mean "shared by all instances" - it means "not related to a particular instance at all". In other words, you could get at the static field in class A without ever creating any instances.

As for running two programs within the same JVM - it really depends on exactly what you mean by "running two programs". The static field is effectively associated with the class object, which is in turn associated with a classloader. So if these two programs use separate classloader instances, you'll have two independent static variables. If they both use the same classloader, then there'll only be one so they'll see each other's changes.

As for an alternative - there are various options. One is to pass the reference to the "shared" object to the constructor of each object you create which needs it. It will then need to store that reference for later. This can be a bit of a pain and suck up a bit more memory than a static approach, but it does make for easy testability.
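The constructor-passing alternative from the answer above can be sketched as follows. Registry, PriceWriter and PriceReader are hypothetical classes: instead of reading a static field, each component receives the shared Registry in its constructor and keeps the reference. A test can then simply create a fresh Registry, which is the testability benefit the answer mentions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared object that might otherwise live in a static field.
class Registry {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    void put(String key, String value) {
        entries.put(key, value);
    }

    String get(String key) {
        return entries.get(key);
    }
}

// Both components get the same Registry through their constructors
// and store the reference for later use.
class PriceWriter {
    private final Registry registry;

    PriceWriter(Registry registry) {
        this.registry = registry;
    }

    void publish(String item, String price) {
        registry.put(item, price);
    }
}

class PriceReader {
    private final Registry registry;

    PriceReader(Registry registry) {
        this.registry = registry;
    }

    String read(String item) {
        return registry.get(item);
    }
}
```

Two Registry instances are fully independent, much like two copies of a static field under two different class loaders, but here the sharing is explicit rather than decided by which class loader loaded the class.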

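I'm not sure of the original source here, but according to the Java specification a static field is initialized when its class, as loaded by a particular class loader, is initialized, which is consistent with the statement above.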

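Each launch of an ordinary Java program also gets its own class loader instances (in fact its own JVM), so even if the main method of the same class is run several times at once, the runs do not interfere with each other.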

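To sum up: static fields are tied to a class loader, and in a web container each application has its own class loader, so objectively the applications should not affect one another. The risk nevertheless remains (for example, a class placed in the container's shared directory is loaded by a single shared class loader, so its static fields are shared by every application). So avoid static fields whenever you can, especially for the singleton pattern.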

Continue reading Static fields, classloaders, and web containers


© 2013 - 2019. All rights reserved.

Powered by Hydejack v6.6.1