IDGLabs.COM - tips, tools and resource

Knowledge Sharing - Want to participate in the discussion?

Google, a popular search engine, is a tool for finding resources on the web. Google scans web pages to find instances of the keywords you have entered in the search box.

The Manuscripts Department at UNC Chapel Hill uses a special version of Google on its web site. This search feature only searches the Department’s web site and not the entire web. By using the department’s search feature, you can focus your search solely on the Manuscripts Department’s holdings.

Google is a web program that ranks web pages in a list of hits by giving weight to the links that reference a specific page.

Google is a search engine owned by Google, Inc., whose mission statement is to “organize the world’s information and make it universally accessible and useful”. The largest search engine on the web, Google receives several hundred million queries each day through its various services. The last reported figure was 200 million queries per day as of February 2004, up from 3 million queries per day in September 1999 and 10,000 queries per day in November 1998.

Much is being written about Gmail, Google’s new webmail system. There’s something deeper to learn about Google from this product than the initial reaction to the product features, however. Ignore for a moment the observations about Google leapfrogging their competitors with more user value and a new feature or two. Or diversifying away from search into other applications; they’ve been doing that for a while. Or the privacy red herring.

No, the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building, which makes it cheaper and easier for them to develop and run web-scale applications than anyone else.

I’ve written before about Google’s snippet service, which required that they store the entire web in RAM, all so they could generate a slightly better page excerpt than other search engines.

Google has taken the last 10 years of systems research out of university labs and built their own proprietary, production-quality system. What is this platform that Google is building? It’s a distributed computing platform that can manage web-scale datasets on 100,000-node clusters. It includes a petabyte-scale, distributed, fault-tolerant filesystem, distributed RPC code, and probably network shared memory and process migration, plus a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.

  • Filed under: Google
  • Short Tips to maintain Sun Solaris

    Here are some short tips for common tasks on Solaris 2.6, 7 and 8.

    Important Commands

    $ who -r # Show Run Level
    $ /usr/sbin/prtconf # Print the complete system configuration
    $ /sbin/mountall -l # Mount all local filesystems
    $ /sbin/init S # Changing to single user mode

  • Filed under: Command List
  • Reconfiguration boot in Solaris

     

    When adding or removing hardware in a system, it may be necessary to perform a reconfiguration boot. During this boot, the system discovers new hardware and recreates the file /etc/path_to_inst, which contains mappings of physical devices to logical instance numbers.

    To perform a reconfiguration boot from a shell prompt, create a file called reconfigure in the root directory; the reconfiguration then takes place on the next boot. As root, run:

    touch /reconfigure

    If you are at the OpenBoot PROM prompt (ok), you can issue the following command to perform a reconfiguration boot:

    boot -r

    Warning: Reconfiguration boots can cause problems under certain circumstances, particularly regarding the order of devices. Hosts running Solstice should be particularly cautious of reconfiguration boots that may change the order of mappings.

  • Filed under: Solaris
  • Playing with Swingbench

    Swingbench is a load generator and benchmark suite designed by Dominic Giles to stress test an Oracle database. In this post, I will be playing with Swingbench and showing how it can be used, by comparing the performance of buffered I/O with un-buffered (direct) I/O. Since this article is not really about direct I/O, the results presented here are deliberately simple and should not be considered conclusive. The main point of this article is to demonstrate the Swingbench utility: how to set it up and how to use it.

    Note About the Environment Used for Testing

    Before we delve into using Swingbench, I thought I should mention a little about the environment used for testing, as it affects the results a lot! The box used to run the database in this post is a Dell Latitude D810 laptop with a 2.13 GHz processor and 1GB of RAM. It is running Solaris 10, specifically the 11/06 release. The datafiles and redo log files are stored on a Maxtor OneTouch II external hard drive connected via a USB 2.0 interface.

    The datafiles for the database reside on an 80 GB partition formatted with a UFS file system, and the redo logs reside on a 20 GB partition which is also formatted with UFS. The database is not running in archive log mode and there is no flash recovery area configured.

    Enabling Direct I/O

    A quick note on how direct I/O is enabled for these tests. The UFS file system (like most file systems) supports mount options which let processes bypass the OS cache. One way to enable direct I/O on a UFS file system is to mount the file system with the forcedirectio option, like so:

    # mount -o forcedirectio /dev/dsk/c2t1d0s1 /u02

    Another method is to set the FILESYSTEMIO_OPTIONS=SETALL parameter within Oracle (available in 9i and later). As Glenn Fawcett explains in his post on direct I/O, the SETALL value enables all the I/O options available for a particular file system, that is, both direct I/O and async I/O. When the parameter is set this way, Oracle will use an API to enable direct I/O when it opens database files.
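
    As a rough sketch of that second method, the statements below could be fed to SQL*Plus. The helper name io_options_sql is made up for illustration, and the sketch assumes the instance is started from an spfile. FILESYSTEMIO_OPTIONS is a static parameter, so the instance has to be restarted before a new value takes effect.

    ```shell
    # Hypothetical helper (not part of Oracle or Swingbench): print the SQL*Plus
    # commands that set FILESYSTEMIO_OPTIONS and restart the instance.
    io_options_sql() {
        value=${1:-SETALL}   # SETALL = direct + async I/O, NONE = buffered, synchronous I/O
        printf '%s\n' \
            "ALTER SYSTEM SET filesystemio_options = ${value} SCOPE = SPFILE;" \
            "SHUTDOWN IMMEDIATE" \
            "STARTUP" \
            "SHOW PARAMETER filesystemio_options"
    }

    # Print the statements for review before running them anywhere.
    io_options_sql SETALL
    ```

    To apply them, the output could be piped into SQL*Plus as the Oracle software owner, for example: io_options_sql SETALL | sqlplus -s "/ as sysdba". Passing NONE instead switches back to buffered I/O for the comparison runs.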

    Swingbench Installation and Configuration

    Now that we’ve got the preliminaries out of the way, it’s time to get on to the main reason for this post. Swingbench is shipped as a zip file which is available for download. A prerequisite for running Swingbench is that a Java virtual machine is present on the machine on which you will be running it.

    After unzipping the Swingbench zip file, you will need to edit the swingbench.env file (if on a UNIX platform) found in the top-level swingbench directory. The following variables need to be modified according to your environment:

    ORACLE_HOME
    JAVA_HOME
    SWINGHOME

    If you are using the Oracle Instant Client instead of a full RDBMS install on the machine running Swingbench, the CLASSPATH variable must also be modified from $ORACLE_HOME/jdbc/lib/ojdbc14.jar to $ORACLE_HOME/lib/ojdbc14.jar.
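
    For illustration, the three settings might end up looking something like the sketch below. Every path is an invented placeholder, and the exact syntax used inside swingbench.env may differ from this, so treat it as a shape to adapt rather than a file to copy.

    ```shell
    # Example values only: every path below is a placeholder for your own system.
    ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1   # Oracle software location
    JAVA_HOME=/usr/java                               # JVM used to run Swingbench
    SWINGHOME=/export/home/oracle/swingbench          # directory the zip file was extracted to
    export ORACLE_HOME JAVA_HOME SWINGHOME
    ```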

    Installing Calling Circle

    The Calling Circle is an open-source preconfigured benchmark which comes with Swingbench. The Order Entry benchmark also comes with Swingbench but for the purposes of this article, we will only discuss the Calling Circle benchmark.

    The Calling Circle benchmark implements an example OLTP online telecommunications application. The goal of this application is to simulate a randomized workload of customer transactions and measure transaction throughput and response times. Approximately 97% of the transactions cause at least one database update, with well over three quarters performing two or more updates. More information can be found in the Readme.txt file which comes with the Swingbench distribution.

    The first step for installing Calling Circle is to create the Calling Circle schema (CC) in the database. This is achieved using the ccwizard executable found in the swingbench/bin directory:

    $ ./ccwizard

    Click [Next] on the welcome screen and you will then be presented with the “Select Task” screen.

    Choose the option to create the Calling Circle schema. In the next screen, enter the connection details of the database you will be creating the schema in. This will involve entering the host name, port number (if not using the default port of 1521 for your listener) and the database service name. Also, ensure that you choose the type IV Thin JDBC driver. Click [Next] when you have entered this information.

    The next screen asks for the schema details for the Calling Circle schema. Enter appropriate locations for the datafiles on your system. When finished entering information on this screen, click [Next] to continue. This will bring you to the Schema Sizing window.

    Use the slider to select the schema size you wish to use. For this post, I chose a schema size with 2,023,019 customers, which implies a tablespace of size 2.1GB for data and a tablespace of size 1.3GB for indexes. When finished choosing your schema size, click [Next] to continue. Click [Finish] on the next screen to complete the wizard and create the schema. A progress bar will appear while the schema is created.

    Creating the Input Data for Calling Circle

    Before each run of the Calling Circle application it is necessary to create the input data for the benchmark to run. This is accomplished using the ccwizard program we used previously for creating the Calling Circle schema. Start up the ccwizard program again and click [Next] on the welcome screen. On the “Select Task” screen seen previously, this time select “Generate Data for Benchmark Run” and click [Next].

    In the “Schema Details” window which follows, enter the details of the schema which you created in the last section. Click [Next] once all the necessary information has been entered. You will then be presented with the “Benchmark Details” screen.

    In this post, we will use 1000 transactions for each test, as entered in the “Number of Transactions” dialog. Press [Next] to continue and you will be presented with the final screen. Click [Finish] to create the benchmark data.

    Starting the Benchmark Test

    Now that we have the Calling Circle schema created and the input data generated, we can start our tests. To start up Swingbench and ensure that it operates with the Calling Circle benchmark we can pass the sample Calling Circle configuration file (ccconfig.xml) which is supplied with Swingbench as a runtime parameter as so:

    $ ./swingbench -c sample/ccconfig.xml

    This will start up Swingbench with the sample configuration for the Calling Circle application. Only a few settings need to be changed for us to use this configuration: the connection settings for the host on which you have already set up the Calling Circle schema. Change the connection settings as necessary for your environment.

    [Screenshot: the Calling Circle application running in Swingbench.]

    We will be performing 1000 transactions during each test run, as specified when we generated the sample data, and the same Swingbench configuration is used for every test we perform.

    The workload is typical of an OLTP application, with 40% reads and 60% writes. The number of users associated with the workload is 15. We will use this exact workload for every test we perform.

    Results & Conclusion

    The measurements from Swingbench which we will use for comparing the performance of a UFS file system when Oracle uses direct I/O versus buffered I/O are the following:

    Transaction throughput (number of transactions per minute)
    Average response time for each transaction type

    We will run the benchmark 5 times for each configuration we want to compare and then present the average of the measurements. That is, we will run the tests 5 times with buffered I/O and then 5 times with un-buffered I/O, enabled by setting the FILESYSTEMIO_OPTIONS parameter.
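
    Averaging the runs is simple enough to do by hand, but a small helper along these lines would do the same job. The function name is made up, and the five figures are placeholders chosen to make the arithmetic easy to follow; they are not measurements from these tests.

    ```shell
    # Hypothetical helper: average a series of measurements, one per argument.
    average() {
        printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
    }

    # Placeholder throughput figures for five runs, NOT results from this post.
    average 100 110 120 130 140
    ```

    With the placeholder figures above, the helper prints 120.00.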

    [Table: comparison of transaction throughput and average response times, buffered I/O versus direct I/O.]

    While these tests were not very conclusive or thorough, they do show how Swingbench can be used for generating database activity. The measurements which I compared are only some of the measurements which Swingbench reports when finished running a benchmark. Hopefully I will be able to play and post a bit more on the excellent Swingbench utility in the future.

  • Filed under: Database, Oracle db
  • Filesystems are a very interesting area, one of the few areas in Unix where new algorithms can still make a huge difference in performance.

    Often the historical view on filesystems is a bit too Unix-centric and states that the Berkeley Fast File System is the ancestor of most modern file systems. This view ignores competing and earlier implementations from IBM (HPFS), DEC (VAX VMS), Microsoft and others.

    Still, Unix filesystems became a classic, and the concepts introduced in them dominate all modern filesystems. They also introduced many interesting features and algorithms into the area. For example, the very interesting concept of extended attributes, introduced in 4.4BSD, has recently been added to ext2fs:

    Immutable files can only be read: nobody can write or delete them. This can be used to protect sensitive configuration files.

    Append-only files can be opened in write mode, but data is always appended at the end of the file. Like immutable files, they cannot be deleted or renamed. This is especially useful for log files, which can only grow. All in all, the following attributes are available in ext2fs:

    A (no Access time): if a file or directory has this attribute set, whenever it is accessed, either for reading or for writing, its last access time will not be updated. This can be useful, for example, on files or directories which are very often accessed for reading, especially since this parameter is the only one which changes on an inode when it’s open read-only.
    a (append only): if a file has this attribute set and is open for writing, the only operation possible will be to append data to its previous contents. For a directory, this means that you can only add files to it, but not rename or delete any existing file. Only root can set or clear this attribute.
    d (no dump): dump (8) is the standard UNIX utility for backups. It dumps any filesystem for which the dump counter is 1 in /etc/fstab (see chapter “Filesystems and Mount Points”). But if a file or directory has this attribute set, unlike others, it will not be taken into account when a dump is in progress. Note that for directories, this also includes all subdirectories and files under it.
    i (immutable): a file or directory with this attribute set simply cannot be modified at all: it cannot be renamed, no further link can be created to it and it cannot be removed. Only root can set or clear this attribute. Note that this also prevents changes to access time, therefore you do not need to set the A attribute when i is set.
    s (secure deletion): when a file or directory with this attribute set is deleted, the blocks it was occupying on disk are written back with zeroes.
    S (Synchronous mode): when a file or directory has this attribute set, all modifications on it are synchronous and written back to disk immediately.
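
    On Linux these attributes are set with chattr and listed with lsattr, both from e2fsprogs. The wrappers below are a minimal sketch: the function names are made up, and actually calling the first three requires root and a file system that supports the attributes.

    ```shell
    # Hypothetical wrappers around chattr/lsattr (e2fsprogs, Linux ext2/ext3).
    # Setting or clearing the i and a attributes requires root.
    make_immutable()   { chattr +i "$1"; }    # no writes, renames, new links or deletes
    make_append_only() { chattr +a "$1"; }    # existing contents cannot be overwritten
    clear_protection() { chattr -ia "$1"; }   # remove both attributes again
    show_attributes()  { lsattr "$1"; }       # list the attributes currently set
    ```

    As root, something like make_immutable /etc/resolv.conf followed by show_attributes /etc/resolv.conf would protect a configuration file and then show the i flag in the listing.
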

    The Unix filesystem is a classic, but a classic has its own problems: it is actually an old and largely outdated design that has outlived its usefulness. Later ideas implemented in HPFS, BFS and several other more modern filesystems are absent from plain-vanilla implementations of Unix file systems. Balanced trees now serve as the basis of most modern filesystems, including ReiserFS (which started as a clone but acquired some unique features in the course of development):

    The Reiser Filesystem by Hans Reiser [and Moscow University researchers] is a very ambitious project to not only improve performance and add journaling, but to redefine the filesystem as a repository for arbitrarily complex objects. ReiserFS is faster than ext2/3 because it uses balanced trees for its directory structures. It was used by Suse and Gentoo.

    Unfortunately, the novel feature introduced in HPFS called extended attributes never got traction in other filesystems. Of course, the fundamental decision to make attributes indexable deserves closer examination, given the costs of indexing, but the fixed set of attributes (as in UFS) created too many problems to ignore this issue. I think that extended attributes should be present in a filesystem, and that they could replace such kludges as the #! notation in UNIX for specifying the default processor in executable files.

    www.scit.wlv.ac.uk/~jphb/spos/notes/ufs.basics.html

    These notes describe the basic Unix file system and the kernel structures that support it. For further information, readers should consult The Design of the UNIX Operating System by M.J. Bach (Prentice-Hall 1986, ISBN 0-13-201757-1) and The Magic Garden Explained by B. Goodheart and J. Cox (Prentice-Hall 1994, ISBN 0-13-098138-9). The Bach book is probably easier to read, but the Goodheart and Cox book is more up-to-date.

    Modern Unix systems use a Virtual File System (VFS), which allows the system to use many different actual file systems in a seamless fashion. At a low level, a driver is required for each actual file system. This allows Network File Systems (NFS), High-Sierra File Systems (HSFS, found on CDROMs) and MSDOS File Systems (PCFS), amongst others, to be included in the Unix view of an integrated hierarchy of files and directories. Included among the various supported file systems are the Unix File System (UFS) and the older System V File System (S5FS). These constitute the traditional Unix file system and will be described in detail in these notes.

    i-nodes
    directory structures
    Kernel buffer structures

    When the UFS was introduced to BSD in 1982, its use of 32 bit offsets … structure on the disk for use with systems that don’t understand GPT. …
    www.freebsd.org/projects/bigdisk/index.html

    Early versions of Unix used filesystems referred to simply as FS. FS only included the boot block, superblock, a clump of inodes, and the data blocks. This worked well for the small disks early Unixes were designed for, but as technology advanced and disks got larger, moving the head back and forth between the clump of inodes and the data blocks they referred to caused thrashing. BSD optimized this in FFS by inventing cylinder groups, breaking the disk up into smaller chunks, each with its own inode clump and data blocks.

    The intent of BSD FFS is to try to localize associated data blocks and metadata in the same cylinder group and, ideally, all of the contents of a directory (both data and metadata for all the files) in the same or a nearby cylinder group, thus reducing the fragmentation caused by scattering a directory’s contents over a whole disk.

    Some of the performance parameters in the superblock included number of tracks and sectors, rotation speed, head speed, and alignment of the sectors between tracks. In a fully optimized system, the head could be moved between close tracks to read scattered sectors from alternating tracks while waiting for the platter to spin around.

    As disks grew larger and larger, sector level optimization became obsolete (especially with disks that used linear sector numbering and variable sectors per track). With larger disks and larger files, fragmented reads became more of a problem. To combat this, BSD originally increased the block size from one sector to 1k in 4.0BSD and, in FFS, increased the block size from 1k to 8k. This has several effects. The chances of a file’s sectors being contiguous are much greater. The amount of overhead needed to list the file’s blocks is reduced. The number of bytes addressable with a fixed bit width block number is increased (allowing for larger disks).

    With larger block sizes, disks with many small files would waste a lot of space, so BSD added block level fragmentation, where the last partial blocks of data from several files may be stored together in a single block divided into “fragments”, instead of in multiple mostly empty blocks.
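
    A little arithmetic shows how much this matters for small files. The sketch below compares the space left unused by a 2,500 byte file when it is stored in whole 8k blocks and when its tail is stored in 1k fragments. The helper name, the file size and the 1k fragment size are all assumptions chosen for illustration.

    ```shell
    # Hypothetical helper: bytes left unused when a file of SIZE bytes is stored
    # in whole allocation units of UNIT bytes.
    wasted_bytes() {
        size=$1
        unit=$2
        units=$(( (size + unit - 1) / unit ))   # round up to whole allocation units
        echo $(( units * unit - size ))
    }

    wasted_bytes 2500 8192   # whole 8k blocks only
    wasted_bytes 2500 1024   # tail stored in 1k fragments (assumed fragment size)
    ```

    This prints 5692 for whole blocks and 572 with fragments, so the small file wastes roughly a tenth of the space once fragments are used.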

    A UFS file system is made of:

    Boot block, the first block of every file system (block 0)
    Superblock (block 1), which contains:
        Total size of the file system (in blocks)
        Number of blocks reserved for inodes
        Name of the file system
        Device identification
        Date of the last superblock update
        Head of the free-block list
        List of free inodes
    Inode blocks, holding for each assigned inode:
        File type: regular, directory, device, named pipe, socket, symbolic link
        File owner: UID and GID
        Protection information: rwx for user, group and others
        Link count: number of directory entries that refer to the file
        Size of the file in bytes
        Last file access date
        Last file modification date
        Last inode modification date
        Pointers to data blocks: actual location of the blocks on the physical disk
    Data blocks with user data or system files

    New blocks allocated to a file are obtained from the free-block list, to which blocks are returned when a file is deleted.
    The superblock is followed by the blocks containing inodes, each identified by an inumber. An inode describes an individual file, with one inode for each file in the file system. Each file system is allocated a maximum number of inodes and therefore a maximum number of files. The maximum values depend on the file system size.

    Inode 1 on each file system is unnamed and unused. Inode 2 must correspond to the file system root directory, which supports all other files in the file system. Inodes after inode 2 are free and can be used for any file. Inodes and blocks are not allocated in any particular order.

    A directory entry, whether file or link, consists of the name and the inumber representing the file. The link count indicates the number of directory entries that refer to the same file. A file is deleted when its link count drops to zero. When the file is deleted, the associated inode is returned to the free-inode list and its associated blocks are returned to the free-block list.
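
    The relationship between directory entries, inumbers and the link count is easy to watch on any Unix-like system. The sketch below uses made-up helper names and a scratch directory created with mktemp -d, which is widely available but not strictly POSIX.

    ```shell
    # Hypothetical helpers: read the link count and inumber from ls output.
    link_count() { ls -ld "$1" | awk '{ print $2 }'; }
    inumber()    { ls -id "$1" | awk '{ print $1 }'; }

    demo=$(mktemp -d)
    echo "hello" > "$demo/original"
    ln "$demo/original" "$demo/alias"   # second directory entry for the same inode
    link_count "$demo/original"         # link count is now 2
    rm "$demo/original"                 # removes one entry, not the data
    link_count "$demo/alias"            # link count drops back to 1
    cat "$demo/alias"                   # contents are still reachable
    rm -r "$demo"                       # last entry removed, inode and blocks are freed
    ```

    Both names report the same inumber while they exist, and the data only goes away once the last directory entry has been removed.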

  • Filed under: Solaris