
Web directory analysis

The internet is full of lists of web directories (Romanian or foreign) sorted by PR (Google PageRank). The problem is that most of these lists show incorrect PR values.

I wrote a PHP script that parses an HTML file containing a list of links to web directories. The script looks up the PR of each directory (link) in the list and sorts the list by PR value.
Of course, PR is not the only thing that matters when choosing directories. For more details, read this article.
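The sort order used by the script below (highest PR first, ties broken alphabetically by URL, as in its `array_multisort` call) can also be written out explicitly. This is a minimal sketch, assuming the PR values have been collected into an associative array of URL => PR; the function name `sortByPageRank` and the example URLs are hypothetical.

```php
<?php
// Hypothetical helper: sorts a URL => PR map by PR descending,
// breaking ties alphabetically by URL (byte-wise, like SORT_STRING).
function sortByPageRank(array $ranks): array
{
    $urls = array_keys($ranks);
    usort($urls, function (string $a, string $b) use ($ranks): int {
        // Higher PR first
        $cmp = (int) $ranks[$b] <=> (int) $ranks[$a];
        // Same PR: alphabetical order of the URLs
        return $cmp !== 0 ? $cmp : strcmp($a, $b);
    });

    $sorted = [];
    foreach ($urls as $url) {
        $sorted[$url] = (int) $ranks[$url];
    }
    return $sorted;
}

$example = [
    'http://www.b.example/' => 3,
    'http://www.a.example/' => 3,
    'http://www.c.example/' => 5,
];
print_r(sortByPageRank($example));
```

Keeping URL and PR together in one array avoids having to keep two parallel arrays in sync, which is what `array_multisort` does for you in the original script.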

For the test, I searched the web for a page offering a list of web directories. I copied the links from that page into a file named “db1.html” and saved it in the same directory as the following two PHP scripts.

THE PURPOSE OF THIS TEST WAS NOT TO PRESENT QUALITY WEB DIRECTORIES, BUT TO SHOW HOW YOU CAN QUICKLY ANALYZE AND SORT A LIST OF DIRECTORIES.
The list was taken from www.heliosdesign.ro (it seemed like a trustworthy list).

directory_analizer.php

<?php
set_time_limit(0);        // checking many URLs can exceed the default time limit
include("pagerank.php");  // provides getPageRank()

// list of HTML files to scan
$lp = array('db1.html');

$directories = array();
$prs = array();

// collect unique links (shorter than 60 characters) from each file
foreach ($lp as $page) {
    $content = @file_get_contents($page);
    if ($content === false) {
        continue; // skip files that cannot be read
    }
    preg_match_all('/<a\s+[^>]*?href\s*=\s*["\']([^"\']+)["\'][^>]*>/i', $content, $matches);
    foreach ($matches[1] as $var) {
        if (strlen($var) < 60 && !in_array($var, $directories)) {
            $directories[] = $var;
        }
    }
}

// look up the PR of each directory, pausing briefly between requests
foreach ($directories as $dir) {
    usleep(100000); // usleep() takes microseconds: 100000 = 0.1 s
    $prs[] = (int) getPageRank($dir);
}

// sort by PR (descending), then alphabetically by URL
array_multisort($prs, SORT_DESC, SORT_NUMERIC, $directories, SORT_ASC, SORT_STRING);

echo '<html><body>';
echo '<table>';
for ($i = 0; $i < count($prs); $i++) {
    echo '<tr><td>' . htmlspecialchars($directories[$i]) . '</td><td>' . $prs[$i] . '</td></tr>';
}
echo '</table>';
echo '</body></html>';
?>
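As an alternative to extracting the links with a regular expression, PHP's DOM extension can parse the HTML and read the `href` attributes directly, which copes better with attribute order, extra attributes and messy markup. A minimal sketch, assuming the `dom` extension is enabled; the function name `extractDirectoryLinks` is hypothetical, and it applies the same filters as the script (no duplicates, fewer than 60 characters).

```php
<?php
// Hypothetical alternative to the regular expression:
// collect unique href values shorter than $maxLen characters using DOMDocument.
function extractDirectoryLinks(string $html, int $maxLen = 60): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Same filters as the script: non-empty, short enough, not seen before
        if ($href !== '' && strlen($href) < $maxLen && !in_array($href, $links, true)) {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<p><a href="http://www.a.example/">A</a> <a class="x" href="http://www.b.example/">B</a>'
      . ' <a href="http://www.a.example/">A again</a></p>';
print_r(extractDirectoryLinks($html));
```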

and the pagerank.php file:

<?php
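// StrToNum: rolling hash of $Str (Check = Check * Magic + character code), reduced modulo 2^32 on overflow
function StrToNum($Str, $Check, $Magic)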
{
    $Int32Unit = 4294967296;
    $length = strlen($Str);
    for ($i = 0; $i < $length; $i++) {
        $Check *= $Magic;
        if ($Check >= $Int32Unit) {
            $Check = ($Check - $Int32Unit * (int) ($Check / $Int32Unit));
            $Check = ($Check < -2147483648) ? ($Check + $Int32Unit) : $Check;
        }
        $Check += ord($Str[$i]); // [] string offset; the {} syntax was removed in PHP 8
    }
    return $Check;
}

// HashURL: combines two StrToNum hashes of the URL into the toolbar checksum
function HashURL($String)
{
    $Check1 = StrToNum($String, 0x1505, 0x21);
    $Check2 = StrToNum($String, 0, 0x1003F);
    $Check1 >>= 2;
    $Check1 = (($Check1 >> 4) & 0x3FFFFC0 ) | ($Check1 & 0x3F);
    $Check1 = (($Check1 >> 4) & 0x3FFC00 ) | ($Check1 & 0x3FF);
    $Check1 = (($Check1 >> 4) & 0x3C000 ) | ($Check1 & 0x3FFF);
    $T1 = (((($Check1 & 0x3C0) << 4) | ($Check1 & 0x3C)) <<2 ) | ($Check2 & 0xF0F );
    $T2 = (((($Check1 & 0xFFFFC000) << 4) | ($Check1 & 0x3C00)) << 0xA) | ($Check2 & 0xF0F0000 );
    return ($T1 | $T2);
}

// CheckHash: prefixes the hash with "7" and a Luhn-style check digit
function CheckHash($Hashnum)
{
    $CheckByte = 0;
    $Flag = 0;
    $HashStr = sprintf('%u', $Hashnum) ;
    $length = strlen($HashStr);
    for ($i = $length - 1;  $i >= 0;  $i --) {
        $Re = (int) $HashStr[$i]; // current decimal digit, read right to left
        if (1 === ($Flag % 2)) {
            $Re += $Re;
            $Re = (int)($Re / 10) + ($Re % 10);
        }
        $CheckByte += $Re;
        $Flag ++;
    }
    $CheckByte %= 10;
    if (0 !== $CheckByte) {
        $CheckByte = 10 - $CheckByte;
        if (1 === ($Flag % 2) ) {
            if (1 === ($CheckByte % 2)) {
                $CheckByte += 9;
            }
            $CheckByte >>= 1;
        }
    }
    return '7'.$CheckByte.$HashStr;
}

function getPageRank($url) {
    $agents = array(
        'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.9',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.8',
    );
    // send a random browser user agent for this request, then restore the previous one
    $useragent = @ini_get('user_agent');
    @ini_set('user_agent', $agents[array_rand($agents)]);
    $ch = CheckHash(HashURL($url));
    // Google Toolbar PageRank query (Google retired public Toolbar PageRank in 2016)
    $pr_url = "http://toolbarqueries.google.com/search?client=navclient-auto&ch=$ch&features=Rank&q=info:$url";
    $data = @file_get_contents($pr_url);
    @ini_set('user_agent', $useragent);
    if ($data === false) {
        return 0; // request failed
    }
    // the PR follows the 9 characters "Rank_1:1:" in the response
    $pos = strpos($data, "Rank_");
    if ($pos !== false) {
        $pr = trim(substr($data, $pos + 9));
        return str_replace("\n", '', $pr);
    }
    return 0;
}
?>
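The fixed offset in `substr($data, $pos + 9)` assumes a response of the form `Rank_1:1:6`, where the PR comes after the 9 characters `Rank_1:1:`. A regular expression makes that assumption explicit and returns an integer. A minimal sketch, assuming that response format; the function name `parseToolbarRank` is hypothetical.

```php
<?php
// Hypothetical helper: extracts the PR from a response such as "Rank_1:1:6".
// Returns 0 when no rank is found, as getPageRank() does.
function parseToolbarRank(string $data): int
{
    if (preg_match('/Rank_\d+:\d+:(\d+)/', $data, $m) === 1) {
        return (int) $m[1];
    }
    return 0;
}

echo parseToolbarRank("Rank_1:1:6\n"), PHP_EOL; // prints 6
echo parseToolbarRank(''), PHP_EOL;             // prints 0
```

Unlike the fixed offset, the pattern does not depend on the exact length of the prefix, and it never returns trailing characters as part of the rank.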

The result was the following:

http://www.federal.ro/ 6
http://www.firmeromania.ro/ 6
http://www.immromania.ro/ 6
http://www.kappa.ro/ 6
http://www.linkuri.ro/ 6
http://www.roportal.ro/ 6
http://www.smarty.ro/ 6
http://webdirectory.rol.ro/ 5
http://www.adresa.ro/ 5
http://www.afla.ro/ 5
http://www.apropo.ro/ 5
http://www.cere.ro/ 5
http://www.e-oferta.ro/ 5
http://www.ghidul.ro/ 5
http://www.index2000.ro/ 5
http://www.legaturi.ro/ 5
http://www.paginialbastre.ro/ 5
http://www.ponturifierbinti.com/ 5
http://www.promovare-site.ro/ 5
http://www.tre.ro/ 5
http://www.adirector.ro/ 4
http://www.bizcity.ro/ 4
http://www.director-web.santamia.ro/ 4
http://www.ghidafaceri.ro/ 4
http://www.idilis.ro/catalog/ 4
http://www.indexb.ro/ 4
http://www.info-romania.ro/ 4
http://www.infofirme.ro/ 4
http://www.prestariservicii.ro/ 4
http://www.publionline.ro/ 4
http://www.repertoar.ro/ 4
http://www.roinfo.biz/ 4
http://www.top1.ro/ 4
http://www.top300.ro/ 4
http://www.trafix.eu/ 4
http://portal.adstart.ro/ 3
http://www.24biz.ro/ 3
http://www.adauga-site.eu/ 3
http://www.adauga.com/ 3
http://www.director-web.net/ 3
http://www.dyr.ro/ 3
http://www.epagini.com/ 3
http://www.euroghid.com 3
http://www.firmeonline.ro/ 3
http://www.haabaa.ro/ 3
http://www.hotstop.ro/ 3
http://www.ldmstudio.com/ 3
http://www.links24.ro/ 3
http://www.memo.ro/ 3
http://www.prodirector.net/ 3
http://www.resurse.com/ 3
http://www.roindex.ro/ 3
http://www.seo-portal.ro/ 3
http://www.top40.ro/ 3
http://www.totaltop.ro/ 3
http://www.webe.ro/ 3
http://www.webindex.ro/ 3
http://www.westinfo.ro/ 3
http://www.wol.ro/ 3
http://www.aix.ro/ 2
http://www.amical.ro/ 2
http://www.atat.ro/ 2
http://www.butic.eu/ 2
http://www.cazare-romania.info 2
http://www.cuvinte.info/ 2
http://www.deconstructii.com/ 2
http://www.directorulweb.com/ 2
http://www.elinks.ro/ 2
http://www.euro-web-directory.com/ 2
http://www.firme-companii.ro/ 2
http://www.informatii24.ro/ 2
http://www.lynk.ro/ 2
http://www.myguide.ro/ 2
http://www.optimizare-site.com/ 2
http://www.portal.ro/ 2
http://www.topdirectorweb.ro/ 2
http://www.univers-web.ro/ 2
http://www.whr.ro/ 2
http://www.enigma.ro/ 1
http://director.domedia.ro/ 0
http://directorweb.itbox.ro/ 0
http://selectii.ro/ 0
http://www.acidlinks.com/ 0
http://www.adauga-url.com/ 0
http://www.add-url.ro/ 0
http://www.astazi.net/ 0
http://www.bumerang.ro/ 0
http://www.director-seo.org/ 0
http://www.directorfirme.ro/ 0
http://www.epweb.ro/director-web/ 0
http://www.evrika.ro/ 0
http://www.index-romania.info/ 0
http://www.murfi.com 0
http://www.rank-up.ro 0
http://www.ro-pix.com/ 0
http://www.rodirector.ro/ 0
http://www.rohit.ro/ 0
http://www.romania-worldwide.info/ 0
http://www.topdirector.net/ 0
http://www.webdb.ro/ 0
http://www.webdirinfo.ro/ 0
http://www.xchange.ro/site/ 0
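To get a quick overview of a list like the one above (for example, how many directories have PR 6 and how many have PR 0), the URL => PR pairs can be grouped by PR. A minimal sketch; the function name `countByPageRank` and the sample URLs are hypothetical.

```php
<?php
// Hypothetical helper: counts how many directories have each PR value,
// given URL => PR pairs like those produced by the analyzer.
function countByPageRank(array $ranks): array
{
    $counts = [];
    foreach ($ranks as $pr) {
        $pr = (int) $pr;
        $counts[$pr] = ($counts[$pr] ?? 0) + 1;
    }
    krsort($counts); // highest PR first
    return $counts;
}

$sample = [
    'http://www.a.example/' => 6,
    'http://www.b.example/' => 6,
    'http://www.c.example/' => 0,
];
print_r(countByPageRank($sample));
```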