<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Defvayne23.com &#187; Tip</title>
	<atom:link href="http://defvayne23.com/tag/tip/feed/" rel="self" type="application/rss+xml" />
	<link>http://defvayne23.com</link>
	<description>A blog by John Hoover</description>
	<lastBuildDate>Fri, 27 Jan 2012 22:10:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>IE &amp; IFRAME Sessions</title>
		<link>http://defvayne23.com/2010/03/30/ie-iframe-sessions/</link>
		<comments>http://defvayne23.com/2010/03/30/ie-iframe-sessions/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 16:41:39 +0000</pubDate>
		<dc:creator>John Hoover</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[IE]]></category>
		<category><![CDATA[Tip]]></category>

		<guid isPermaLink="false">http://www.defvayne23.com/?p=54</guid>
		<description><![CDATA[Had an issue come up today with a site that offers a service for pulling its data into your site through an iframe. Apparently IE has a privacy feature that blocks cookies, including session cookies, set by external sites loaded in an iframe. Sounds great, but the way to make it work is simply to have the framed site send a header [...]]]></description>
			<content:encoded><![CDATA[<p>Had an issue come up today with a site that offers a service for pulling its data into your site through an iframe. Apparently IE has a privacy feature that blocks cookies, including session cookies, set by external sites loaded in an iframe, so the framed site loses its session. Sounds great, but the way to make it work is simply to have the framed site send a P3P header with its response.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">header('P3P: CP=&quot;CAO PSA OUR&quot;');</pre></div></div>

<p>So how does this add any security, if the site can just send a header? <a href="http://support.microsoft.com/kb/323752">Find more info in Microsoft KB 323752.</a></p>
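<p>A minimal sketch of where the header goes, assuming the framed page uses PHP's built-in sessions. The helper names p3pHeader() and startFramedSession() are hypothetical; only the CP="CAO PSA OUR" policy comes from this post. Both calls have to run before the page sends any output, because headers cannot be added once the body has started.</p>

```php
<?php
// Hypothetical helper: build a P3P compact-policy header line from its tokens.
function p3pHeader(array $aTokens): string
{
    return 'P3P: CP="' . implode(' ', $aTokens) . '"';
}

// Hypothetical helper: call this at the very top of the page that is loaded
// inside the iframe, before anything is echoed.
function startFramedSession(): void
{
    // Send the policy on the same response that sets the session cookie.
    header(p3pHeader(['CAO', 'PSA', 'OUR']));
    session_start();
}

// Response headers only make sense under a web server, not on the command line.
if (PHP_SAPI !== 'cli') {
    startFramedSession();
}
```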
]]></content:encoded>
			<wfw:commentRss>http://defvayne23.com/2010/03/30/ie-iframe-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TCT: Crawl a Website</title>
		<link>http://defvayne23.com/2010/02/18/tct-crawl-a-website/</link>
		<comments>http://defvayne23.com/2010/02/18/tct-crawl-a-website/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 16:51:32 +0000</pubDate>
		<dc:creator>John Hoover</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Thursday Code Tip]]></category>
		<category><![CDATA[Tip]]></category>

		<guid isPermaLink="false">http://www.defvayne23.com/?p=44</guid>
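		<description><![CDATA[For this week's Thursday Code Tip I will show how to use PHP to crawl a website and gather content from it [...]]]></description>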
			<content:encoded><![CDATA[<p><strong>DISCLAIMER:</strong> I do not condone doing this. There are better, more legal ways to get content from someone, but sometimes your boss asks you to do it anyway. DO NOT STEAL CONTENT.</p>
<p>For this week's Thursday Code Tip I will show how to use PHP to crawl a website and gather content from it. First we select the URL to crawl:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$sURL</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;http://www.defvayne23.com/&quot;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Next we get the content of the page:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$sContent</span> <span style="color: #339933;">=</span> <span style="color: #990000;">file_get_contents</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sURL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
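<p>file_get_contents() returns false and emits a warning when the request fails, and by default it waits as long as the default_socket_timeout setting allows. A slightly more defensive sketch, assuming you want a timeout and a User-Agent header; fetchPage(), the timeout and the User-Agent string are illustrative and not from the original post:</p>

```php
<?php
// Hypothetical helper: fetch a page, returning null instead of false on failure.
function fetchPage(string $sURL, int $iTimeout = 10): ?string
{
    // Options under 'http' only apply to http:// and https:// URLs.
    $rContext = stream_context_create([
        'http' => [
            'timeout'    => $iTimeout,                  // seconds
            'user_agent' => 'ExampleCrawler/0.1',       // hypothetical value
        ],
    ]);

    // @ hides the warning; the false return value is checked instead.
    $sContent = @file_get_contents($sURL, false, $rContext);

    return $sContent === false ? null : $sContent;
}
```

<p>Usage would be $sContent = fetchPage($sURL); followed by a null check before running any regex on the result.</p>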

<p>Now we use <a title="REGEX" href="http://us.php.net/manual/en/book.pcre.php">REGEX</a> to pull out what we want. You can learn how patterns work <a href="http://geekswithblogs.net/brcraju/articles/235.aspx" title="REGEX">here</a>. Below we search for the text inside an H1 tag; preg_match() stores anything it finds in $aMatches.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$sPattern</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'/&lt;h1&gt;([a-z0-9\s]+)&lt;\/h1&gt;/i'</span><span style="color: #339933;">;</span>
<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sPattern</span><span style="color: #339933;">,</span> <span style="color: #000088;">$sContent</span><span style="color: #339933;">,</span> <span style="color: #000088;">$aMatches</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The above won&#8217;t return anything because I link all my H1s. So let&#8217;s modify the pattern to account for the links without capturing them.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$sPattern</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'/&lt;h1&gt;&lt;a [^&gt;]+&gt;([a-z0-9\s]+)&lt;\/a&gt;&lt;\/h1&gt;/i'</span><span style="color: #339933;">;</span></pre></div></div>

<p>Now that the pattern accounts for the anchor, $aMatches should contain:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Array
(
    [0] =&gt; &lt;h1&gt;&lt;a ...&gt;Defvayne23&lt;/a&gt;&lt;/h1&gt;
    [1] =&gt; Defvayne23
)</pre></div></div>

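<p>The first element of the array ([0]) is the full HTML that matched, including the h1 and anchor tags. The second ([1]) is just the text we were looking for. If you print_r() the array in a browser, the tags in [0] get rendered, so it looks like plain text.</p>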
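<p>As an aside, preg_match() stops at the first H1 it finds. If a page has several, preg_match_all() collects every match; with its default PREG_PATTERN_ORDER flag, $aMatches[0] lists all full matches and $aMatches[1] all captured texts. A sketch using made-up sample markup in place of a fetched page, with htmlspecialchars() so the tags in [0] stay visible in a browser:</p>

```php
<?php
// Same pattern as in the post: linked H1s only.
$sPattern = '/<h1><a [^>]+>([a-z0-9\s]+)<\/a><\/h1>/i';

// Made-up markup standing in for a fetched page with several linked H1s.
$sContent = '<h1><a href="/a">First Post</a></h1><p>...</p>'
          . '<h1><a href="/b">Second Post</a></h1>';

// preg_match_all() returns the number of full matches found.
$iCount = preg_match_all($sPattern, $sContent, $aMatches);

// Escape [0] so a browser shows the tags instead of rendering them.
foreach ($aMatches[0] as $i => $sFull) {
    echo htmlspecialchars($sFull), ' => ', $aMatches[1][$i], "\n";
}
```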
<p>Here it is all together:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$sURL</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;http://www.defvayne23.com/&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$sContent</span> <span style="color: #339933;">=</span> <span style="color: #990000;">file_get_contents</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sURL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// false if the request fails</span>
<span style="color: #000088;">$sPattern</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'/&lt;h1&gt;&lt;a [^&gt;]+&gt;([a-z0-9\s]+)&lt;\/a&gt;&lt;\/h1&gt;/i'</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$sHeader</span> <span style="color: #339933;">=</span> <span style="color: #009900; font-weight: bold;">null</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// preg_match() returns 1 on a match, 0 if there is none</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$sContent</span> <span style="color: #339933;">!==</span> <span style="color: #009900; font-weight: bold;">false</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sPattern</span><span style="color: #339933;">,</span> <span style="color: #000088;">$sContent</span><span style="color: #339933;">,</span> <span style="color: #000088;">$aMatches</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$sHeader</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$aMatches</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
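<p>The pattern above only matches H1s that are linked and contain nothing but letters, digits and whitespace. If the markup varies, a different technique, PHP's DOM extension with XPath (not what this post uses), may hold up better because it parses the HTML instead of pattern-matching it. A minimal sketch, assuming the dom extension is enabled; headingTexts() is a hypothetical name:</p>

```php
<?php
// Hypothetical helper: return the text of every H1 in an HTML string, linked or not.
function headingTexts(string $sHtml): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($sHtml) === '') {
        return [];
    }

    $oDoc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml errors instead of printing warnings.
    $bPrev = libxml_use_internal_errors(true);
    $oDoc->loadHTML($sHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($bPrev);

    $aTexts = [];
    foreach ((new DOMXPath($oDoc))->query('//h1') as $oNode) {
        // textContent includes the text of child elements such as the anchor.
        $aTexts[] = trim($oNode->textContent);
    }
    return $aTexts;
}
```

<p>With the variables above, you could call headingTexts($sContent) once $sContent is known not to be false.</p>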

]]></content:encoded>
			<wfw:commentRss>http://defvayne23.com/2010/02/18/tct-crawl-a-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
