<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments for chrisspenblog</title>
	<atom:link href="http://www.chrisspen.com/blog/comments/feed" rel="self" type="application/rss+xml" />
	<link>http://www.chrisspen.com/blog</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Wed, 10 Mar 2010 16:59:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>Comment on Handling PostgreSQL Integrity Errors in Django by Marcob</title>
		<link>http://www.chrisspen.com/blog/handling-postgresql-integrity-errors-in-django.html#comment-25</link>
		<dc:creator>Marcob</dc:creator>
		<pubDate>Sun, 10 Jan 2010 17:54:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=14#comment-25</guid>
		<description>This would have helped and saved me a lot of time  :-)

I discovered this solution independently after trying every combination of transaction, connection, etc.; this was the only workaround that worked.</description>
		<content:encoded><![CDATA[<p>This would have helped and saved me a lot of time  <img src='http://www.chrisspen.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>I discovered this solution independently after trying every combination of transaction, connection, etc.; this was the only workaround that worked.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Enabling Implicit Cast From Integer To Boolean in PostgreSQL by Chris</title>
		<link>http://www.chrisspen.com/blog/enabling-implicit-cast-from-integer-to-boolean-in-postgresql.html#comment-22</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Wed, 02 Dec 2009 05:47:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=13#comment-22</guid>
		<description>I'm glad you found it useful.

According to http://doxygen.postgresql.org/pg__cast_8h-source.html the three allowable values for pg_cast.castcontext are "e", "i", and "a", where "a" specifies "coercion in context of assignment".</description>
		<content:encoded><![CDATA[<p>I&#8217;m glad you found it useful.</p>
<p>According to <a href="http://doxygen.postgresql.org/pg__cast_8h-source.html" rel="nofollow">http://doxygen.postgresql.org/pg__cast_8h-source.html</a> the three allowable values for pg_cast.castcontext are &#8220;e&#8221;, &#8220;i&#8221;, and &#8220;a&#8221;, where &#8220;a&#8221; specifies &#8220;coercion in context of assignment&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Enabling Implicit Cast From Integer To Boolean in PostgreSQL by GH</title>
		<link>http://www.chrisspen.com/blog/enabling-implicit-cast-from-integer-to-boolean-in-postgresql.html#comment-21</link>
		<dc:creator>GH</dc:creator>
		<pubDate>Wed, 02 Dec 2009 05:29:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=13#comment-21</guid>
		<description>Thanks for this post. I found it by googling an error message I got while trying to do exactly this type of implicit cast, and your code worked like a charm.

Besides "e" and "i", are there other options for "castcontext"?</description>
		<content:encoded><![CDATA[<p>Thanks for this post. I found it by googling an error message I got while trying to do exactly this type of implicit cast, and your code worked like a charm.</p>
<p>Besides &#8220;e&#8221; and &#8220;i&#8221;, are there other options for &#8220;castcontext&#8221;?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by Gerard Gleeson</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-20</link>
		<dc:creator>Gerard Gleeson</dc:creator>
		<pubDate>Tue, 01 Dec 2009 11:26:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-20</guid>
		<description>I implemented a very similar idea in Java a while ago. It does not come close to this, but it might help people:
http://www.redbrick.dcu.ie/~gleesog4/Projects/page.html#extractmainarticle</description>
		<content:encoded><![CDATA[<p>I implemented a very similar idea in Java a while ago. It does not come close to this, but it might help people:<br />
<a href="http://www.redbrick.dcu.ie/~gleesog4/Projects/page.html#extractmainarticle" rel="nofollow">http://www.redbrick.dcu.ie/~gleesog4/Projects/page.html#extractmainarticle</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Handling PostgreSQL Integrity Errors in Django by Stephen</title>
		<link>http://www.chrisspen.com/blog/handling-postgresql-integrity-errors-in-django.html#comment-17</link>
		<dc:creator>Stephen</dc:creator>
		<pubDate>Tue, 06 Oct 2009 22:01:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=14#comment-17</guid>
		<description>Thanks, this helped me!</description>
		<content:encoded><![CDATA[<p>Thanks, this helped me!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by a</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-14</link>
		<dc:creator>a</dc:creator>
		<pubDate>Fri, 02 Oct 2009 21:29:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-14</guid>
		<description>It seems to pull out only the comments for some pages. Example:
http://www.cbsnews.com/blogs/2009/10/02/politics/politicalhotsheet/entry5359359.shtml</description>
		<content:encoded><![CDATA[<p>It seems to pull out only the comments for some pages. Example:<br />
<a href="http://www.cbsnews.com/blogs/2009/10/02/politics/politicalhotsheet/entry5359359.shtml" rel="nofollow">http://www.cbsnews.com/blogs/2009/10/02/politics/politicalhotsheet/entry5359359.shtml</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by Dave</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-10</link>
		<dc:creator>Dave</dc:creator>
		<pubDate>Fri, 22 May 2009 16:43:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-10</guid>
		<description>Yes, it seems strange that you would go to all this effort but not publish the code.

I too would appreciate looking at the code.</description>
		<content:encoded><![CDATA[<p>Yes, it seems strange that you would go to all this effort but not publish the code.</p>
<p>I too would appreciate looking at the code.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by Baishampayan Ghose</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-9</link>
		<dc:creator>Baishampayan Ghose</dc:creator>
		<pubDate>Tue, 24 Mar 2009 13:52:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-9</guid>
		<description>Nice idea. Any chance of opening up the code so that people can learn? There are similar things for Perl, etc. but nothing for Python. So some code would be appreciated. Thanks.</description>
		<content:encoded><![CDATA[<p>Nice idea. Any chance of opening up the code so that people can learn? There are similar things for Perl, etc. but nothing for Python. So some code would be appreciated. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by High-quality personal filtering &#171; Sri Spot</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-3</link>
		<dc:creator>High-quality personal filtering &#171; Sri Spot</dc:creator>
		<pubDate>Wed, 17 Sep 2008 14:24:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-3</guid>
		<description>[...] How to Extract a Webpage&#8217;s Main Article Content I had an idea to make a personalized news feed reader. Basically, I’d register a bunch of feeds with the application, and rate a few stories as either “good” or “bad”. The application would then use my ratings and the article text to generate a statistical model, apply that model to future articles, and only recommend those it predicted I would rate as “good”. It sounded like a plausible idea. I decided to start a pet project. [...]</description>
		<content:encoded><![CDATA[<p>[...] How to Extract a Webpage&#8217;s Main Article Content I had an idea to make a personalized news feed reader. Basically, I’d register a bunch of feeds with the application, and rate a few stories as either “good” or “bad”. The application would then use my ratings and the article text to generate a statistical model, apply that model to future articles, and only recommend those it predicted I would rate as “good”. It sounded like a plausible idea. I decided to start a pet project. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How to Extract a Webpage&#8217;s Main Article Content by Ted Dziuba</title>
		<link>http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html#comment-2</link>
		<dc:creator>Ted Dziuba</dc:creator>
		<pubDate>Tue, 16 Sep 2008 15:02:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.chrisspen.com/blog/?p=6#comment-2</guid>
		<description>Hi Chris,

Nice article.  Glad to see that somebody else cares about this problem too.  Your approach is very clean.  We implemented something that makes several passes over the DOM and snips out nodes driven by a few simple heuristics like "text weight" and such.  We found that stuff like that works well for maybe 70% of the web pages out there.  To reach the other ~30%, we took a data-driven approach (i.e. using the data we already crawled to figure out what markup is).

Still, we can't get everything right, but it works Well Enough (tm).

Also, the semantic search company Twine has attempted something like this.

Cheers,

Ted
(of Persai, now Pressflip)</description>
		<content:encoded><![CDATA[<p>Hi Chris,</p>
<p>Nice article.  Glad to see that somebody else cares about this problem too.  Your approach is very clean.  We implemented something that makes several passes over the DOM and snips out nodes driven by a few simple heuristics like &#8220;text weight&#8221; and such.  We found that stuff like that works well for maybe 70% of the web pages out there.  To reach the other ~30%, we took a data-driven approach (i.e. using the data we already crawled to figure out what markup is).</p>
<p>Still, we can&#8217;t get everything right, but it works Well Enough &#8482;.</p>
<p>Also, the semantic search company Twine has attempted something like this.</p>
<p>Cheers,</p>
<p>Ted<br />
(of Persai, now Pressflip)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
