I’m a dedicated (some might even say obsessive) reader of the New York Times. Beyond the quality journalism, the paper manages to keep its comments section informative and useful, leagues away from the cesspits of most online commenting, even at the thoroughly serious Washington Post.
Reading NYT comments is sometimes amusing, occasionally truly enlightening, and almost always engaging and informative. So, as I’ve been pushing my data integration skills hard these past few weeks, my first attempt at drinking from the fire hydrant went through SQL Server Management Studio. There was no automation: I manually updated my SQL script with whichever article’s comments I wanted, then ran the IMPORT JSON script and deserialized the result.
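The deserialization step boils down to flattening a nested JSON response into one row per comment. Here is a minimal Python sketch of that idea (not the SQL script I actually used), assuming the response nests comments under `results` and then `comments`, with fields named `commentID`, `userDisplayName`, `commentBody` and `recommendations`. Those field names are my assumption, so check them against a real response.

```python
import json
from typing import Any


def comments_from_response(raw: str) -> list[dict[str, Any]]:
    """Flatten a comments API response (JSON text) into one dict per comment.

    ASSUMPTION: comments live under payload["results"]["comments"], and the
    source field names below are illustrative; verify against a real response.
    """
    payload = json.loads(raw)
    comments = payload.get("results", {}).get("comments", [])
    rows = []
    for c in comments:
        rows.append(
            {
                "comment_id": c.get("commentID"),
                "author": c.get("userDisplayName"),
                "body": c.get("commentBody", ""),
                "recommendations": int(c.get("recommendations", 0)),
            }
        )
    return rows


# Hypothetical sample response with two comments.
sample = json.dumps(
    {
        "status": "OK",
        "results": {
            "comments": [
                {"commentID": 1, "userDisplayName": "Reader A",
                 "commentBody": "Great piece.", "recommendations": 12},
                {"commentID": 2, "userDisplayName": "Reader B",
                 "commentBody": "I disagree.", "recommendations": 3},
            ]
        },
    }
)
rows = comments_from_response(sample)
print(len(rows), rows[0]["author"])  # prints: 2 Reader A
```

Using `.get()` with defaults means a response without comments yields an empty list rather than an exception, which is roughly the forgiving behavior I wanted from the SQL side.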
But the tutorial I was following had some sloppy coding errors that I couldn’t figure out how to overcome. Oddly, I could run the query and get the expected results.
But getting that data into a table? It never worked as the tutorial claimed it would. Admittedly, I was using a dataset with a very different structure, so there were plenty of points where I could have made mistakes.
After a few attempts at researching the issue, I stumbled across Azure Logic Apps, which, under the hood, is remarkably similar to Power Automate flows in its user interface and structure. Suddenly I was porting data.
Now, simply by adding an NYT article URL (nothing more) to a SharePoint list, I get results within a few seconds, plus a write-back to the list with the number of comments and the title of the article.
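Each new list item ultimately turns into an HTTP request for a page of comments. A minimal sketch of building that request from the article URL, assuming the NYT Community API endpoint `https://api.nytimes.com/svc/community/v3/user-content/url.json` with `api-key`, `offset` and `url` query parameters. The endpoint and parameter names are assumptions to check against the NYT Developer documentation, and `YOUR_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names modeled on the NYT Community API;
# verify them against the NYT Developer documentation before relying on this.
COMMENTS_ENDPOINT = "https://api.nytimes.com/svc/community/v3/user-content/url.json"


def build_comments_request(article_url: str, api_key: str, offset: int = 0) -> str:
    """Return the GET URL for one page of comments on the given article."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # urlencode percent-encodes the article URL so it survives as a query value.
    query = urlencode({"api-key": api_key, "offset": offset, "url": article_url})
    return f"{COMMENTS_ENDPOINT}?{query}"


print(build_comments_request("https://www.nytimes.com/a/b.html", "YOUR_KEY"))
```

In the Logic App this corresponds to an HTTP action whose query string is built from the SharePoint item’s URL field; the Python version just makes the encoding explicit.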
The comments come back in Excel. Because the Logic Apps interface takes time to navigate around, I’ve kept to a quick-and-dirty set of essential columns.
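Outside Logic Apps, the same quick-and-dirty output could be produced by keeping a handful of columns and writing CSV, which Excel opens directly. This is a swapped-in approach, not what my flow does (it writes to Excel through a connector), and the column names are hypothetical.

```python
import csv
import io
from typing import Sequence

# Hypothetical "essential" columns; adjust to whichever fields you keep.
ESSENTIAL_COLUMNS = ("comment_id", "author", "recommendations", "body")


def to_csv(rows: list[dict], columns: Sequence[str] = ESSENTIAL_COLUMNS) -> str:
    """Write only the chosen columns to CSV text; any extra keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


rows = [
    {"comment_id": 1, "author": "Reader A", "recommendations": 12,
     "body": "Great piece, thanks.", "createDate": "1556700000"},
]
print(to_csv(rows))
```

`extrasaction="ignore"` is what makes this a column filter: fields not in the list (here `createDate`) are silently dropped instead of raising an error.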
Oddly, extracting the article title proved nettlesome. Eventually I found a workaround (not an entirely successful one) using Azure Bing Search, which for politics articles worked well enough to get me a title I could write back to the original SharePoint list.
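A cheaper fallback I haven’t wired into the flow: NYT article URLs usually end in a hyphenated slug, so a rough title can be recovered from the URL itself. This sketch assumes a path shaped like `/2019/05/01/us/politics/some-article-slug.html`. It can’t restore proper-noun capitalization or punctuation, which is why a search or page lookup may still be worth the trouble.

```python
import posixpath
from urllib.parse import urlparse


def title_from_slug(article_url: str) -> str:
    """Guess a readable, sentence-case title from an article URL's final slug.

    ASSUMPTION: the URL path ends in a hyphenated slug such as
    /2019/05/01/us/politics/some-article-slug.html. Returns "" if none.
    """
    path = urlparse(article_url).path  # drops any ?query or #fragment
    last = posixpath.basename(path.rstrip("/"))
    slug, _ext = posixpath.splitext(last)
    words = [w for w in slug.split("-") if w]
    return " ".join(words).capitalize()


print(title_from_slug(
    "https://www.nytimes.com/2019/05/01/us/politics/senate-passes-budget-bill.html?smid=tw"
))  # prints: Senate passes budget bill
```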
Next steps: add Azure Cognitive Services sentiment analysis; move the output from Excel to SQL Azure for incremental updates; and add pagination to retrieve comments beyond the first 25 (the per-page response limit imposed by the NYT Developer program).
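Pagination is the most mechanical of those next steps: request pages at offsets 0, 25, 50 and so on until a page comes back short or the reported total is reached. Below is a sketch with the HTTP call injected as a function so it can be tested offline. The page size of 25 comes from the limit above; `fetch_page`, its return shape, and the stopping rule are my assumptions.

```python
from typing import Callable

PAGE_SIZE = 25  # per-page limit imposed by the NYT Developer program


def fetch_all_comments(
    fetch_page: Callable[[int], tuple[list[dict], int]],
    max_pages: int = 1000,
) -> list[dict]:
    """Collect every comment by walking offsets in steps of PAGE_SIZE.

    fetch_page(offset) is a hypothetical callable returning
    (comments_on_this_page, total_comments_reported).
    """
    collected: list[dict] = []
    offset = 0
    for _ in range(max_pages):  # hard cap guards against an endless loop
        page, total = fetch_page(offset)
        collected.extend(page)
        # Stop on a short or empty page, or once the reported total is reached.
        if len(page) < PAGE_SIZE or len(collected) >= total:
            break
        offset += PAGE_SIZE
    return collected


# Offline stand-in for the API: 60 fake comments served 25 at a time.
FAKE = [{"comment_id": i} for i in range(60)]


def fake_fetch(offset: int) -> tuple[list[dict], int]:
    return FAKE[offset:offset + PAGE_SIZE], len(FAKE)


print(len(fetch_all_comments(fake_fetch)))  # prints: 60
```

In Logic Apps the same idea would be an Until loop that increments an offset variable; keeping the fetch separate here just makes the stopping logic easy to check.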
Still, I am mightily pleased, and I wonder what insights I might yet derive from a dynamic dataset I can really use. A more thorough write-up will follow soon.