Categories
geek microsoft programming tips

invoke-webrequest pro tips

The Invoke-WebRequest PowerShell cmdlet is great if you want to fetch and work with some web page’s output without installing any extra tools like wget.exe for example. If you’re planning to do some text parsing on a web page anyway, PowerShell is an excellent option, so why not go full PS mode?
Unfortunately the command has some drawbacks, causing it to be a lot slower than it should be if you just want plain text and its response parsing can even cause it to lock up and not return a result at all.

So here’re some pro-tips for parsing the output using PowerShell fast and effectively:

1. Use basic parsing

The cmdlet does some DOM parsing by default using Internet Explorer. This takes time and sometimes fails too, so if you want to skip this bit and make things faster, simply add the command-line switch UseBasicParsing:

$r = Invoke-WebRequest https://n3wjack.net -UseBasicParsing

2. Split HTML in lines

Parsing text in PS is easy, but it’s even easier if the result is formatted like a text file with multiple lines instead of the full HTML in a single string. If you get the Content property from your webpage, you can split it up into separate lines by splitting on the newline character:

(Invoke-WebRequest https://n3wjack.net -UseBasicParsing).Content -split "`n"

Or, if you also want the HTTP header info to be included in the result, use RawContent instead:

(Invoke-WebRequest https://n3wjack.net -UseBasicParsing).RawContent -split "`n"

This can be really handy if you want to automatically check if the right response headers are set.
But you can also use the Headers collection on the result object, which is even easier.

3. Disable download progress meter shizzle to download large files (or always to speed things up)

That download progress bar is a nice visual and all when you’re using Invoke-WebRequest to download some large binaries and want to see its progress, but it significantly slows things down too. Set the $progressPreference variable and you’ll see your scripts download those files a lot faster.
The larger the files (like big ass log files, images, video’s etc) the more this matters I’ve noticed.

$progressPreference = 'silentlyContinue'
invoke-webrequest $logurl -outfile .\logfile.log -UseBasicParsing
$progressPreference = 'Continue'

Be sure to reset this setting afterwards, because this affects any cmdlet using that progress-bar feature.

4. No redirects please.

Invoke-WebRequest automatically follows an HTTP redirect (301/302) so you end up with the page you were looking for in most cases.
If you want to test if a URL is properly redirected (or not redirected) this just makes things harder. In that case you can turn off redirects by using the MaximumRedirection parameter and setting it to 0

When you get a URL that returns a 301 when doing this, the command will throw an exception saying the maximum redirection counts has been exceeded. This makes this case easier to test.
The result object will also contain the redirect StatusCode.

$r = Invoke-WebRequest http://n3wjack.net -MaximumRedirection 0

5. Use the PowerShell result object

It’s overkill in some cases, but in others this is pure win. The result object contains some really handy bits of the webpage, making a lot of tricky text and regex parsing obsolete.
It’s a piece of cake to parse all images linked from a page using the Image collection. Want to parse all outgoing links on a page? Use the Links collection. There’s also a StatusCode, a Headers collection a Forms and Inputfield collection for form parsing and more.
Check out what’s available using Get-Members:

Invoke-WebRequest https://n3wjack.net | get-members

4. If all else fails, use wget.exe

Yep. Sometimes Invoke-WebRequest simply doesn’t cut it. I’ve seen it hang on some complex pages trying to parse them and fail miserably.
In that case you can fetch the page using the GNU WGet tool, download the page as a text file and then parse that.
You have to call wget by adding the exe extension part otherwise you’ll be triggering the PowerShell alias for Invoke-WebRequest again.

# Install WGet with Chocolatey
choco install wget

# Get the page and save it as a text file
wget.exe https://n3wjack.net -O nj.html
# Read the file and parse it.
get-content nj.html | % { # parsing code goes here }

That’s all the tips to have to make your web request parsing in PowerShell a breeze. If you know a good tip that isn’t listed here, feel free to drop a comment below!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.