actually, handholdability depends not just on pixel density but also on sensor size, which affects the angle of coverage... let's take the following example:
[table="width: 400, align: left"]
[tr]
[td]Format[/td]
[td]Normal Lens[/td]
[td]Angle of Coverage[/td]
[td]1/f Shutter Speed (s)[/td]
[td]MP at D800 Pixel Density[/td]
[/tr]
[tr]
[td]APS-C/DX[/td]
[td]33mm[/td]
[td]39.51º[/td]
[td]1/33[/td]
[td]15[/td]
[/tr]
[tr]
[td]135/FX[/td]
[td]50mm[/td]
[td]39.6º[/td]
[td]1/50[/td]
[td]36[/td]
[/tr]
[tr]
[td]645 Digital MF[/td]
[td]70mm[/td]
[td]39.81º[/td]
[td]1/70[/td]
[td]82[/td]
[/tr]
[/table]
The above table shows three sensor formats, the "normal lens" for each format, the angle of coverage with the combination of lens and sensor format, "1/focal length" shutter speed estimate, and the pixel count equivalent to match the D800's sensor pixel density for each format...
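for anyone who wants to check the numbers, here is a rough Python sketch of how the angle of coverage and the equivalent pixel count can be worked out... note that the sensor dimensions in it are illustrative assumptions picked to land close to the table (they are not stated above), the angle is taken across the sensor width, and the D800 is rounded to 36MP on 36 x 24mm, so treat the output as approximate...

```python
import math

def angle_of_coverage_deg(sensor_width_mm, focal_length_mm):
    """Angle of coverage across the sensor width, lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def megapixels_at_density(width_mm, height_mm,
                          ref_mp=36.0, ref_width_mm=36.0, ref_height_mm=24.0):
    """Pixel count a sensor would have at the reference sensor's pixel density."""
    density_mp_per_mm2 = ref_mp / (ref_width_mm * ref_height_mm)
    return density_mp_per_mm2 * width_mm * height_mm

# ASSUMPTION: (width mm, height mm, normal lens mm). The sensor sizes are
# illustrative values, not figures taken from the post.
FORMATS = {
    "APS-C/DX": (23.7, 15.6, 33),
    "135/FX": (36.0, 24.0, 50),
    "645 Digital MF": (50.7, 39.0, 70),
}

for name, (width, height, focal) in FORMATS.items():
    angle = angle_of_coverage_deg(width, focal)
    mp = megapixels_at_density(width, height)
    print(f"{name}: {angle:.2f} deg, 1/{focal} s, ~{mp:.0f} MP")
```

small changes in the assumed sensor size shift the angle and the MP figure a little, which is why a real camera like the IQ180 does not line up exactly with the table...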
the angle of coverage is important in this discussion as it is a factor in how much detail can be seen with a particular setup... especially when comparing different formats, it is not very useful to talk about how much absolute detail a sensor can pick up if the angle of coverage, and thus the lens used, is not taken into account...
as we can see, to cover a similar angle to a 50mm lens on the D800, a DX camera would need a 33mm lens... while the pixel count at equivalent pixel density is only 15MP, this 15MP is used to cover the same ~40º, and it would only require a shutter speed of 1/33 for a stable shot based on the 1/f guideline, compared with 1/50 for the D800... in other words, to cover the same angle, the D800 would need a faster shutter speed to be stable, and each pixel in the D800 covers a smaller angle...
and if we were to extend this to digital medium format, a digital medium format camera of equivalent pixel density would be ~80MP (like Phase One's IQ180, although its sensor is a slightly different size from the one I used to calculate the above, so its angle of coverage differs slightly)... the "normal lens" would be a 70mm, thus requiring a shutter speed of 1/70, with the ~80MP covering the same ~40º and therefore each pixel covering an even smaller angle...
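to put rough numbers on the per-pixel angle, here is a small Python sketch... it assumes a pixel pitch of about 4.88 microns for the D800 (an approximate figure that is not stated above) and uses the same pitch for all three formats, since the comparison is at equal pixel density...

```python
import math

# ASSUMPTION: approximate D800 pixel pitch in microns, shared by all three
# formats because the comparison is at equal pixel density.
PIXEL_PITCH_UM = 4.88

def per_pixel_angle_arcsec(focal_length_mm, pixel_pitch_um=PIXEL_PITCH_UM):
    """Angle covered by one pixel near the image centre, in arcseconds."""
    pitch_mm = pixel_pitch_um / 1000.0
    angle_rad = 2 * math.atan(pitch_mm / (2 * focal_length_mm))
    return math.degrees(angle_rad) * 3600

for name, focal in [("APS-C/DX", 33), ("135/FX", 50), ("645 Digital MF", 70)]:
    print(f"{name} with {focal}mm: {per_pixel_angle_arcsec(focal):.1f} arcsec per pixel")
```

with the pitch held constant, the angle per pixel is very nearly proportional to 1/focal length, so the 70mm setup slices the same ~40º into the finest pieces and the 33mm setup into the coarsest...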
camera movement can be measured as an angular displacement, which we can compare against the angle of coverage... as we can see from above, for an identical total angle of coverage and the same pixel density, each pixel of a smaller format covers a larger angle than each pixel of a larger format... so for an identical amount of camera shake, the same angular displacement is a smaller fraction of the angle covered by each pixel on the smaller format, and a larger fraction on the larger format... put another way, because a longer lens is used on a larger format to cover the same angle, the image shifts across more pixels for the same hand shake... so for the same pixel density, an equivalent amount of camera movement has a larger per-pixel impact on a larger sensor format than on a smaller one covering the same angle...
thus, if we assume that pixel density on its own affects handholdability, what follows from the above is that, at the per-pixel level and for the same pixel density, the effect is more pronounced on larger formats than on smaller ones... but as others have mentioned, how much handling blur you actually see also depends on how closely you look at the image and how large it is displayed... YMMV...
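the same argument as a quick Python sketch... it treats hand shake as a small rotation of the camera with a distant subject, so the image shifts by roughly focal length x tan(angle) near the centre of the frame... the 0.01º shake and the ~4.88 micron pixel pitch are hypothetical values chosen only for illustration...

```python
import math

# ASSUMPTION: approximate D800 pixel pitch in microns, used for every format
# because the comparison is at equal pixel density.
PIXEL_PITCH_UM = 4.88

def blur_in_pixels(shake_deg, focal_length_mm, pixel_pitch_um=PIXEL_PITCH_UM):
    """Image shift, in pixels, caused by rotating the camera by shake_deg."""
    shift_um = focal_length_mm * math.tan(math.radians(shake_deg)) * 1000.0
    return shift_um / pixel_pitch_um

SHAKE_DEG = 0.01  # hypothetical rotation during the exposure

for name, focal in [("APS-C/DX", 33), ("135/FX", 50), ("645 Digital MF", 70)]:
    print(f"{name} with {focal}mm: {blur_in_pixels(SHAKE_DEG, focal):.2f} px of blur")
```

under these assumptions the blur in pixels scales with focal length (33 : 50 : 70 here)... if the shake builds up at a roughly constant rate over the exposure, keeping the per-pixel blur equal means shortening the exposure in the same proportion, which is the idea behind the 1/f guideline...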