estadiare

Member
Aug 31, 2022
46
whatevs

Mining for copium in the weirdest places.
Jan 15, 2022
2,914
Hey, what are you using to get the text into the JSON file, a Python script? And what does your terminal/shell script program look like? Share some code!
 
VirtualSnow

who knows
May 21, 2022
110
whatevs said: Hey, what are you using to get the text into the JSON file, a Python script? And what does your terminal/shell script program look like? Share some code!
Not OP, but there are a thousand and one ways to do it. The most straightforward one that comes to mind is writing a spider that crawls the site and records its content. Pick any language you like; it's not that difficult with something like Python's beautifulsoup4 (the XML/HTML parser). You could write a script that goes through the subforums, gets each one's post list, and starts grabbing the contents. As for the JSON dump, there's a JSON library for almost any programming language out there.
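
A minimal sketch of the parse-and-dump step described above, in Python. It uses the standard library's `html.parser` in place of beautifulsoup4 so it runs without extra packages. The `post-body` class name and the sample markup are made up for illustration; a real page would need its own selectors.

```python
import json
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the text inside every <div class="post-body"> element.

    The class name "post-body" is a hypothetical placeholder.
    """

    def __init__(self):
        super().__init__()
        self.posts = []     # finished post texts
        self._depth = 0     # > 0 while inside a post-body div
        self._chunks = []   # text fragments of the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1        # nested div inside a post
        elif "post-body" in classes:
            self._depth = 1         # entering a new post
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:    # left the post-body div
                # Join the fragments and collapse runs of whitespace.
                text = " ".join("".join(self._chunks).split())
                self.posts.append(text)

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_posts(html):
    """Return the list of post texts found in an HTML string."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


# Made-up sample page, used instead of a real download.
SAMPLE = """
<div class="post-body">First <b>post</b> text.</div>
<div class="sidebar">ignored</div>
<div class="post-body">Second post <div>with a nested block</div></div>
"""

posts = extract_posts(SAMPLE)
dump = json.dumps({"posts": posts}, indent=2, ensure_ascii=False)
print(dump)
```

With beautifulsoup4 the extractor class would shrink to a selector lookup, but the JSON step stays the same.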
 
estadiare

Member
Aug 31, 2022
46
@whatevs Scraping is very easy since you can just iterate over the IDs. For example, the URL https://sanctioned-suicide.net/threads/lol.1 goes straight to the first post ever made. I wrote the script in JS, but pretty much any programming language should work.
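
The ID iteration could look roughly like this in Python. This is only a sketch, not the JS script mentioned above: the base URL `https://forum.example/threads/`, the placeholder slug `t`, and the injectable `fetch` callable are all assumptions, and it assumes the server resolves a thread from the numeric ID alone.

```python
from typing import Callable, Iterator, Optional, Tuple

BASE_URL = "https://forum.example/threads/"  # placeholder, not a real forum


def thread_url(thread_id: int, slug: str = "t") -> str:
    """Build a thread URL of the form <base><slug>.<id>."""
    return f"{BASE_URL}{slug}.{thread_id}"


def iter_threads(
    first_id: int,
    last_id: int,
    fetch: Callable[[str], Optional[str]],
) -> Iterator[Tuple[int, str]]:
    """Yield (id, page) for every ID in the range that resolves.

    `fetch` is a hypothetical callable that returns the page text,
    or None when the ID does not exist (deleted or never used).
    """
    for thread_id in range(first_id, last_id + 1):
        page = fetch(thread_url(thread_id))
        if page is None:
            continue  # skip gaps in the ID sequence
        yield thread_id, page


def fake_fetch(url: str) -> Optional[str]:
    """Offline stand-in for a real download; ID 2 is 'missing'."""
    pages = {
        thread_url(1): "<html>first</html>",
        thread_url(3): "<html>third</html>",
    }
    return pages.get(url)


found = list(iter_threads(1, 3, fake_fetch))
print(found)
```

Passing the fetcher in as an argument keeps the loop testable offline and makes it easy to swap in a rate-limited downloader.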

If you're going to do something like this, I recommend following scraping guidelines: be gentle with the server, leave enough time between requests, and add your e-mail to your user agent so the admins can contact you. I also stripped as much personal information (usernames, direct @-mentions, e-mails, etc.) as I could.
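
A rough Python sketch of those guidelines, under stated assumptions: the user-agent string, the contact address, the 5-second delay, and the two regular expressions are illustrative choices, not the values or patterns actually used for this archive. The demo at the end swaps in a fake opener so nothing is sent over the network.

```python
import io
import re
import time
import urllib.request

# Hypothetical bot name and contact address; replace with your own.
USER_AGENT = "archive-bot/0.1 (contact: you@example.org)"
DELAY_SECONDS = 5.0  # arbitrary pause between requests (an assumption)


def build_request(url):
    """Attach the identifying User-Agent header to a request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_fetch(urls, opener=urllib.request.urlopen,
                 delay=DELAY_SECONDS, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # wait before every request except the first
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages


# Simple patterns: they catch common e-mail shapes and single-word
# @-mentions only; usernames with spaces would need a name list.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def redact(text):
    """Replace e-mail addresses and @-mentions with placeholders."""
    text = EMAIL_RE.sub("[email]", text)  # e-mails first, they contain '@'
    return MENTION_RE.sub("[user]", text)


# Offline demo: the fake opener echoes the User-Agent it received.
naps = []
demo_pages = polite_fetch(
    ["https://forum.example/threads/t.1", "https://forum.example/threads/t.2"],
    opener=lambda req: io.BytesIO(req.get_header("User-agent").encode()),
    sleep=naps.append,
)
print(redact("@someone mail me at a.b@example.org"))
```

Running the e-mail pattern before the mention pattern matters, since otherwise the part after the `@` in an address would be treated as a mention.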

The viewing script is written in C++.
 
LunarPyotr

Bury me near the MKAD
Jul 4, 2020
495
Nice, but I personally prefer to keep things a bit simpler. I have an encrypted partition where my browser dumps the cache from domains that I've added to a whitelist, which puts no extra load on the site itself.
It's a perfect solution for personal archives that can be loaded later, but it could be dangerous if you don't encrypt all the data you've collected, not only for yourself but also for other members. That's actually how I was flying through threads on this forum without even having an internet connection.
However, I also have to be careful, since my browser dumps its data whenever I close it, unless I run the kill command as sudo in the terminal. That actually keeps some of the cache and the plugins loaded :P
 
